Closed JosephGatto closed 3 years ago
Found a hacky solution: replace `SentenceTransformer`'s `auto_model` with just the encoder of the T5 model.
Hi, I did not get good results with T5, so I did not put much focus on integrating it.
If the deprecated T5 class works, there is nothing wrong with using it, but I don't maintain it actively.
If you get good results with T5, let me know.
So I ran some experiments using this example, where I simply swapped out DistilBERT for the encoder of two different T5 models: https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/sts/training_stsbenchmark.py
The only other difference is that I lowered the batch size to 2 to accommodate the larger T5 model on my machine.
| Model | Spearman |
| --- | --- |
| t5-small | 0.728 |
| t5-large | 0.827 |
Performance would likely improve further with the 3B and 11B parameter models (which, sadly, I cannot load into memory). Their paper gives clear evidence that essentially every task benefits noticeably from more parameters.
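For context, the Spearman numbers above correlate the cosine similarity of each sentence pair's embeddings with the gold STS scores. A minimal sketch of that evaluation (hypothetical helper, using scipy rather than the library's own evaluator):

```python
import numpy as np
from scipy.stats import spearmanr

def sts_spearman(emb_a: np.ndarray, emb_b: np.ndarray, gold: np.ndarray) -> float:
    """Spearman correlation between pairwise cosine similarities and gold STS scores."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cosine = (a * b).sum(axis=1)  # per-pair cosine similarity
    return spearmanr(cosine, gold).correlation
```

If the similarity ranking produced by the embeddings matches the gold ranking exactly, the score is 1.0; the table above reports this statistic on the STS benchmark.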
Hope this helps!
@JosephGatto Hi. Could you provide the code for the change that you did where you swapped out the distilbert model for the encoder of the T5 model?
When I run this piece of code
This does not produce any output, but instead gives the following error:
ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds
However, if I run this code
I get an output as expected.
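If it helps, the difference between the two snippets is reproducible with a tiny randomly initialized T5 (no download needed, config values are arbitrary): the full `T5Model` is an encoder-decoder, so its `forward()` demands decoder inputs, while calling the encoder alone behaves like any BERT-style encoder:

```python
import torch
from transformers import T5Config, T5Model

# Tiny random T5 just to demonstrate the behavior; sizes are arbitrary.
config = T5Config(d_model=32, d_ff=64, num_layers=2, num_decoder_layers=2,
                  num_heads=4, vocab_size=100)
model = T5Model(config)
input_ids = torch.tensor([[5, 6, 7]])

try:
    model(input_ids=input_ids)  # no decoder_input_ids -> raises ValueError
except ValueError as e:
    print("ValueError:", e)

# Encoder-only call works and returns per-token hidden states.
hidden = model.encoder(input_ids=input_ids).last_hidden_state
print(hidden.shape)  # torch.Size([1, 3, 32])
```

This is why swapping in `model.encoder` (as above in the thread) sidesteps the error.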
I have tried to modify the source code so the forward function accommodates T5, but I am getting stuck.
1) Is there a way around this so I can use the Transformer module? 2) Is there anything wrong with using the deprecated T5 class?
Thanks!