UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Use of T5 not possible with Transformer object #870

Closed. JosephGatto closed this issue 3 years ago.

JosephGatto commented 3 years ago

When I run this piece of code

from transformers import T5Tokenizer
from sentence_transformers.models import Transformer, Pooling
from sentence_transformers import SentenceTransformer

tokenizer = T5Tokenizer.from_pretrained('t5-small')
# the tokenizer returns a dict with 'input_ids' and 'attention_mask'
features = tokenizer('I Love Sentence Transformers', return_tensors='pt', return_attention_mask=True)

word_embedding_model = Transformer('t5-small', max_seq_length=256).to('cuda:0')  ### generic Transformer module
pooling_model = Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
model(features.to('cuda:0'))

This does not produce any output; instead it raises the following error: ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds

However, if I run this code

from transformers import T5Tokenizer
from sentence_transformers.models import T5, Pooling
from sentence_transformers import SentenceTransformer

tokenizer = T5Tokenizer.from_pretrained('t5-small')
features = tokenizer('I Love Sentence Transformers', return_tensors='pt', return_attention_mask=True)

word_embedding_model = T5.T5('t5-small', max_seq_length=256).to('cuda:0')  ### deprecated T5 module instead of Transformer
pooling_model = Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
model(features.to('cuda:0'))

I get an output as expected.

I have tried to modify the source code so that the Transformer module's forward function accommodates T5, but I am getting stuck.

1) Is there a way around this so I can use the Transformer module?
2) Is there anything wrong with using the deprecated T5 class?

Thanks!

JosephGatto commented 3 years ago

Found a hacky solution - replace the Transformer module's auto_model with just the encoder of the T5 model.
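
Roughly like this (just a sketch of the hack; exact attributes may differ between sentence-transformers versions):

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Transformer, Pooling

word_embedding_model = Transformer('t5-small', max_seq_length=256)
# keep only the T5 encoder so the forward pass no longer expects decoder inputs
word_embedding_model.auto_model = word_embedding_model.auto_model.encoder
pooling_model = Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

embeddings = model.encode(['I Love Sentence Transformers'])
print(embeddings.shape)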

nreimers commented 3 years ago

Hi, I did not get good results with T5; hence, I did not put much focus on integrating it.

If the deprecated T5 class works, then there is nothing wrong with using it. But I don't maintain it actively.

If you get good results with T5, let me know.

JosephGatto commented 3 years ago

So I ran some experiments using this example, where I simply swapped out DistilBERT for the encoder of two different T5 models: https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/sts/training_stsbenchmark.py

The only other difference is that I lowered the batch size to 2 so that the larger T5 model fits in memory on my machine.

Model | Spearman
t5-small | 0.728
t5-large | 0.827

Performance would likely improve further with the 3B and 11B parameter models (which, sadly, I cannot load into memory). Their paper gives clear evidence that essentially every task benefits noticeably from more parameters.
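
In rough outline, the swap looks something like this (a sketch, assuming the encoder-only hack described above; the rest of training_stsbenchmark.py is unchanged):

from sentence_transformers import SentenceTransformer, models

model_name = 't5-large'   # or 't5-small'; replaces 'distilbert-base-uncased'
train_batch_size = 2      # lowered so t5-large fits in memory

word_embedding_model = models.Transformer(model_name, max_seq_length=256)
# same encoder-only hack as above: drop the decoder
word_embedding_model.auto_model = word_embedding_model.auto_model.encoder
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# the data loading, CosineSimilarityLoss, model.fit(...) and the STSb evaluator
# are used exactly as in training_stsbenchmark.py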

Hope this helps!

pritamdeka commented 2 years ago

@JosephGatto Hi. Could you provide the code for the change you made where you swapped out the DistilBERT model for the encoder of the T5 model?