pritamdeka opened 2 years ago
Hi @pritamdeka,
Could you give a bit more context on the issue mentioned below?
Thank you!
Kind Regards, Nandan Thakur
Hi @NThakur20 Thanks for the reply.
I used the train_sbert_BM25_hardnegs.py file for training on the SciFact dataset, with a few changes for the T5 model. For example, I changed the following line
word_embedding_model = models.Transformer(model_name, max_seq_length=300)
to
word_embedding_model = models.T5.T5(model_name, max_seq_length=300)
The model I used is castorini/monot5-base-msmarco.
The error is something like this:
Epoch: 0% 0/1 [00:00<?, ?it/s]
Iteration: 0% 0/1121 [00:00<?, ?it/s]
Epoch: 0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/content/beir/examples/retrieval/training/train_sbert_BM25_hardnegs.py", line 132, in <module>
use_amp=True)
File "/usr/local/lib/python3.7/dist-packages/beir/retrieval/train.py", line 148, in fit
callback=callback, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/sentence_transformers/SentenceTransformer.py", line 682, in fit
data = next(data_iterator)
File "/usr/local/lib/python3.7/dist-packages/sentence_transformers/datasets/NoDuplicatesDataLoader.py", line 41, in __iter__
yield self.collate_fn(batch) if self.collate_fn is not None else batch
File "/usr/local/lib/python3.7/dist-packages/sentence_transformers/SentenceTransformer.py", line 534, in smart_batching_collate
tokenized = self.tokenize(texts[idx])
File "/usr/local/lib/python3.7/dist-packages/sentence_transformers/SentenceTransformer.py", line 311, in tokenize
return self._first_module().tokenize(texts)
File "/usr/local/lib/python3.7/dist-packages/sentence_transformers/models/T5.py", line 55, in tokenize
return self.tokenizer.encode(self.task_identifier+text)
TypeError: can only concatenate str (not "list") to str
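The last frame points at the cause: `smart_batching_collate` passes a *list* of strings into `tokenize`, but the `T5` module prepends its task prefix with plain string concatenation, which only works for a single string. A minimal standalone reproduction and a possible fix (no library needed; the `task_identifier` value and the list-handling patch are assumptions inferred from the traceback, not the actual library code):

```python
task_identifier = "query: "  # assumed prefix; the real value lives in sentence_transformers/models/T5.py

def tokenize_broken(texts):
    # Mirrors T5.py line 55: str + list raises TypeError when a batch (list) is passed in
    return task_identifier + texts

def tokenize_fixed(texts):
    # Sketch of a fix: accept both a single string and the list that smart_batching_collate passes
    if isinstance(texts, str):
        texts = [texts]
    return [task_identifier + t for t in texts]

try:
    tokenize_broken(["a query", "another query"])
except TypeError as e:
    print(e)  # → can only concatenate str (not "list") to str

print(tokenize_fixed(["a query", "another query"]))  # → ['query: a query', 'query: another query']
```

So the crash is a list-vs-string mismatch between the collate function and the T5 module's `tokenize`, not a problem with the checkpoint itself.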
Have you found a solution for this? Are there any T5-based models currently implemented in BEIR?
Hi @NThakur20, I was wondering whether we can train a T5 model, as I get an error when loading a T5 model from HF.