UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

TypeError: __call__() got multiple values for argument 'padding' #1044

Open ajinkya2903 opened 3 years ago

ajinkya2903 commented 3 years ago

@nreimers I am getting this error while training a cross-encoder. Training completes, but the error appears as soon as evaluation starts. What is the solution?

ddofer commented 2 years ago

+1. I get the same error when calling the evaluator on a dataset where each example is a single sentence.

cemodel = CrossEncoder(my_pretrained_sentenceTransformerModel_path, num_labels=1, device="cuda")

evaluator = CESoftmaxAccuracyEvaluator(sentence_pairs=[x[0] for x in X_test], labels=y_test, write_csv=True)

cemodel.fit(train_dataloader=train_dataloader,
            evaluator=evaluator,
            epochs=num_epochs,
            warmup_steps=warmup_steps,
            output_path=model_save_path,
            show_progress_bar=True,
            use_amp=True)

TypeError                                 Traceback (most recent call last)
<ipython-input-42-993535beba66> in <module>()
----> 1 evaluator(cemodel)

5 frames
/usr/local/lib/python3.7/dist-packages/sentence_transformers/cross_encoder/CrossEncoder.py in smart_batching_collate_text_only(self, batch)
     93                 texts[idx].append(text.strip())
     94 
---> 95         tokenized = self.tokenizer(*texts, padding=True, truncation='longest_first', return_tensors="pt", max_length=self.max_length)
     96 
     97         for name in tokenized:

TypeError: __call__() got multiple values for argument 'padding'

ddofer commented 2 years ago

EDIT: I can confirm that the same issue happens when loading a default SentenceTransformer model:

cemodel = CrossEncoder("sentence-transformers/all-MiniLM-L6-v2", num_labels=1, device="cuda")

nreimers commented 2 years ago

sentence_pairs must be a list of lists of sentences, even when each example contains only one sentence. The following should fix it:

evaluator = CESoftmaxAccuracyEvaluator(sentence_pairs=[[x[0]] for x in X_test], labels=y_test, write_csv=True)
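
For readers wondering why a plain list of strings produces this particular TypeError, here is a minimal sketch that reproduces it without loading a model. It rests on assumptions: `fake_tokenizer` is a hypothetical stand-in whose parameter order (with `padding` accepted positionally after `text` and `text_pair`) is only assumed to resemble the real tokenizer's `__call__`, and the `texts` initialization is a guess, since the traceback shows only lines 93 and 95 of the collate function. The idea is that enumerating a bare string yields its characters, so `*texts` expands into many positional arguments, one of which lands in the `padding` slot before the `padding=True` keyword is applied.

```python
def fake_tokenizer(text, text_pair=None, add_special_tokens=True, padding=False, **kwargs):
    """Hypothetical stand-in for a tokenizer call; only the parameter order matters here."""
    return {"text": text, "text_pair": text_pair, "padding": padding}


def collate_text_only(batch):
    # Assumed reconstruction: one list per position inside an example.
    texts = [[] for _ in range(len(batch[0]))]
    for example in batch:
        # If `example` is a plain string, this loop walks over its characters.
        for idx, text in enumerate(example):
            texts[idx].append(text.strip())
    # Mirrors the call on line 95 of the traceback, minus the other keyword arguments.
    return fake_tokenizer(*texts, padding=True)


# Each example is a bare string: the characters become separate positional arguments.
try:
    collate_text_only(["A cat sat.", "Dogs run."])
except TypeError as err:
    print("Flat list of strings fails:", err)

# Each example is a one-element list: a single positional argument, as intended.
result = collate_text_only([["A cat sat."], ["Dogs run."]])
print("List of lists works:", result["text"])
```

Wrapping each sentence in its own list, as suggested above, means the collate step sees one "column" of texts per example rather than one column per character.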