UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

Errors when trying to replicate Natural Language Inference (NLI) task #2737

Open l4b4r4b4b4 opened 3 weeks ago

l4b4r4b4b4 commented 3 weeks ago

Hi, I am trying to replicate the Natural Language Inference (NLI) task example to train a cross-encoder on a dataset with 15 labels.

When using distilroberta-base, I get the following error at the start of training:

model.fit(
  File "/opt/conda/lib/python3.10/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 278, in fit
    model_predictions = self.model(**features, return_dict=True)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/roberta/modeling_roberta.py", line 1195, in forward
    outputs = self.roberta(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/roberta/modeling_roberta.py", line 798, in forward
    buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
RuntimeError: The expanded size of the tensor (576) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [2, 576].  Tensor sizes: [1, 514]
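
My guess is that the tokenized batch (576 tokens) exceeds RoBERTa's 514 position embeddings, so capping the sequence length might avoid this. A minimal sketch, assuming the CrossEncoder constructor still accepts num_labels and max_length arguments:

from sentence_transformers import CrossEncoder

# Cap tokenized inputs at 512 tokens so they never exceed RoBERTa's
# 514 position embeddings (the assumed cause of the size mismatch above).
model = CrossEncoder("distilroberta-base", num_labels=15, max_length=512)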

When using models from the bge-reranker-v2 family, training starts successfully, but I get the following error when evaluation finishes:

Epoch:   0% 0/5 [00:00<?, ?it/s]
CESoftmaxAccuracyEvaluator: Evaluating the model on triage-v0.1-training dataset in epoch 0 after 10 steps:
Accuracy: 7.99
CEF1Evaluator: Evaluating the model on triage-v0.1-training dataset in epoch 0 after 10 steps:

- Macro F1 score      : 0.99
- Micro F1 score      : 7.99
- Weighted F1 score   : 14.80
Iteration:   0% 9/36133 [00:55<62:22:11,  6.22s/it]
Epoch:   0% 0/5 [00:55<?, ?it/s]

Traceback (most recent call last):
...
    training_stats = model.fit(
  File "/opt/conda/lib/python3.10/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 295, in fit
    self._eval_during_training(
  File "/opt/conda/lib/python3.10/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 495, in _eval_during_training
    if score > self.best_score:
TypeError: '>' not supported between instances of 'dict' and 'int'

This one seems to happen because the evaluator in the example returns three different scores, so score ends up as a dictionary with three fields rather than a single number. Looking into the code, the comparison needs primary_metric to be set on the evaluator, which the code looks like it should do, but somehow it does not...

As a hot fix, I will use a custom CrossEncoder (cross_encoder.py) that compares against the sequential score defined in the NLI example, roughly as follows:


...
if score.sequential_score > self.best_score:
    self.best_score = score.sequential_score
...
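
For reference, a fuller sketch of that hot fix. The _eval_during_training signature is copied from the v3 CrossEncoder and the "sequential_score" key is an assumption, so adjust both to whatever your version and evaluator actually use:

from sentence_transformers.cross_encoder import CrossEncoder

class PatchedCrossEncoder(CrossEncoder):
    def _eval_during_training(self, evaluator, output_path, save_best_model, epoch, steps, callback):
        # Same flow as the original method, but unwrap dict scores before
        # comparing against self.best_score.
        if evaluator is None:
            return
        score = evaluator(self, output_path=output_path, epoch=epoch, steps=steps)
        if isinstance(score, dict):
            # "sequential_score" is an assumed key; fall back to the first value.
            score = score.get("sequential_score", next(iter(score.values())))
        if callback is not None:
            callback(score, epoch, steps)
        if score > self.best_score:
            self.best_score = score
            if save_best_model:
                self.save(output_path)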
tomaarsen commented 2 weeks ago

Hello!

Apologies for the delay, I was on vacation last week. This is indeed a new bug introduced in the v3 release when training CrossEncoder models with the SequentialEvaluator: that class now returns a dictionary of results, whereas the CrossEncoder fit method still expects a single score. I might not fix this immediately, as I will (as @ir2718 pointed out) refactor the CrossEncoder training soon, after which all of the CE evaluators will also return dictionaries.
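
In the meantime, a possible stop-gap is a thin wrapper that turns the dict back into a single float before it reaches fit. This is just a sketch, not library code, and the metric key you extract is whichever entry of the returned dictionary you want to track:

class ScalarScoreEvaluator:
    """Wrap a dict-returning evaluator so CrossEncoder.fit sees one float."""

    def __init__(self, evaluator, key):
        self.evaluator = evaluator
        self.key = key  # name of the metric to track, e.g. an accuracy entry

    def __call__(self, model, output_path=None, epoch=-1, steps=-1):
        scores = self.evaluator(model, output_path=output_path, epoch=epoch, steps=steps)
        # Pass floats through unchanged; pull the chosen metric out of dicts.
        return scores[self.key] if isinstance(scores, dict) else scores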