flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

[Bug]: transformers 4.40.0 assumes infinite sequence length on many models and breaks #3450

Closed: helpmefindaname closed this issue 4 months ago

helpmefindaname commented 5 months ago

Describe the bug

This is due to a regression on the transformers side; see https://github.com/huggingface/transformers/issues/30643 for details.

Flair uses tokenizer.model_max_length in TransformerEmbeddings to truncate (if allow_long_sentences=False) or split (if allow_long_sentences=True) long sentences. Since transformers 4.40.0 reports an effectively infinite model_max_length for many models, Flair neither truncates nor splits, and inputs longer than the model's position embeddings (512 for distilbert-base-cased) crash inside the model.
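
A quick way to observe the regression is to inspect the tokenizer directly (a minimal check; the exact sentinel value reported on affected models may vary):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("distilbert-base-cased")
# transformers < 4.40.0 reports 512 here; under 4.40.0 affected models report a
# huge sentinel value, so Flair never truncates or splits long inputs.
print(tok.model_max_length)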

To Reproduce

from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

emb = TransformerWordEmbeddings("distilbert-base-cased", allow_long_sentences=True)
# 1024 repetitions tokenize to far more than distilbert's 512 positions, so the
# sentence should be split into overlapping windows rather than crash
emb.embed(Sentence("Hallo World " * 1024))

Expected behavior

The code should run without errors: with allow_long_sentences=True, the long sentence should be split into chunks that fit the model's maximum sequence length.

Logs and Stack traces

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\flair\embeddings\base.py", line 50, in embed
    self._add_embeddings_internal(data_points)
  File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\flair\embeddings\transformer.py", line 705, in _add_embeddings_internal
    embeddings = self._forward_tensors(tensors)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\flair\embeddings\transformer.py", line 1424, in _forward_tensors
    return self.forward(**tensors)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\flair\embeddings\transformer.py", line 1324, in forward
    hidden_states = self.model(input_ids, **model_kwargs)[-1]
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\transformers\models\distilbert\modeling_distilbert.py", line 806, in forward
    embeddings = self.embeddings(input_ids, inputs_embeds)  # (bs, seq_length, dim)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\transformers\models\distilbert\modeling_distilbert.py", line 144, in forward
    embeddings = input_embeds + position_embeddings  # (bs, max_seq_length, dim)
                 ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (3074) must match the size of tensor b (512) at non-singleton dimension 1


Additional Context

This bug is on the transformers side (https://github.com/huggingface/transformers/issues/30643); this issue exists only for visibility.

If you run into this problem, you can hotfix it in two ways:

1. Downgrade transformers to a version below 4.40.0.
2. Manually set the correct maximum length on the embedding's tokenizer, as sketched below.
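
For the second option, something along these lines should work (a minimal sketch, assuming the Flair 0.13 API where the embedding exposes its tokenizer as .tokenizer; 512 is distilbert-base-cased's position-embedding size, adjust for other models):

from flair.embeddings import TransformerWordEmbeddings

emb = TransformerWordEmbeddings("distilbert-base-cased", allow_long_sentences=True)
# Overwrite the bogus "infinite" value reported by transformers 4.40.0 with the
# model's real maximum sequence length so Flair splits long sentences again.
emb.tokenizer.model_max_length = 512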

Environment

Versions:

Flair: 0.13.1
PyTorch: 2.3.0+cpu
Transformers: 4.40.0
GPU: False

helpmefindaname commented 5 months ago

Combined with https://github.com/flairNLP/flair/issues/3441, we currently recommend installing Flair via:

pip install flair "transformers<4.40.0" "scipy<1.13.0"

until the respective issues are resolved.

helpmefindaname commented 4 months ago

This has been fixed on the transformers side.