alexandrainst / danlp

DaNLP is a repository for Natural Language Processing resources for the Danish Language.
BSD 3-Clause "New" or "Revised" License

Value error on predict with xlmr_ned_model #155

Closed fnielsen closed 2 years ago

fnielsen commented 2 years ago

Describe the bug

xlmr.predict raises a ValueError.

To Reproduce

>>> from danlp.models import load_xlmr_ned_model
>>> xlmr = load_xlmr_ned_model()
>>> sentence = "Karen Blixen vendte tilbage til Danmark, hvor hun boede resten af sit liv på Rungstedlund, som hun arvede efter sin mor i 1939"
>>> kg_context = "udmærkelser modtaget Kritikerprisen udmærkelser modtaget Tagea Brandts Rejselegat udmærkelser modtaget Ingenio ..."
>>> label = xlmr.predict(sentence, kg_context)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<dir>/lib/python3.8/site-packages/danlp/models/xlmr_models.py", line 126, in predict
    pred = self._get_pred(sentence, kg_context)
  File "<dir>/lib/python3.8/site-packages/danlp/models/xlmr_models.py", line 108, in _get_pred
    input1 = self.tokenizer.encode_plus(sentence, kg_context, add_special_tokens=True, return_tensors='pt',
  File "<dir>/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2494, in encode_plus
    return self._encode_plus(
  File "<dir>/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 635, in _encode_plus
    return self.prepare_for_model(
  File "<dir>/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2902, in prepare_for_model
    raise ValueError(
ValueError: Not possible to return overflowing tokens for pair of sequences with the `longest_first`. Please select another truncation strategy than `longest_first`, for instance `only_second` or `only_first`.
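
The message suggests that the tokenizer was asked to return overflowing tokens while truncating a sentence pair with the `longest_first` strategy. The following is a minimal pure-Python sketch of why that combination is awkward. It is not the transformers implementation: `truncate_pair` is a hypothetical helper, tokens are whitespace-split words, and special tokens are ignored. With `only_second` (or `only_first`) the removed tokens form one contiguous tail of a single sequence, so they can be returned as overflow. With `longest_first` tokens are dropped one at a time from whichever sequence is currently longer, so the removed tokens may come from both sequences.

```python
from typing import List, Tuple


def truncate_pair(
    first: List[str],
    second: List[str],
    max_len: int,
    strategy: str = "only_second",
) -> Tuple[List[str], List[str], List[str]]:
    """Toy pair truncation (hypothetical helper, not the transformers code).

    Returns (first, second, overflow), where overflow holds the removed
    tokens when they form a single contiguous tail, else an empty list.
    """
    if max_len < 0:
        raise ValueError("max_len must be non-negative")
    first, second = list(first), list(second)
    excess = len(first) + len(second) - max_len
    if excess <= 0:
        return first, second, []  # nothing to truncate

    if strategy == "only_second":
        if excess > len(second):
            raise ValueError("second sequence cannot absorb the truncation")
        keep = len(second) - excess
        return first, second[:keep], second[keep:]

    if strategy == "only_first":
        if excess > len(first):
            raise ValueError("first sequence cannot absorb the truncation")
        keep = len(first) - excess
        return first[:keep], second, first[keep:]

    if strategy == "longest_first":
        # Drop one token at a time from the currently longer sequence
        # (ties drop from the second). The removed tokens can be spread
        # over both sequences, so no single overflow window is returned.
        for _ in range(excess):
            if len(first) > len(second):
                first.pop()
            else:
                second.pop()
        return first, second, []

    raise ValueError(f"unknown strategy: {strategy}")


sentence = "Karen Blixen vendte tilbage til Danmark".split()
kg_context = (
    "udmærkelser modtaget Kritikerprisen udmærkelser modtaget "
    "Tagea Brandts Rejselegat"
).split()

kept_first, kept_second, overflow = truncate_pair(
    sentence, kg_context, max_len=10, strategy="only_second"
)
print(kept_first, kept_second, overflow)
```

In transformers the strategy is selected with the `truncation` argument of `encode_plus`, for example `truncation='only_second'`, as the error message itself proposes. Whether that is the appropriate change inside danlp's `_get_pred` is an assumption, since the full call is not visible in the traceback.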

Your Environment