PrithivirajDamodaran / Gramformer

A framework for detecting, highlighting and correcting grammatical errors on natural language text. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.
MIT License

Error loading the tokenizer in transformers==4.4.2 #4

Closed zhangyilun closed 3 years ago

zhangyilun commented 3 years ago

I'm getting an error when initializing the class object, specifically when loading the tokenizer:

In [6]: correction_tokenizer = AutoTokenizer.from_pretrained(correction_model_tag)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-6-d34dd9c5fe99> in <module>
----> 1 correction_tokenizer = AutoTokenizer.from_pretrained(correction_model_tag)

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    414             tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
    415             if tokenizer_class_fast and (use_fast or tokenizer_class_py is None):
--> 416                 return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
    417             else:
    418                 if tokenizer_class_py is not None:

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1703
   1704         return cls._from_pretrained(
-> 1705             resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs
   1706         )
   1707

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/tokenization_utils_base.py in _from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs)
   1774         # Instantiate tokenizer.
   1775         try:
-> 1776             tokenizer = cls(*init_inputs, **init_kwargs)
   1777         except OSError:
   1778             raise OSError(

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/models/t5/tokenization_t5_fast.py in __init__(self, vocab_file, tokenizer_file, eos_token, unk_token, pad_token, extra_ids, additional_special_tokens, **kwargs)
    134             extra_ids=extra_ids,
    135             additional_special_tokens=additional_special_tokens,
--> 136             **kwargs,
    137         )
    138

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/tokenization_utils_fast.py in __init__(self, *args, **kwargs)
     85         if fast_tokenizer_file is not None and not from_slow:
     86             # We have a serialization from tokenizers which let us directly build the backend
---> 87             fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
     88         elif slow_tokenizer is not None:
     89             # We need to convert a slow tokenizer to build the backend

Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 1 column 329667

transformers==4.4.2.

The installation package doesn't pin the transformers version this library expects. What is the correct version? Or is it version-independent and the problem lies elsewhere?
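As a side note, this kind of `untagged enum` deserialization error usually means the fast `tokenizer.json` on the model hub was serialized by a newer `tokenizers` library than the one installed locally, so a quick sanity check of the installed versions can help. A minimal sketch (the floor versions shown are illustrative assumptions, not documented requirements):

```python
from importlib.metadata import version  # Python 3.8+; use the importlib_metadata backport on 3.6/3.7

def version_tuple(v):
    """Parse '4.4.2' or '0.10.3' into a comparable tuple of ints.

    Non-numeric parts (e.g. the 'dev0' in '4.5.0dev0') are skipped."""
    return tuple(int(part) for part in v.split(".") if part.isdigit())

def check_min(package, minimum):
    """Return True if the installed package meets the minimum version."""
    return version_tuple(version(package)) >= version_tuple(minimum)

# Hypothetical floor versions for illustration only.
for pkg, floor in [("transformers", "4.4.2"), ("tokenizers", "0.10.1")]:
    try:
        print(f"{pkg}: installed {version(pkg)}, >= {floor}: {check_min(pkg, floor)}")
    except Exception as exc:
        print(f"{pkg}: not found ({exc})")
```

If `tokenizers` is older than what serialized the model's `tokenizer.json`, upgrading it (or reinstalling in a clean environment, as suggested below) typically resolves the enum error.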

oborchers commented 3 years ago

Having the same problem, +1. @PrithivirajDamodaran — version: 4.5.0dev0

PrithivirajDamodaran commented 3 years ago

We tried to reproduce this with a pre-installed transformers 4.4.2 on Python 3.7. It works just fine; check the screenshot below. Is there anything different in your environments? Let me know.

[screenshot: Gramformer loading successfully with transformers 4.4.2]

I recommend installing Gramformer in a brand-new Python virtual env or Conda env.
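For the fresh-environment route, a minimal sketch (the env name and Python version are illustrative; assumes conda is available):

```shell
# Create and activate a clean Conda environment (name is arbitrary).
conda create -y -n gramformer-test python=3.7
conda activate gramformer-test

# Install Gramformer from the repo; this pulls in its own dependencies,
# including transformers, into the clean environment.
pip install -U git+https://github.com/PrithivirajDamodaran/Gramformer.git
```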

Note: the Auto* APIs from Hugging Face are the way forward, hence I won't recommend model-specific APIs (sorry, I had to remove the one mentioned earlier as a workaround).