PrithivirajDamodaran / Gramformer

A framework for detecting, highlighting and correcting grammatical errors on natural language text. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.
MIT License

Error loading the tokenizer in transformers==4.4.2 #4

Closed zhangyilun closed 3 years ago

zhangyilun commented 3 years ago

I'm getting an error when initializing the class object, specifically when loading the tokenizer:

In [6]: correction_tokenizer = AutoTokenizer.from_pretrained(correction_model_tag)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-6-d34dd9c5fe99> in <module>
----> 1 correction_tokenizer = AutoTokenizer.from_pretrained(correction_model_tag)

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    414             tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
    415             if tokenizer_class_fast and (use_fast or tokenizer_class_py is None):
--> 416                 return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
    417             else:
    418                 if tokenizer_class_py is not None:

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1703
   1704         return cls._from_pretrained(
-> 1705             resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs
   1706         )
   1707

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/tokenization_utils_base.py in _from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs)
   1774         # Instantiate tokenizer.
   1775         try:
-> 1776             tokenizer = cls(*init_inputs, **init_kwargs)
   1777         except OSError:
   1778             raise OSError(

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/models/t5/tokenization_t5_fast.py in __init__(self, vocab_file, tokenizer_file, eos_token, unk_token, pad_token, extra_ids, additional_special_tokens, **kwargs)
    134             extra_ids=extra_ids,
    135             additional_special_tokens=additional_special_tokens,
--> 136             **kwargs,
    137         )
    138

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/tokenization_utils_fast.py in __init__(self, *args, **kwargs)
     85         if fast_tokenizer_file is not None and not from_slow:
     86             # We have a serialization from tokenizers which let us directly build the backend
---> 87             fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
     88         elif slow_tokenizer is not None:
     89             # We need to convert a slow tokenizer to build the backend

Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 1 column 329667

transformers==4.4.2.

The installation package doesn't pin the transformers version this library expects. What is the correct version? Or is it version-independent and the problem lies elsewhere?
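As a side note, this kind of `untagged enum` deserialization error usually means the fast `tokenizer.json` on the model hub was serialized by a newer `tokenizers` library than the one installed locally, so a quick sanity check of the installed versions can help. A minimal sketch (the floor versions shown are illustrative assumptions, not documented requirements):

```python
from importlib.metadata import version  # Python 3.8+; use the importlib_metadata backport on 3.6/3.7

def version_tuple(v):
    """Parse '4.4.2' or '0.10.3' into a comparable tuple of ints.

    Non-numeric parts (e.g. the 'dev0' in '4.5.0dev0') are skipped."""
    return tuple(int(part) for part in v.split(".") if part.isdigit())

def check_min(package, minimum):
    """Return True if the installed package meets the minimum version."""
    return version_tuple(version(package)) >= version_tuple(minimum)

# Hypothetical floor versions for illustration only.
for pkg, floor in [("transformers", "4.4.2"), ("tokenizers", "0.10.1")]:
    try:
        print(f"{pkg}: installed {version(pkg)}, >= {floor}: {check_min(pkg, floor)}")
    except Exception as exc:
        print(f"{pkg}: not found ({exc})")
```

If `tokenizers` is older than what serialized the model's `tokenizer.json`, upgrading it (or reinstalling in a clean environment, as suggested below) typically resolves the enum error.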

oborchers commented 3 years ago

Having the same problem, +1. @PrithivirajDamodaran — version: 4.5.0dev0

PrithivirajDamodaran commented 3 years ago

We tried to reproduce this with a pre-installed transformers 4.4.2 on Python 3.7. It works just fine; check the screenshot below. Is there anything different in your environments? Let me know.

[screenshot: Gramformer loading successfully with transformers 4.4.2]

I recommend installing Gramformer in a brand-new Python virtual env or Conda env.
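For the fresh-environment route, a minimal sketch (the env name and Python version are illustrative; assumes conda is available):

```shell
# Create and activate a clean Conda environment (name is arbitrary).
conda create -y -n gramformer-test python=3.7
conda activate gramformer-test

# Install Gramformer from the repo; this pulls in its own dependencies,
# including transformers, into the clean environment.
pip install -U git+https://github.com/PrithivirajDamodaran/Gramformer.git
```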

Note: the Auto* APIs from Hugging Face are the way forward, hence I won't recommend model-specific APIs (sorry, I had to remove the one mentioned earlier as a workaround).