UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 1 column 317718 #1069

Open dongteng opened 3 years ago

dongteng commented 3 years ago

Need help, thanks! I've downloaded the pretrained model because I want to use it offline. Loading it with model = SentenceTransformer('/dataset/BERT/paraphrase-multilingual-mpnet-base-v2') then fails with the error in the title.

nreimers commented 3 years ago

Please update the transformers and tokenizers Python packages.

dongteng commented 3 years ago

I've solved it by updating tokenizers to the latest version.

dongteng commented 3 years ago

Please update the transformers and tokenizers Python packages.

Thanks!!

buhrmann commented 3 years ago

Hi, just to add that this seems to be a missing dependency in sentence-transformers. Perhaps you want to add tokenizers>=0.10.3 explicitly if that is the minimal version you need for (all) your models to work. You only have the transformers dependency at the moment, but transformers itself is less strict (tokenizers>=0.10.1,<0.11).
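For anyone who wants to check their environment against the version mentioned above, here is a minimal sketch. It assumes 0.10.3 is the minimum needed, as suggested in this comment; the helper names (parse_release, meets_minimum, check_tokenizers) are made up for illustration and are not part of sentence-transformers or tokenizers.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: 0.10.3 is the minimum tokenizers version, as suggested in this thread.
MIN_TOKENIZERS = "0.10.3"


def parse_release(v):
    """Return the leading numeric part of a version string as a tuple of ints.

    Pre-release suffixes such as "rc1" are ignored, so "0.10.3rc1" is
    treated like "0.10.3". That is good enough for a rough check.
    """
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"cannot parse version: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum=MIN_TOKENIZERS):
    """Compare versions numerically, so that 0.10.10 counts as newer than 0.10.3."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_tokenizers():
    """Describe whether the installed tokenizers package looks new enough."""
    try:
        installed = version("tokenizers")
    except PackageNotFoundError:
        return "tokenizers is not installed"
    if meets_minimum(installed):
        return f"tokenizers {installed} is new enough"
    return (f"tokenizers {installed} is older than {MIN_TOKENIZERS}; "
            "try: pip install -U tokenizers")


if __name__ == "__main__":
    print(check_tokenizers())
```

The comparison is done on integer tuples rather than strings because a plain string comparison would rank "0.10.10" below "0.10.3".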

nreimers commented 3 years ago

@buhrmann Thanks for pointing this out, updated the requirements

gonzoramos commented 3 years ago

Can someone advise me how to update tokenizers to the latest version? I am using miniforge on an M1 Mac and I cannot make progress past this error. I have tried pip install tokenizers=0.10.3 after doing conda install sentence-transformers.

Many thanks in advance.

nreimers commented 3 years ago

try pip install -U tokenizers

It is also preferable to install sentence-transformers via pip, not with conda: the conda maintainers are sadly unreliable about updating the versions. Pull requests are often ignored for quite a while.
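A side note on the command quoted earlier: pip pins a version with a double equals sign (tokenizers==0.10.3), while the single-equals form is conda syntax and pip rejects it as an invalid requirement. In a mixed conda/pip setup it can also help to run pip through the interpreter that is actually in use, so the upgrade lands in the active environment. A rough sketch follows; the function names pip_install_command and run_pip_install are hypothetical helpers, not part of any library.

```python
import subprocess
import sys


def pip_install_command(package, version=None):
    """Build a pip command bound to the currently running Python interpreter.

    With no version given, the command upgrades to the newest release (-U).
    With a version, it pins that exact release using pip's '==' syntax.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if version is None:
        cmd.extend(["-U", package])
    else:
        cmd.append(f"{package}=={version}")
    return cmd


def run_pip_install(package, version=None):
    """Run the command; raises CalledProcessError if pip exits with an error."""
    subprocess.check_call(pip_install_command(package, version))


# Example (not executed here):
#   run_pip_install("tokenizers")            # same effect as: pip install -U tokenizers
#   run_pip_install("tokenizers", "0.10.3")  # same effect as: pip install tokenizers==0.10.3
```

Using sys.executable avoids the case where the pip on PATH belongs to a different Python than the one running the code.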

gonzoramos commented 3 years ago

Thanks. Unfortunately, that does not work. If I try to install sentence-transformers via the pip route, it fails hard because of the M1 silicon: scikit-learn will just not build. I have decided not to do ML on my Mac for now and to use my other Windows/Intel hardware instead.