Open · karynaur opened 8 months ago
Hey, I am unable to load the model from the Hugging Face checkpoint. Here is the code and the error:
The error I'm getting:
@cwszz can you help me with this?
@karynaur Our model cannot be loaded in this manner. Please refer to predict.py for the specific loading instructions. On Hugging Face, only the parameters (the .bin file) are saved; the tokenizer and config need to be loaded with from_pretrained('xlm-roberta-base').
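In code, that loading flow could look roughly like this. This is only a sketch: it assumes the .bin holds a plain state dict, and `XLMRobertaModel` plus the `strict=False` flag are illustrative assumptions, since the actual model class lives in predict.py:

```python
import torch
from transformers import XLMRobertaConfig, XLMRobertaModel, XLMRobertaTokenizer

# The Hugging Face checkpoint only ships the weights (pytorch_model.bin);
# tokenizer and config come from the base XLM-R release.
tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
config = XLMRobertaConfig.from_pretrained("xlm-roberta-base")

# Build the model from the config, then load the saved parameters into it.
model = XLMRobertaModel(config)
state_dict = torch.load("model/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict, strict=False)  # strict=False tolerates task-specific extra keys
model.eval()
```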
Downloading the model and running the predict.py file gives the same error, @cwszz:
```
Traceback (most recent call last):
  File "predict.py", line 115, in <module>
    model = torch.load("model/pytorch_model.bin")
  File "/home/adityas/.local/lib/python3.8/site-packages/torch/serialization.py", line 1026, in load
    return _load(opened_zipfile,
  File "/home/adityas/.local/lib/python3.8/site-packages/torch/serialization.py", line 1438, in _load
    result = unpickler.load()
  File "/home/adityas/.local/lib/python3.8/site-packages/transformers/models/xlm_roberta/tokenization_xlm_roberta.py", line 198, in __setstate__
    self.sp_model.LoadFromSerializedProto(self.sp_model_proto)
AttributeError: 'XLMRobertaTokenizer' object has no attribute 'sp_model_proto'
```
@karynaur It seems you're not using our predict.py, since it doesn't have a line 115. Also, please check whether your tokenizer is loaded directly with from_pretrained('xlm-roberta-base').
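A quick sanity check for that (the variable name is just for illustration):

```python
from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
# name_or_path records where the tokenizer was loaded from.
print(type(tokenizer).__name__, tokenizer.name_or_path)  # expect: XLMRobertaTokenizer xlm-roberta-base
```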
Thanks for pointing that out, @cwszz. I did make a few modifications to the file, but even when running predict.py in a fresh Colab environment I get the same error:
```
Loading tsv from /content/drive/MyDrive/Honours Project/code/XPR/data/sentences/en-ro-phrase-sentences.32.tsv ...
Loading tsv from /content/drive/MyDrive/Honours Project/code/XPR/data/sentences/ro-phrase-sentences.32.tsv ...
[!] collect 67 samples
没找到共0
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Traceback (most recent call last):
  File "/content/XPR/predict.py", line 97, in <module>
    model = torch.load(args.load_model_path)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1014, in load
    return _load(opened_zipfile,
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1422, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/xlm_roberta/tokenization_xlm_roberta.py", line 198, in __setstate__
    self.sp_model.LoadFromSerializedProto(self.sp_model_proto)
AttributeError: 'XLMRobertaTokenizer' object has no attribute 'sp_model_proto'
```
@karynaur It seems the tokenizer is being reconstructed during the load. My current speculation is that this is a transformers version mismatch. Could you please check why the tokenizer's __setstate__ (its unpickling initializer) is being invoked during torch.load?
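For context on why __setstate__ would run at all during torch.load: if the checkpoint was written with torch.save(model) (the whole object) rather than torch.save(model.state_dict()), unpickling has to rebuild every attribute of that object, including any tokenizer stored on it, and that reconstruction calls the tokenizer's __setstate__ under whatever transformers version is installed at load time. A minimal illustration (the Wrapper class is hypothetical, not XPR's actual model class):

```python
import torch
from transformers import XLMRobertaModel, XLMRobertaTokenizer

class Wrapper(torch.nn.Module):  # hypothetical stand-in, not XPR's real model class
    def __init__(self):
        super().__init__()
        self.encoder = XLMRobertaModel.from_pretrained("xlm-roberta-base")
        self.tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")

model = Wrapper()

# Whole-object save: the tokenizer gets pickled too, so torch.load must
# rebuild it via XLMRobertaTokenizer.__setstate__, which is sensitive to
# the transformers version doing the loading.
torch.save(model, "whole_model.bin")

# State-dict save: only tensors are written, no tokenizer pickling involved.
torch.save(model.state_dict(), "state_dict.bin")
```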
Can you let me know which transformers and torch versions were used when the model was trained? I'll try downgrading the transformers version and rechecking. @cwszz
@karynaur You can try version 4.17.0 first. That said, the real problem lies in how the tokenizer is loaded; if you can track that down, changing the version may not be necessary.
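If you do try the version route, a quick way to pin and confirm the environment before re-running predict.py (just a sanity check; it may not resolve the pickle mismatch by itself):

```python
# pip install transformers==4.17.0
import torch
import transformers

print("transformers:", transformers.__version__)  # expecting 4.17.0 as suggested above
print("torch:", torch.__version__)
```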
Gotcha. I'll update here once I find a fix, and if it's helpful I'll send a PR.