JohnGiorgi / DeCLUTR

The corresponding code from our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations". Do not hesitate to open an issue if you run into any trouble!
https://aclanthology.org/2021.acl-long.72/
Apache License 2.0

Strange issue occurring during Training #238

Closed kingafy closed 2 years ago

kingafy commented 3 years ago

Team, I have been using DeCLUTR for the last few months and have trained various models. A strange issue has surfaced. Currently, when I convert the model to Hugging Face format using the save_pretrained.py code, 7 files are created, whereas earlier there were 6. The extra file is tokenizer.json. When I consume this model, I am not able to load it with AutoTokenizer, which gives me strange issues. Any idea why the additional file is being generated? Is it due to an environmental glitch or a training data issue?

JohnGiorgi commented 3 years ago

Hi @kingafy,

It would be helpful if you could copy/paste the errors you get when trying to call AutoTokenizer.from_pretrained.
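For anyone following along, here is a minimal sketch of one way to capture the full traceback so it can be pasted into the issue. The helper name `capture_error` and the path `path/to/exported_model` are hypothetical and not part of DeCLUTR or transformers; the only library call assumed is `AutoTokenizer.from_pretrained`, which is named above.

```python
import traceback


def capture_error(fn, *args, **kwargs):
    """Call fn(*args, **kwargs); return the formatted traceback, or None on success.

    `capture_error` is a hypothetical helper written for this thread.
    """
    try:
        fn(*args, **kwargs)
    except Exception:
        # format_exc() returns the same text Python would print to stderr.
        return traceback.format_exc()
    return None


def report_tokenizer_load(model_dir):
    """Try to load the tokenizer saved in model_dir and return a pasteable report.

    Assumes the transformers library is installed; it is imported lazily so the
    rest of this snippet works without it.
    """
    from transformers import AutoTokenizer

    error = capture_error(AutoTokenizer.from_pretrained, model_dir)
    return error or "Tokenizer loaded without errors."
```

Calling `print(report_tokenizer_load("path/to/exported_model"))` with the real export directory substituted in should print either the complete traceback or a success message.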

Ultimately, save_pretrained_hf.py is just calling the save_pretrained functions from the transformers library, so that would be a good place to look if it is behaving unexpectedly.
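Since the file count changed between exports, it may help to compare an older export directory with a newer one and to note which transformers version is installed. The sketch below uses only the standard library; the function names are hypothetical. One possible explanation, which is an assumption and not confirmed in this thread, is that a different transformers version or a fast tokenizer is writing the extra tokenizer.json.

```python
from importlib import metadata
from pathlib import Path


def list_export_files(model_dir):
    """Return the sorted names of the files directly inside model_dir."""
    return sorted(p.name for p in Path(model_dir).iterdir() if p.is_file())


def diff_exports(old_dir, new_dir):
    """Return (files only in new_dir, files only in old_dir) as sorted lists."""
    old = set(list_export_files(old_dir))
    new = set(list_export_files(new_dir))
    return sorted(new - old), sorted(old - new)


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

For example, `diff_exports("old_export", "new_export")` (both paths are placeholders) returns the file names that appear in only one of the two directories, and `installed_version("transformers")` reports the library version to include alongside the error message.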

JohnGiorgi commented 2 years ago

Closing this. Please feel free to re-open if you are still having trouble.