Closed Chloe-YibaiLiu closed 2 months ago
Thanks for your interest!
We reproduced the whole process and did not encounter this issue. Hopefully you can provide more specific error logs. You may also want to make sure the device is connected to the internet.
@Chloe-YibaiLiu I encountered the same error. It's likely that you are not connected to the internet or that HuggingFace is blocked in your region. If the connection cannot be easily established, the workaround is to manually download the tokenizer and `config.json` of XLMRoberta. Note that you do not need to download the large model weights, as CaR seems to need only the tokenizer and config files.
Do the following steps in an environment that can connect to HuggingFace.
1. Download the tokenizer of `xlm-roberta-large`, using the same version of `transformers` as CaR:

```python
from transformers import XLMRobertaTokenizerFast

save_directory = "xlm-roberta-large"
tokenizer = XLMRobertaTokenizerFast.from_pretrained("xlm-roberta-large")
tokenizer.save_pretrained(save_directory)
```
2. Download `config.json`:
```shell
cd xlm-roberta-large
# Use /resolve/ (not /blob/) so wget fetches the raw file instead of the HTML page
wget https://huggingface.co/FacebookAI/xlm-roberta-large/resolve/main/config.json
cd ..
```
3. Lastly, move the entire `xlm-roberta-large` folder to the root of your CaR project.
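After moving the folder, it can help to sanity-check that it actually contains the files from steps 1 and 2 before running CaR offline. The sketch below is an assumption, not part of CaR: the expected file list is based on what `tokenizer.save_pretrained()` typically writes for a fast XLMRoberta tokenizer plus the downloaded `config.json`, and the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# Assumed contents of a complete local xlm-roberta-large folder
# (save_pretrained output plus the manually downloaded config.json).
EXPECTED_FILES = [
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
]

def missing_files(model_dir):
    """Return the names of expected files absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).exists()]
```

If `missing_files("xlm-roberta-large")` comes back empty, loading with `XLMRobertaTokenizerFast.from_pretrained("xlm-roberta-large", local_files_only=True)` should no longer try to reach HuggingFace.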
If you encounter an error when using the above script to download the tokenizer, it's probably because CaR's pinned `transformers` version is quite old; upgrading CaR's environment to `transformers==4.44.1` should make it compatible.
I've downloaded checkpoints from HuggingFace and put them under the correct folders, but errors showed that the tokenizer for "xlm-roberta-large" couldn't be loaded. Are any other essential model files needed for this command?