facebookresearch / XLM

PyTorch original implementation of Cross-lingual Language Model Pretraining.

XLM-R Sentence Piece Vocabulary #253

Closed: ruitedk6 closed this issue 4 years ago

ruitedk6 commented 4 years ago

Thank you for providing us with the new XLM-R models! Would it be possible to also get access to the corresponding SentencePiece vocabulary, so that we can map the indices output by xlmr.encode() to the actual (readable) subword tokens?

aconneau commented 4 years ago

This one for the counts: https://dl.fbaipublicfiles.com/aconneau/xlm/xlmr.vocab

And this one for the IDs: https://dl.fbaipublicfiles.com/aconneau/xlm/xlmr.vocab.spm

ruitedk6 commented 4 years ago

Thank you for sharing the files with us. However, the posted files do not seem to be accessible at this point; I get the following error for both:

HTTP request sent, awaiting response... 403 Forbidden

bratao commented 4 years ago

@aconneau please can you put the files online again?

WDZEthan commented 4 years ago

@aconneau please can you put the files online again?

ZJaume commented 2 years ago

https://dl.fbaipublicfiles.com/xlm/xlmr.vocab.spm works
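
For anyone landing here with the original question (mapping indices back to readable subword tokens), here is a minimal sketch of the lookup once a vocabulary file has been downloaded. It is not taken from the XLM code base. It assumes the file lists one token per line, optionally followed by a count or score, and that a token's line position gives its index. The names `load_vocab`, `ids_to_tokens` and the `special_tokens` prefix are hypothetical; whether the model reserves extra indices ahead of the file's entries is not confirmed in this thread, so check a few known tokens before relying on the result.

```python
import re
from typing import Iterable, List, Sequence


def load_vocab(path: str, special_tokens: Iterable[str] = ()) -> List[str]:
    """Build a list in which position i holds the token for index i.

    Assumption: one entry per line, token first, with an optional count
    or score after a tab or a space.
    """
    # Hypothetical reserved prefix; pass nothing if no offset applies.
    id_to_token = list(special_tokens)
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # Keep only the first field; anything after it is ignored.
            token = re.split(r"[ \t]", line, maxsplit=1)[0]
            id_to_token.append(token)
    return id_to_token


def ids_to_tokens(ids: Sequence[int], id_to_token: Sequence[str],
                  unknown: str = "<unk>") -> List[str]:
    """Look up each index, falling back to `unknown` when out of range."""
    return [id_to_token[i] if 0 <= i < len(id_to_token) else unknown
            for i in ids]
```

If the indices come from xlmr.encode() as a tensor, converting them with `.tolist()` first should give the plain integers that `ids_to_tokens` expects.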