Closed: loretoparisi closed this issue 5 years ago
Training a cross-lingual language model is beyond the scope of this repo, whose only purpose is to provide code to learn BPE efficiently, whether monolingual or cross-lingual, for language modeling, MT, sentence classification, etc. Have a look at https://github.com/facebookresearch/XLM, which provides code and commands to train cross-lingual language models using fastBPE.
@glample thank you, that's a starting point.
It would be worth providing a tutorial on training a cross-lingual model (classification, etc.) using fastText with BPE preprocessing. It's not clear to me how this would work in practice for a given input training set and a BPE model, e.g. the `93langs.fcodes` and `93langs.fvocab` files (the ones provided with Facebook's LASER bi-LSTM model). In my case I would like to use BPE in combination with a simpler fastText supervised classifier.
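To make the question concrete, here is a rough sketch of how I imagine the preprocessing step: segment each sentence with BPE, then write it in fastText's supervised format (`__label__<label> <text>`). The helper names (`toy_bpe`, `to_fasttext_lines`) are hypothetical, and `toy_bpe` is only a stand-in so the snippet runs without any dependency; it is not real BPE. The commented lines show where the fastBPE and fastText Python bindings would plug in, assuming both are installed.

```python
from typing import Callable, List, Tuple

# Any function mapping a list of sentences to a list of segmented sentences.
BpeFn = Callable[[List[str]], List[str]]


def toy_bpe(sentences: List[str]) -> List[str]:
    """Stand-in segmenter (NOT real BPE): cut each word into chunks of at
    most 3 characters and join them with the '@@ ' continuation marker."""
    segmented = []
    for sentence in sentences:
        pieces = []
        for word in sentence.split():
            chunks = [word[i:i + 3] for i in range(0, len(word), 3)]
            pieces.append("@@ ".join(chunks))
        segmented.append(" ".join(pieces))
    return segmented


def to_fasttext_lines(examples: List[Tuple[str, str]], apply_bpe: BpeFn) -> List[str]:
    """Turn (label, text) pairs into fastText supervised training lines,
    with the text segmented by the given BPE function."""
    labels = [label for label, _ in examples]
    texts = [text for _, text in examples]
    segmented = apply_bpe(texts)
    return [f"__label__{label} {seg}" for label, seg in zip(labels, segmented)]


if __name__ == "__main__":
    examples = [("pos", "great movie"), ("neg", "boring plot")]
    for line in to_fasttext_lines(examples, toy_bpe):
        print(line)

    # With the real libraries (assumption: fastBPE and fasttext are installed):
    #   import fastBPE, fasttext
    #   bpe = fastBPE.fastBPE("93langs.fcodes", "93langs.fvocab")
    #   lines = to_fasttext_lines(examples, bpe.apply)
    #   ... write `lines` to train.bpe.txt, one per line ...
    #   model = fasttext.train_supervised(input="train.bpe.txt")
```

At prediction time the input would presumably have to go through the same BPE segmentation before being passed to `model.predict`, otherwise the tokens would not match what the classifier saw during training.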