google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

Is multilingual model cross-lingual? #633

Closed · wanicca closed this issue 5 years ago

wanicca commented 5 years ago

Hi, I wonder whether BERT's multilingual representations behave like multilingual embeddings obtained by aligning monolingual embeddings (such as fastText's multilingual vectors). That is, do synonyms in a parallel sentence pair in different languages get similar vector representations? Is the pretrained multilingual BERT model cross-lingual, or is it just a multilingual model that can accept tokens from different languages as input?

I read the Multilingual README and didn't find any mention of a cross-lingual objective. Does the pretraining just use corpora in different languages, which would mean the final-layer representations are not guaranteed to be cross-lingual?
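For reference, here is a minimal sketch (not from this repo) of one way to probe this empirically. It assumes the Hugging Face `transformers` library and the public `bert-base-multilingual-cased` checkpoint: mean-pool the final-layer states of a parallel sentence pair and compare them with cosine similarity.

```python
# Sketch only: probe whether mBERT embeds parallel sentences nearby.
# Assumes `transformers` and `torch` are installed; the checkpoint name
# is the public multilingual BERT release, not part of this issue.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def sentence_embedding(text):
    # Mean-pool the final-layer token states into a single vector.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

en = sentence_embedding("The cat sleeps on the sofa.")
de = sentence_embedding("Die Katze schläft auf dem Sofa.")
cos = torch.nn.functional.cosine_similarity(en, de, dim=0)
print(f"cosine similarity (en/de parallel pair): {cos.item():.3f}")
```

A noticeably higher similarity for parallel pairs than for unrelated pairs would suggest some degree of cross-lingual alignment, though raw cosine values from BERT are known to be hard to interpret in isolation, so a baseline of non-parallel pairs is needed.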

shoegazerstella commented 5 years ago

Hi @wanicca, I can't answer your question, but if you'd like to make a comparison I'd suggest you also have a look at https://github.com/facebookresearch/XLM for a cross-lingual version of BERT.
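If it helps with the comparison, the same probe can be pointed at an XLM checkpoint. This is a hypothetical sketch assuming the `transformers` port of facebookresearch/XLM and its `xlm-mlm-tlm-xnli15-1024` checkpoint (trained with the translation language modeling objective); XLM additionally takes per-token language ids.

```python
# Sketch only: the same parallel-pair probe against an XLM checkpoint.
import torch
from transformers import XLMTokenizer, XLMModel

tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-tlm-xnli15-1024")
model = XLMModel.from_pretrained("xlm-mlm-tlm-xnli15-1024")
model.eval()

def embed(text, lang):
    inputs = tokenizer(text, return_tensors="pt")
    # XLM uses language embeddings: pass the language id for every token.
    inputs["langs"] = torch.full_like(inputs["input_ids"],
                                      tokenizer.lang2id[lang])
    with torch.no_grad():
        return model(**inputs).last_hidden_state.mean(dim=1).squeeze(0)

en = embed("The cat sleeps on the sofa.", "en")
fr = embed("Le chat dort sur le canapé.", "fr")
print(torch.nn.functional.cosine_similarity(en, fr, dim=0).item())
```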

wanicca commented 5 years ago

XLM may be a better solution for cross-lingual tasks.