Hi, I wonder whether multilingual BERT representations can perform like other multilingual embeddings obtained by aligning monolingual embeddings (e.g. fastText multilingual). That is to say, do synonyms in a parallel sentence pair in different languages get analogous vector representations? Is the pretrained multilingual BERT model cross-lingual, or just a multilingual model that can accept tokens from different languages as input?
I read the Multilingual README and didn't find any mention of a cross-lingual setting. Is the pretrained model simply trained on corpora in different languages, which would mean the representations in the final layer are not guaranteed to be cross-lingually aligned?
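To make the question concrete, here is a minimal sketch of the kind of probe I have in mind (this assumes a recent Hugging Face `transformers` install and the `bert-base-multilingual-cased` checkpoint, neither of which comes from this repo; the sentence pair is just an illustration): embed a parallel sentence pair with multilingual BERT and check how close the pooled final-layer representations are.

```python
# Sketch: compare mean-pooled multilingual BERT representations of a
# parallel English/German sentence pair via cosine similarity.
# Assumes: Hugging Face `transformers` and the public
# `bert-base-multilingual-cased` checkpoint (not part of this repo).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def embed(sentence):
    # Mean-pool the last hidden layer over all tokens of the sentence.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        last_hidden = model(**inputs)[0]  # (1, seq_len, hidden_size)
    return last_hidden.mean(dim=1).squeeze(0)

en = embed("The cat sits on the mat.")
de = embed("Die Katze sitzt auf der Matte.")
print("cosine similarity:",
      torch.nn.functional.cosine_similarity(en, de, dim=0).item())
```

If the model were cross-lingually aligned in the fastText-multilingual sense, I would expect such parallel pairs to score clearly higher than unrelated sentence pairs.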
Hi @wanicca,
I can't answer your question, but if you'd like to make a comparison, I'd suggest also having a look at this repo: https://github.com/facebookresearch/XLM, a cross-lingual version of BERT.