facebookresearch / fastText

Library for fast text representation and classification.
https://fasttext.cc/
MIT License

Multiple Languages #716

Open jdvala opened 5 years ago

jdvala commented 5 years ago

The alignment looks great, but what about aligning multiple languages into a single vector space? After the alignment I want to perform classification using these embeddings, and for multilingual classification I need the vectors from the different languages to be in a single vector space. Is there a way to do this using fastText?

EdouardGrave commented 5 years ago

Hi @jdvala,

Thank you for your question!

All the aligned word vectors are aligned to English, and are thus in the same vector space. You could then concatenate the word vector files from different languages, and use the result to initialize the supervised model (with the -pretrainedVectors command line option). However, this will probably not work well, as some words are shared by multiple languages. We will add this use case to our list of feature requests.
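As a rough illustration of the concatenation step, here is a minimal Python sketch. It assumes the aligned vectors are in the text `.vec` format (a `<count> <dim>` header line followed by one `<word> <values...>` line per word). The function name `merge_vec_files` and the file names are hypothetical, and keeping only the first occurrence of a word shared between languages is an arbitrary choice that does not solve the shared-word problem mentioned above.

```python
def merge_vec_files(paths, out_path):
    """Concatenate fastText .vec text files into a single .vec file.

    Assumption: every input starts with a "<count> <dim>" header and all
    inputs have the same dimension. If a word occurs in several files, only
    its first occurrence is kept (an arbitrary choice for this sketch).
    Note: all kept lines are held in memory before writing.
    """
    seen = set()
    rows = []
    dim = None
    for path in paths:
        with open(path, encoding="utf-8") as f:
            # Header line: "<number of words> <vector dimension>"
            _, file_dim = f.readline().split()
            file_dim = int(file_dim)
            if dim is None:
                dim = file_dim
            elif file_dim != dim:
                raise ValueError(f"{path}: dimension {file_dim} != {dim}")
            for line in f:
                line = line.rstrip("\n")
                if not line:
                    continue
                word = line.split(" ", 1)[0]
                if word in seen:
                    continue  # word already provided by an earlier file
                seen.add(word)
                rows.append(line)
    with open(out_path, "w", encoding="utf-8") as out:
        # The header must reflect the merged vocabulary size
        out.write(f"{len(rows)} {dim}\n")
        for row in rows:
            out.write(row + "\n")
    return len(rows), dim
```

The merged file could then be passed to the supervised model, for example `fasttext supervised -input train.txt -output model -dim 300 -pretrainedVectors merged.vec`, where `train.txt` and `merged.vec` are placeholder names and the `-dim` value has to match the dimension of the vectors in the merged file.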

Best, Edouard.

1049451037 commented 5 years ago

Hi @EdouardGrave, are you going to release the training code for aligning word vectors, like MUSE?

1049451037 commented 5 years ago

Oh, I found the code: https://github.com/facebookresearch/fastText/tree/master/alignment/

kinoute commented 4 years ago

@1049451037 I'm late, but did you get good results aligning multiple languages with the code you linked (unsup_align.py)?