facebookresearch / InferSent

InferSent sentence embeddings

About training InferSent on another language #44

Closed theanhle closed 6 years ago

theanhle commented 6 years ago
aconneau commented 6 years ago

Hi,

1) For now, InferSent is an English-only sentence-embedding method, since it relies on supervised data that is only available in English (SNLI). If you're interested in multilingual sentence embeddings (or monolingual ones in another language), you might want to consider the bag-of-words baseline, which simply averages the word embeddings in a sentence, using the embeddings provided here: https://github.com/facebookresearch/MUSE.

2) If you want to train an InferSent model with embedding size 256, modify the train_nli.py script and set enc_lstm_dim to 128 (the encoder is bidirectional, so the sentence embedding has dimension 2 × enc_lstm_dim). Alternatively, you might want to encode all your sentences with InferSent-4096, then perform a PCA and keep only the first 256 principal components.
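
To illustrate the bag-of-words baseline from point 1, here is a minimal sketch that averages the word vectors of a sentence. The function name `average_embedding`, the whitespace tokenization, and the toy 3-dimensional vectors are all assumptions made for illustration; in practice the dictionary would be filled from the pretrained embedding files in the MUSE repository. It assumes NumPy is installed.

```python
import numpy as np

def average_embedding(sentence, word_vectors, dim):
    """Return the mean vector of the words in `sentence` found in `word_vectors`.

    Tokenization is a plain lowercase whitespace split (an assumption);
    words without a vector are skipped.
    """
    tokens = sentence.lower().split()
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        # No known word: fall back to the zero vector.
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

# Toy vectors, made up for illustration only.
toy_vectors = {
    "bonjour": np.array([1.0, 0.0, 2.0]),
    "monde": np.array([3.0, 2.0, 0.0]),
}

# "le" has no vector here, so only "bonjour" and "monde" are averaged.
embedding = average_embedding("Bonjour le monde", toy_vectors, dim=3)
unknown = average_embedding("xyz", toy_vectors, dim=3)
```

Skipping out-of-vocabulary words is one possible choice; another would be to map them to a shared unknown-word vector.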
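
The PCA alternative from point 2 could look roughly like the sketch below. It uses NumPy's SVD rather than any InferSent code, and the random matrix is only a stand-in for real InferSent-4096 embeddings; `pca_reduce` is a hypothetical helper name. Note that keeping 256 components requires more than 256 encoded sentences.

```python
import numpy as np

def pca_reduce(embeddings, n_components):
    """Project the rows of `embeddings` onto their first principal components.

    Returns the reduced matrix plus the mean and component matrix,
    which are needed to project new sentences later.
    """
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, mean, components

# Stand-in for a matrix of InferSent-4096 sentence embeddings (random data).
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((500, 4096))

reduced, mean, components = pca_reduce(embeddings, 256)
```

A new 4096-dimensional embedding `x` would then be reduced with `(x - mean) @ components.T`, reusing the mean and components fitted on the original set.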