theanhle closed this issue 6 years ago
Hi,
1) For now, InferSent is an English-only sentence embedding method, because it is trained on supervised data that is only available in English (SNLI). If you want multilingual sentence embeddings (or monolingual embeddings in another language), consider the bag-of-words baseline, which simply averages the word embeddings of a sentence, using the embeddings provided here: https://github.com/facebookresearch/MUSE.
2) If you want to train an InferSent model with embedding size 256, modify the train_nli.py script and set enc_lstm_dim to 128 (the encoder is bidirectional, so the sentence embedding has twice that dimension). Alternatively, you could encode all your sentences with InferSent-4096, then perform a PCA and keep only the first 256 principal components.
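
For the bag-of-words baseline in 1), here is a rough sketch of what averaging word embeddings could look like. It assumes the embeddings are stored in the usual text format (a header line with the word count and dimension, then one word and its values per line). The helper names `load_vec` and `sentence_embedding`, the whitespace tokenization, and the lowercasing are illustrative assumptions, not part of MUSE or InferSent.

```python
import numpy as np

def load_vec(path, max_words=200000):
    """Read a text-format embedding file (assumed layout:
    header 'count dim', then 'word v1 ... vd' on each line)."""
    vectors = {}
    with open(path, encoding="utf-8", errors="ignore") as f:
        next(f)  # skip the header line
        for i, line in enumerate(f):
            if i >= max_words:
                break
            word, values = line.rstrip().split(" ", 1)
            vectors[word] = np.array([float(x) for x in values.split()],
                                     dtype=np.float32)
    return vectors

def sentence_embedding(sentence, vectors, dim):
    """Average the vectors of all known tokens in the sentence.
    Tokenization here is a plain lowercase whitespace split (an assumption)."""
    tokens = sentence.lower().split()
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        # No known word: fall back to a zero vector
        return np.zeros(dim, dtype=np.float32)
    return np.mean(found, axis=0)
```

With a real embedding file you would call `load_vec` once and then `sentence_embedding` for each sentence; out-of-vocabulary words are simply skipped.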
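
For the PCA route in 2), a minimal sketch using plain NumPy (PCA computed through an SVD of the centered data) might look like the following. The random matrix is only a placeholder standing in for real InferSent-4096 sentence embeddings, and `pca_reduce` is a hypothetical helper name. Note that you need at least 256 sentences to obtain 256 components.

```python
import numpy as np

def pca_reduce(embeddings, n_components=256):
    """Project embeddings onto their first n_components principal components."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, mean, components

# Placeholder data standing in for InferSent-4096 output (500 sentences)
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((500, 4096)).astype(np.float32)

reduced, mean, components = pca_reduce(embeddings, 256)
print(reduced.shape)  # (500, 256)
```

To embed new sentences later in the same 256-dimensional space, keep `mean` and `components` and compute `(new_embeddings - mean) @ components.T`.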