facebookresearch / MUSE

A library for Multilingual Unsupervised or Supervised word Embeddings

Reproducing results for Tables 2 and 3 (English-Italian word and sentence translation retrieval.) #111

Open KamenBrestnichki opened 5 years ago

KamenBrestnichki commented 5 years ago

Hi!

I'm trying to reproduce the following row of Table 2 in the MUSE paper:

                 English to Italian | Italian to English
                  P@1   P@5   P@10  |  P@1    P@5   P@10
Wiki embeddings
-----
Procrustes - CSLS 63.7  78.6  81.1  |  56.3   76.2  80.6

as well as the following two rows of Table 3 in the same paper:

                 English to Italian | Italian to English
                  P@1   P@5   P@10  |  P@1    P@5   P@10
Procrustes - NN   42.6  54.7  59.0  |  53.5   65.5  69.5
Procrustes - CSLS 66.1  77.1  80.7  |  69.5   79.6  83.5
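
To make the row labels concrete, here is a minimal NumPy sketch of nearest-neighbour (NN) retrieval, CSLS retrieval, and P@k on toy vectors. It assumes the usual CSLS definition, 2*cos(x, y) minus the mean cosine of x to its k nearest targets minus the mean cosine of y to its k nearest sources. The function names, the toy data, and the `gold` dictionary format are hypothetical and are not taken from the MUSE code.

```python
import numpy as np

def unit_rows(x):
    """Scale every row to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def nn_scores(src, tgt):
    """Cosine similarity between every source row and every target row."""
    return unit_rows(src) @ unit_rows(tgt).T

def csls_scores(src, tgt, k=10):
    """CSLS: 2*cos(x, y) - r_T(x) - r_S(y), r_* = mean cosine to the k nearest neighbours."""
    cos = nn_scores(src, tgt)
    k_t = min(k, cos.shape[1])  # cannot use more neighbours than there are targets
    k_s = min(k, cos.shape[0])  # ... or than there are sources
    r_src = np.sort(cos, axis=1)[:, -k_t:].mean(axis=1)  # per source row
    r_tgt = np.sort(cos, axis=0)[-k_s:, :].mean(axis=0)  # per target column
    return 2 * cos - r_src[:, None] - r_tgt[None, :]

def precision_at_k(scores, gold, k):
    """Percentage of queries with a correct target among the top k.

    gold maps a source row index to the set of correct target indices.
    """
    top = np.argsort(-scores, axis=1)[:, :k]
    hits = [bool(gold[i] & set(top[i].tolist())) for i in gold]
    return 100.0 * sum(hits) / len(hits)

# Toy data: three source vectors that coincide with the first three targets.
src = np.eye(4)[:3]
tgt = np.eye(4)
gold = {0: {0}, 1: {1}, 2: {2}}

nn = nn_scores(src, tgt)
csls = csls_scores(src, tgt, k=2)
print(precision_at_k(nn, gold, 1), precision_at_k(csls, gold, 1))
```

NN and CSLS differ only in the score matrix, so the same `precision_at_k` applies to both. For the Italian to English direction one would swap the roles of the two embedding matrices and use a dictionary in that direction.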

For Table 2, I am able to get these exact results

                 English to Italian 
                  P@1   P@5   P@10 
Wacky embeddings
----
Procrustes - CSLS 44.9  61.8  66.6 

by running [1]. However, I struggle to get the Italian to English results here. In addition, when I change the embeddings to the wiki ones by running [2], I get these results

                 English to Italian 
                  P@1   P@5   P@10 
Wiki embeddings
----
Procrustes - CSLS 66.2  80.6  84.4

which are significantly higher than the ones in the paper.
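
For context, "Procrustes" in these rows refers to fitting an orthogonal mapping on the training dictionary pairs. Below is a minimal sketch of the closed-form orthogonal Procrustes solution via SVD, assuming row i of `src` is paired with row i of `tgt`. The function name and the synthetic data are hypothetical; this is not necessarily what `supervised_multiview.py` does internally.

```python
import numpy as np

def procrustes(src, tgt):
    """Orthogonal W minimising ||src @ W - tgt||_F for row-wise paired vectors."""
    # SVD of the cross-covariance; W = U V^T is the orthogonal polar factor.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

# Synthetic check: targets are an exact rotation of the sources.
rng = np.random.default_rng(0)
src = rng.standard_normal((50, 5))
rotation, _ = np.linalg.qr(rng.standard_normal((5, 5)))
tgt = src @ rotation

w = procrustes(src, tgt)
print(np.allclose(w, rotation))
```

On real embeddings the pairs are noisy, so `src @ w` only approximates `tgt`, and the retrieval criterion applied afterwards (NN or CSLS) determines the reported precision.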

For Table 3, I have tried using both sets of embeddings and have tried to center or renorm the vectors, as well as training on the expert or pseudo dictionaries, but I am unable to reproduce the results. Could you help me out?
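
As a minimal sketch of what centering and renormalizing mean here, assuming embeddings are stored as rows of a NumPy array (the function names are hypothetical and are not MUSE's API):

```python
import numpy as np

def center(emb):
    """Subtract the mean vector so that every dimension has zero mean."""
    return emb - emb.mean(axis=0, keepdims=True)

def renorm(emb):
    """Scale every row to unit L2 norm (guarding against all-zero rows)."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(norms, 1e-12)

emb = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
centered = center(emb)
unit = renorm(emb)
print(centered.mean(axis=0), np.linalg.norm(unit, axis=1))
```

The order matters: centering unit-norm vectors generally leaves them no longer unit norm, so the two steps are often chained, for example renorm, then center, then renorm again.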

(The latter embeddings are the ones used in Smith et al. 2017 and were obtained from here. The MUSE paper suggests those were the ones used to produce the results.)

[1]

 python -m ipdb supervised_multiview.py --src_lang en --tgt_lang it --src_emb data/EN.200K.cbow1_wind5_hs0_neg10_size300_smpl1e-05.txt --tgt_emb data/IT.200K.cbow1_wind5_hs0_neg10_size300_smpl1e-05.txt --cuda 0 --dico_train data/crosslingual/dictionaries/OPUS_en_it_europarl_train_5K.txt --dico_eval data/crosslingual/dictionaries/OPUS_en_it_europarl_test.txt

[2]

 python -m ipdb supervised_multiview.py --src_lang en --tgt_lang it --src_emb data/wiki.it.txt --tgt_emb data/wiki.en.txt --cuda 0 --dico_train data/crosslingual/dictionaries/OPUS_en_it_europarl_train_5K.txt --dico_eval data/crosslingual/dictionaries/OPUS_en_it_europarl_test.txt

With thanks, Kamen