facebookresearch / MUSE

A library for Multilingual Unsupervised or Supervised word Embeddings

Lower precision when reproducing en-zh training #130

Closed learnercat closed 5 years ago

learnercat commented 5 years ago

Hello! I am trying to reproduce en-zh supervised training using fastText pretrained vectors with 35 refinement iterations:

```
CUDA_VISIBLE_DEVICES=2 python supervised.py --src_lang en --tgt_lang zh \
    --src_emb /MUSE/data/fasttextvec/wiki.en.vec \
    --tgt_emb /MUSE/data/fasttextvec/wiki.zh.vec \
    --n_refinement 35 --normalize_embeddings center
```

After 35 iterations, the NN and CSLS precisions are as follows:

```
Monolingual source word similarity score average: 0.65924
Found 2230 pairs of words in the dictionary (1500 unique). 0 other pairs contained at least one unknown word (0 in lang1, 0 in lang2)
1500 source words - nn - Precision at k = 1: 37.266667
1500 source words - nn - Precision at k = 5: 53.933333
1500 source words - nn - Precision at k = 10: 59.533333
Found 2230 pairs of words in the dictionary (1500 unique). 0 other pairs contained at least one unknown word (0 in lang1, 0 in lang2)
1500 source words - csls_knn_10 - Precision at k = 1: 38.666667
1500 source words - csls_knn_10 - Precision at k = 5: 56.400000
1500 source words - csls_knn_10 - Precision at k = 10: 61.333333
Building the train dictionary ...
New train dictionary of 6201 pairs.
Mean cosine (nn method, S2T build, 10000 max size): 0.57653
Building the train dictionary ...
New train dictionary of 4986 pairs.
Mean cosine (csls_knn_10 method, S2T build, 10000 max size): 0.59683
```

These precisions are lower than the evaluation results in the original paper, and I found that precision stopped improving after 20 iterations. Could anyone offer some advice? Thanks in advance.
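For context on what each `--n_refinement` iteration does: MUSE's refinement solves an orthogonal Procrustes problem on the current train dictionary, so after the dictionary stops changing, extra iterations cannot improve precision, which is consistent with the plateau after ~20 iterations. Below is a minimal sketch of a single Procrustes step (this is a standard reconstruction of the technique, not MUSE's actual code; the function name `procrustes` and the toy data are my own). Given row-aligned source/target embedding matrices X and Y, the orthogonal map W minimizing ||XWᵀ − Y||_F is UVᵀ from the SVD of YᵀX:

```python
import numpy as np

def procrustes(X, Y):
    """One Procrustes refinement step: return the orthogonal W
    minimizing ||X @ W.T - Y||_F over orthogonal matrices."""
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt  # orthogonal by construction

# Toy check: recover a known random rotation from aligned pairs.
rng = np.random.default_rng(0)
W_true = np.linalg.qr(rng.standard_normal((4, 4)))[0]  # random orthogonal map
X = rng.standard_normal((100, 4))                      # "source" embeddings
Y = X @ W_true.T                                       # perfectly aligned "target"
W = procrustes(X, Y)
print(np.allclose(X @ W.T, Y, atol=1e-6))              # mapping recovered
```

In MUSE the dictionary pairs feeding X and Y are themselves rebuilt each iteration (via CSLS nearest neighbours), so iterations only help while the induced dictionary keeps improving.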

LuoRongLuoRong commented 2 years ago

@learnercat So, how did you deal with the low precision in the end? Thanks in advance!