Hello! I am trying to reproduce en-zh supervised training using fastText pre-trained vectors with 35 refinement iterations.
```
CUDA_VISIBLE_DEVICES=2 python supervised.py --src_lang en --tgt_lang zh --src_emb /MUSE/data/fasttextvec/wiki.en.vec --tgt_emb /MUSE/data/fasttextvec/wiki.zh.vec --n_refinement 35 --normalize_embeddings center
```
After 35 iterations, the NN and CSLS precisions are as follows:
```
Monolingual source word similarity score average: 0.65924
Found 2230 pairs of words in the dictionary (1500 unique). 0 other pairs contained at least one unknown word (0 in lang1, 0 in lang2)
1500 source words - nn - Precision at k = 1: 37.266667
1500 source words - nn - Precision at k = 5: 53.933333
1500 source words - nn - Precision at k = 10: 59.533333
Found 2230 pairs of words in the dictionary (1500 unique). 0 other pairs contained at least one unknown word (0 in lang1, 0 in lang2)
1500 source words - csls_knn_10 - Precision at k = 1: 38.666667
1500 source words - csls_knn_10 - Precision at k = 5: 56.400000
1500 source words - csls_knn_10 - Precision at k = 10: 61.333333
Building the train dictionary ...
New train dictionary of 6201 pairs.
Mean cosine (nn method, S2T build, 10000 max size): 0.57653
Building the train dictionary ...
New train dictionary of 4986 pairs.
Mean cosine (csls_knn_10 method, S2T build, 10000 max size): 0.59683
```
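For context, here is my understanding of the `csls_knn_10` metric reported above, as a minimal NumPy sketch (this is my own code, not MUSE's implementation; the function name and array shapes are mine): CSLS penalizes each raw cosine similarity by the mean similarity of each word to its k nearest cross-lingual neighbors, which reduces the hubness problem.

```python
import numpy as np

def csls_scores(src, tgt, k=10):
    """CSLS(x, y) = 2*cos(x, y) - r_tgt(x) - r_src(y), where r_* is the
    mean cosine of a word's k nearest neighbors in the other space
    (Conneau et al., 2018). Rows of src/tgt are word vectors."""
    # Normalize rows so dot products are cosine similarities.
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sims = src @ tgt.T                       # full cosine similarity matrix
    k_s = min(k, tgt.shape[0])
    k_t = min(k, src.shape[0])
    # Mean similarity to each word's k nearest cross-lingual neighbors.
    r_src = np.sort(sims, axis=1)[:, -k_s:].mean(axis=1)  # per source word
    r_tgt = np.sort(sims, axis=0)[-k_t:, :].mean(axis=0)  # per target word
    return 2 * sims - r_src[:, None] - r_tgt[None, :]

# Toy example: 3 source and 4 target vectors in a shared 5-d space.
rng = np.random.default_rng(0)
scores = csls_scores(rng.normal(size=(3, 5)), rng.normal(size=(4, 5)), k=2)
print(scores.shape)  # (3, 4)
```

Precision@k is then computed by ranking each source word's candidates by these scores instead of raw cosine.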
These precisions are lower than the evaluation results in the original paper. I also found that precision stopped improving after about 20 iterations.
Could anyone give me some advice?
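For reference, my understanding is that each refinement iteration re-solves the orthogonal Procrustes problem on the current induced dictionary, so a plateau means the dictionary has stabilized. A sketch of the closed-form Procrustes step (my own code under that assumption, not the repo's; `X`/`Y` would be the dictionary-aligned source/target vectors):

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F, via SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Sanity check: build Y from a known rotation; procrustes recovers it.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
Y = X @ Q
W = procrustes(X, Y)
print(np.allclose(W, Q))  # expect True
```

If the induced dictionary stops changing between iterations, W stops changing too, which would explain the flat precision after ~20 iterations.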
Thanks in advance.