facebookresearch / MUSE

A library for Multilingual Unsupervised or Supervised word Embeddings

Bad outcome in ja-en task #184

Open josaphjosta opened 3 years ago

josaphjosta commented 3 years ago

I'm using the provided wiki.ja.vec and wiki.en.vec embeddings, along with the provided dictionaries, but the word translation precision looks strange:

INFO - 05/11/21 17:49:31 - 0:07:19 - 1451 source words - nn - Precision at k = 1: 0.000000
INFO - 05/11/21 17:49:31 - 0:07:19 - 1451 source words - nn - Precision at k = 5: 0.000000
INFO - 05/11/21 17:49:31 - 0:07:19 - 1451 source words - nn - Precision at k = 10: 0.137836
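For readers unfamiliar with the metric, here is a minimal sketch of what a nearest-neighbour "Precision at k" measures: the share of source words whose k closest target words include a gold translation. This is an illustration under assumptions, not MUSE's own evaluation code. The function names, the toy two-dimensional vectors, and the tiny dictionary are all hypothetical. Note that 0.137836 equals 2/1451 expressed as a percentage, which is consistent with the logged values being percentages, so only about 2 of the 1451 source words had a correct translation in their top 10.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def precision_at_k(mapped_src, tgt_emb, gold, k):
    """Fraction of source words with a gold translation among their
    k nearest target words (cosine nearest neighbours)."""
    hits = 0
    for word, vec in mapped_src.items():
        # Rank every target word by similarity to the mapped source vector.
        ranked = sorted(tgt_emb, key=lambda t: cosine(vec, tgt_emb[t]), reverse=True)
        if any(t in gold[word] for t in ranked[:k]):
            hits += 1
    return hits / len(mapped_src)


# Hypothetical toy data: source vectors already mapped into the target space.
mapped_src = {"犬": [0.9, 0.1], "猫": [0.6, 0.7]}
tgt_emb = {"dog": [1.0, 0.0], "cat": [0.0, 1.0], "car": [0.7, 0.7]}
gold = {"犬": {"dog"}, "猫": {"cat"}}

print(precision_at_k(mapped_src, tgt_emb, gold, 1))  # 0.5: "猫" ranks "car" first
print(precision_at_k(mapped_src, tgt_emb, gold, 2))  # 1.0: "cat" is second for "猫"

# The logged k = 10 value matches 2 hits out of 1451 words, as a percentage.
print(round(100 * 2 / 1451, 6))  # 0.137836
```

A score this close to zero at every k usually means the mapping has not converged at all, rather than that it is merely imprecise.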

More details are in train.log.

Please help.

williammulianto commented 3 years ago

Hi, have you already tried using the Common Crawl embeddings instead of the Wikipedia ones?

The Japanese Wikipedia embedding representation is not really meaningful. See: https://github.com/facebookresearch/fastText/issues/710

Also try decreasing the epoch size to 250k or 500k. If none of the above works, please check this paper, in which they improve EN-JP alignment precision by 30%.

Hope this works. Please correct me if I'm wrong.