facebookresearch / MUSE

A library for Multilingual Unsupervised or Supervised word Embeddings

AssertionError - in get_word_translation_accuracy dico = load_dictionary(path, word2id1, word2id2) #6

Closed by jamsheer2u 6 years ago

jamsheer2u commented 6 years ago

I was trying to build cross-lingual word embeddings for Malayalam and Hindi.

Environment: Ubuntu 16, 8 CPUs / 52 GB RAM, Tesla K80, Google Cloud, CUDA 8, Python 3.6, Faiss not installed

This is what I did:

curl -Lo data/wiki.ml.vec https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.ml.vec
curl -Lo data/wiki.hi.vec https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.hi.vec

Then

python unsupervised.py --src_lang ml --tgt_lang hi --src_emb ../data/wiki.ml.vec --tgt_emb ../data/wiki.hi.vec

After it had run for about 10 minutes, I got this error:

INFO - 12/27/17 13:10:56 - 0:10:37 - 988000 - Discriminator loss: 0.4106 - 3290 samples/s
INFO - 12/27/17 13:10:58 - 0:10:39 - 992000 - Discriminator loss: 0.4109 - 3339 samples/s
INFO - 12/27/17 13:11:00 - 0:10:42 - 996000 - Discriminator loss: 0.4110 - 3344 samples/s
Traceback (most recent call last):
  File "unsupervised.py", line 135, in <module>
    evaluator.all_eval(to_log)
  File "/home/jamsheer/jamsheer/fasttext/MUSE/src/evaluation/evaluator.py", line 190, in all_eval
    self.word_translation(to_log)
  File "/home/jamsheer/jamsheer/fasttext/MUSE/src/evaluation/evaluator.py", line 94, in word_translation
    method=method
  File "/home/jamsheer/jamsheer/fasttext/MUSE/src/evaluation/word_translation.py", line 88, in get_word_translation_accuracy
    dico = load_dictionary(path, word2id1, word2id2)
  File "/home/jamsheer/jamsheer/fasttext/MUSE/src/evaluation/word_translation.py", line 48, in load_dictionary
    assert os.path.isfile(path)
AssertionError

glample commented 6 years ago

It looks like you did not provide a dictionary for this language pair. You will first need to create a Malayalam / Hindi dictionary and move it to MUSE/data/crosslingual/dictionaries so that the evaluation script can load it.
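The traceback shows the run dying on `assert os.path.isfile(path)`, so a quick check before launching training can save the ten minutes lost above. The following is only a minimal sketch: the directory is the one named in the comment, but the `<src>-<tgt>.txt` file name and both helper functions are hypothetical, so check which path your copy of `word_translation.py` actually builds.

```python
import os

# Directory named in the comment above; adjust to where your MUSE checkout lives.
DICO_DIR = os.path.join("data", "crosslingual", "dictionaries")


def dictionary_path(src_lang, tgt_lang, dico_dir=DICO_DIR):
    """Build a candidate dictionary path.

    The '<src>-<tgt>.txt' naming is an assumption for illustration only.
    """
    return os.path.join(dico_dir, "%s-%s.txt" % (src_lang, tgt_lang))


def dictionary_exists(src_lang, tgt_lang, dico_dir=DICO_DIR):
    """Same test as the failing assertion, but returning a bool instead of raising."""
    return os.path.isfile(dictionary_path(src_lang, tgt_lang, dico_dir))


if __name__ == "__main__":
    if not dictionary_exists("ml", "hi"):
        print("No dictionary found at %s" % dictionary_path("ml", "hi"))
```

Running this from the MUSE root before `unsupervised.py` would report a missing Malayalam / Hindi dictionary immediately instead of at the first evaluation.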

glample commented 6 years ago

Note that you can also disable the word-translation evaluation entirely and still let the model run to build your unsupervised embeddings / dictionaries. In practice, we periodically evaluate the embeddings on a small dictionary to check that the model is training properly.

To do so, you can comment out the evaluations you do not want to run here: https://github.com/facebookresearch/MUSE/blob/master/src/evaluation/evaluator.py#L184-L192

In particular, you can just comment out the line self.word_translation(to_log)
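As an alternative to commenting the line out by hand, the same effect can be sketched as a guard that skips the evaluation when no dictionary file is present. The class below is a simplified stand-in and not MUSE's evaluator: only the method names `all_eval` and `word_translation` come from the traceback, while `SketchEvaluator`, `dico_path`, and the keys written to `to_log` are hypothetical.

```python
import os


class SketchEvaluator:
    """Toy stand-in for an evaluator; not the MUSE implementation."""

    def __init__(self, dico_path):
        # Hypothetical attribute: where the evaluation dictionary is expected.
        self.dico_path = dico_path

    def word_translation(self, to_log):
        # Placeholder for the real evaluation, which requires the dictionary file.
        assert os.path.isfile(self.dico_path)
        to_log["word_translation_ran"] = True

    def all_eval(self, to_log):
        # Skip the dictionary-based evaluation when no dictionary is available,
        # rather than aborting the whole run with an AssertionError.
        if os.path.isfile(self.dico_path):
            self.word_translation(to_log)
        else:
            to_log["word_translation_skipped"] = True
```

With a guard like this, training continues for language pairs that have no dictionary, at the cost of losing the periodic sanity check mentioned above.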