Closed · narnoura closed this issue 5 years ago
Hi Noura, I think you could just comment out line 227 under src/evaluation/evaluator.py if you don't have a dictionary. In my experiments this caused no error. Hope that helps.
Hello, I did that, commenting out only line 227, and kept getting this error below. Do you know what it means? Thanks.
Traceback (most recent call last):
File "supervised.py", line 100, in
Sorry, did you mean line 227 or line 217? Will try 227 now, thanks.
- If I comment out line 227, I get this error:
File "/proj/nlpdisk3/nlpusers/noura/deep-learning/Experiments/Embeddings/MUSE/src/evaluation/word_translation.py", line 92, in get_word_translation_accuracy
dico = load_dictionary(path, word2id1, word2id2)
File "/proj/nlpdisk3/nlpusers/noura/deep-learning/Experiments/Embeddings/MUSE/src/evaluation/word_translation.py", line 57, in load_dictionary
with io.open(path, 'r', encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/proj/nlpdisk3/nlpusers/noura/deep-learning/Experiments/Embeddings/MUSE/src/evaluation/../../data/crosslingual/dictionaries/en-ug.5000-6500.txt'
- If I comment out line 227 and add a dummy empty dictionary file, I get this error:
File "/proj/nlpdisk3/nlpusers/noura/deep-learning/Experiments/Embeddings/MUSE/src/evaluation/word_translation.py", line 95, in get_word_translation_accuracy
assert dico[:, 0].max() < emb1.size(0)
IndexError: too many indices for tensor of dimension 1
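(For anyone hitting the second error: it is consistent with the dictionary file being empty. With no word pairs, load_dictionary produces a zero-element, one-dimensional tensor, so the two-dimensional indexing dico[:, 0] fails. The same behavior can be reproduced with NumPy; this is only an illustration of the indexing rule, not MUSE's actual code:)

```python
import numpy as np

# A non-empty dictionary yields an (n, 2) array of (source_id, target_id) pairs,
# so column indexing works.
pairs = np.array([(0, 3), (1, 5)])
assert pairs[:, 0].max() < 10

# An empty dictionary yields a 1-D array, so dico[:, 0] raises IndexError,
# just like the assert on line 95 of word_translation.py.
empty = np.array([])
try:
    empty[:, 0]
except IndexError as e:
    print(e)
```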
Sorry, this is my mistake. I meant that you can just comment out line 217 under src/evaluation/evaluator.py.
Is there still a problem with this?
Hi Crystal, yes I tried commenting out line 217 and I get this error:
if to_log[metric] > self.best_valid_metric: KeyError: 'precision_at_1-csls_knn_10'
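(Context for this KeyError: unsupervised.py still calls trainer.save_best(to_log, VALIDATION_METRIC) after each epoch, but with the evaluation call commented out the metric is never written into to_log. A defensive workaround is to skip saving when the metric is absent; this is a sketch of the idea, and the real save_best in src/trainer.py differs in detail:)

```python
class Trainer:
    def __init__(self):
        # Best validation score seen so far; starts at a very low value.
        self.best_valid_metric = -1e12

    def save_best(self, to_log, metric):
        # If evaluation was skipped, the metric never made it into to_log,
        # so return early instead of raising KeyError.
        if metric not in to_log:
            return
        if to_log[metric] > self.best_valid_metric:
            self.best_valid_metric = to_log[metric]
            # ... export the current mapping / embeddings here ...
```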
Thanks, Noura
Oh, if it's convenient, could you share your training commands with me?
In addition, what is the 'VALIDATION_METRIC' parameter of the function 'save_best()'?
It's 'precision_at_1-csls_knn_10' (this was the default, I didn't change it)
This is the training command I used in the unsupervised case:
python unsupervised.py --src_lang en --tgt_lang ug --src_emb $embed_dir/en.mono.txt --tgt_emb $embed_dir/ug.mono.txt --n_refinement 5 --dico_build "S2T|T2S" --exp_path $dir/en-$lang/
If convenient, you could try changing the value of '--dico_build' to "S2T" (the default).
That doesn't work either: I get an error because the built dictionary is empty - that's why I changed it to "S2T|T2S" in the first place.
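(For context on why "S2T|T2S" can help here: MUSE can build its synthetic dictionary from source-to-target matches, target-to-source matches, or a combination of both directions. A rough sketch of the idea, using hypothetical pair sets rather than MUSE's actual implementation:)

```python
# Hypothetical matched (source_id, target_id) pairs found in each direction.
s2t_pairs = {(0, 7), (1, 3)}   # source -> target matches
t2s_pairs = {(1, 3), (2, 9)}   # target -> source matches

# "S2T" keeps only one direction; with very little data it can be tiny or empty.
dico_s2t = s2t_pairs

# "S2T|T2S" combines both directions, yielding more pairs -- useful in
# low-resource settings where either direction alone is too sparse.
dico_union = s2t_pairs | t2s_pairs
print(len(dico_s2t), len(dico_union))  # 2 3
```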
I am also having the same error now. Line 227 in evaluator.py is: tgt_preds = [] - should I really be commenting this one out? (Line 217 is: self.word_translation(to_log))
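(For anyone else confused about which line to touch: based on the line contents quoted above, the call to disable is self.word_translation(to_log) on line 217, not the tgt_preds = [] assignment on line 227. A minimal sketch of what the edited evaluation method looks like; the surrounding method and metric names here are stand-ins, not the exact MUSE source:)

```python
class Evaluator:
    def monolingual_wordsim(self, to_log):
        # Placeholder for a metric that needs no bilingual dictionary.
        to_log['monolingual_wordsim'] = 0.0

    def all_eval(self, to_log):
        self.monolingual_wordsim(to_log)
        # self.word_translation(to_log)  # line 217: needs a gold en-xx
        # dictionary under data/crosslingual/dictionaries/, so comment it
        # out for language pairs that have no such dictionary
```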
Okay - commenting out line 217 and using the modified unsupervised.py from this pull request (https://github.com/facebookresearch/MUSE/pull/97) worked for me without errors to build multilingual embeddings in a low-resource scenario.
Hi there, I wanted to build embeddings without evaluation too and was using the modified unsupervised.py from the pull request (#97). However, I am facing the error:
Traceback (most recent call last):
File "unsupervised.py", line 141, in
My command is: python unsupervised.py --src_lang en --tgt_lang zh --src_emb data/src_emb_en.txt --tgt_emb data/tgt_emb_zh.txt --n_refinement 5 --normalize_embeddings center --emb_dim 512 --cuda False --dis_most_frequent 0 --n_epochs 1 --epoch_size 100
Do you have any idea?
Hey, sorry, please ignore the above. I just figured it out: it was only because I didn't feed enough data into it.
Hello - I have been training MUSE embeddings for a number of low-resource languages and I discovered that the model is being iteratively validated using an internal dictionary, even in the unsupervised case. I discovered this by coincidence when training models for Uyghur and Tigrinya, which do not have any 'pre-trained' dictionaries, and I got an error message from the evaluator, saying that it could not find the dictionary under: data/crosslingual/dictionaries/en-.5000-6500.txt
I also tried commenting out lines 217 and 219 under src/evaluation/evaluator.py, but that gave me another error from the trainer. Could you advise on what the error means?
File "unsupervised.py", line 143, in
trainer.save_best(to_log, VALIDATION_METRIC)
File "/proj/nlpdisk3/nlpusers/noura/deep-learning/Experiments/Embeddings/MUSE/src/trainer.py", line 224, in save_best
if to_log[metric] > self.best_valid_metric:
KeyError: 'mean_cosine-csls_knn_10-S2T-10000'
I imagine that if I created a dummy dictionary file, the same thing would happen.
Thank you, Noura