facebookresearch / UnsupervisedMT

Phrase-Based & Neural Unsupervised Machine Translation

Error when running run.sh #71

Open prigioni opened 5 years ago

prigioni commented 5 years ago

Aligning embeddings with MUSE...
Impossible to import Faiss library!! Switching to standard nearest neighbors search implementation, this will be significantly slower.

INFO - 03/18/19 09:55:37 - 0:00:00 - ============ Initialized logger ============
INFO - 03/18/19 09:55:37 - 0:00:00 - cuda: True
    dico_build: S2T&T2S
    dico_eval: default
    dico_max_rank: 10000
    dico_max_size: 0
    dico_method: csls_knn_10
    dico_min_size: 0
    dico_threshold: 0
    dico_train: identical_char
    emb_dim: 300
    exp_id: wiki-released-enfr-identical_char
    exp_name: alignments
    exp_path: /unsupervisedMT/PBSMT/MUSE/alignments/wiki-released-enfr-identical_char
    export: pth
    max_vocab: 200000
    n_refinement: 5
    normalize_embeddings:
    seed: -1
    src_emb: /unsupervisedMT/PBSMT/data/embeddings/cc.en.300.vec
    src_lang: en
    tgt_emb: /unsupervisedMT/PBSMT/data/embeddings/cc.fr.300.vec
    tgt_lang: fr
    verbose: 2
INFO - 03/18/19 09:55:37 - 0:00:00 - The experiment will be stored in /unsupervisedMT/PBSMT/MUSE/alignments/wiki-released-enfr-identical_char
INFO - 03/18/19 09:55:53 - 0:00:16 - Loaded 200000 pre-trained word embeddings.
INFO - 03/18/19 09:56:18 - 0:00:41 - Loaded 200000 pre-trained word embeddings.
INFO - 03/18/19 09:56:22 - 0:00:45 - Found 67029 pairs of identical character strings.
INFO - 03/18/19 09:56:24 - 0:00:46 - Validation metric: mean_cosine-csls_knn_10-S2T-10000
INFO - 03/18/19 09:56:24 - 0:00:46 - Starting iteration 0...
Traceback (most recent call last):
  File "/unsupervisedMT/PBSMT/MUSE/supervised.py", line 101, in <module>
    evaluator.all_eval(to_log)
  File "/unsupervisedMT/PBSMT/MUSE/src/evaluation/evaluator.py", line 217, in all_eval
    self.word_translation(to_log)
  File "/unsupervisedMT/PBSMT/MUSE/src/evaluation/evaluator.py", line 120, in word_translation
    dico_eval=self.params.dico_eval
  File "/unsupervisedMT/PBSMT/MUSE/src/evaluation/word_translation.py", line 92, in get_word_translation_accuracy
    dico = load_dictionary(path, word2id1, word2id2)
  File "/unsupervisedMT/PBSMT/MUSE/src/evaluation/word_translation.py", line 49, in load_dictionary
    assert os.path.isfile(path)
AssertionError
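The traceback ends at `assert os.path.isfile(path)` inside `load_dictionary`, so the evaluation dictionary file is not where MUSE looks for it. A quick way to confirm this before rerunning is to check the path by hand. The sketch below is only an illustration: the `data/crosslingual/dictionaries` layout and the `en-fr.5000-6500.txt` file name are assumptions about what `dico_eval: default` resolves to, and both helper functions are hypothetical, not part of MUSE. Compare them against `src/evaluation/word_translation.py` in your checkout.

```python
import os


def default_dictionary_path(muse_root, src_lang, tgt_lang):
    """Build the path where the default evaluation dictionary is assumed to live.

    ASSUMPTION: directory layout and file name pattern are not taken from the
    log above; verify them in src/evaluation/word_translation.py.
    """
    file_name = "%s-%s.5000-6500.txt" % (src_lang, tgt_lang)
    return os.path.join(muse_root, "data", "crosslingual", "dictionaries", file_name)


def describe_dictionary(muse_root, src_lang, tgt_lang):
    """Report whether the assumed dictionary file exists (the same isfile check that failed)."""
    path = default_dictionary_path(muse_root, src_lang, tgt_lang)
    status = "found" if os.path.isfile(path) else "MISSING"
    return "%s: %s" % (status, path)


if __name__ == "__main__":
    # MUSE root taken from the traceback paths; languages from src_lang / tgt_lang.
    print(describe_dictionary("/unsupervisedMT/PBSMT/MUSE", "en", "fr"))
```

If the file is reported as missing, the dictionaries have most likely not been downloaded yet.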

glample commented 5 years ago

Did you download the dictionaries with the get-data script?

prigioni commented 5 years ago

> Did you download the dictionaries with the get-data script?

I just ran run.sh. There is no get-data script in the PBSMT directory.

glample commented 5 years ago

This is MUSE code, so the script will be in the MUSE directory. Check https://github.com/facebookresearch/MUSE#get-evaluation-datasets
Dictionaries are here: https://dl.fbaipublicfiles.com/arrival/dictionaries.tar.gz
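For anyone who prefers to fetch the archive from Python rather than through the MUSE script, here is a minimal sketch using only the standard library. The URL is the one given above. The target directory is an assumption, the function names are hypothetical, and the internal layout of the archive is not known from this thread, so inspect the returned member names and move the files to wherever MUSE expects them.

```python
import os
import tarfile
import urllib.request

# URL quoted in the comment above.
DICTIONARIES_URL = "https://dl.fbaipublicfiles.com/arrival/dictionaries.tar.gz"


def extract_archive(archive_path, target_dir):
    """Extract a .tar.gz archive into target_dir and return its member names."""
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(target_dir)
    return names


def fetch_dictionaries(target_dir, url=DICTIONARIES_URL):
    """Download the dictionaries archive into target_dir, then extract it there."""
    os.makedirs(target_dir, exist_ok=True)
    archive_path = os.path.join(target_dir, "dictionaries.tar.gz")
    urllib.request.urlretrieve(url, archive_path)
    return extract_archive(archive_path, target_dir)


# Example call (ASSUMPTION: this target directory is illustrative only):
# names = fetch_dictionaries("/unsupervisedMT/PBSMT/MUSE/data/crosslingual")
# print(names[:5])
```

The get-evaluation script linked above remains the documented route; this only covers the download and unpack step.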

prigioni commented 5 years ago

> This is MUSE code, so the script will be in the MUSE directory. Check https://github.com/facebookresearch/MUSE#get-evaluation-datasets
> Dictionaries are here: https://dl.fbaipublicfiles.com/arrival/dictionaries.tar.gz

If I train on my own data, do I have to train a dictionary and anything else as well?

glample commented 5 years ago

What do you mean by train your data? MUSE is to align pretrained embeddings. The dictionaries are provided.

prigioni commented 5 years ago

> What do you mean by train your data? MUSE is to align pretrained embeddings. The dictionaries are provided.

I want to train on my own data. How do I prepare the data for the two monolingual corpora?