prigioni opened this issue 5 years ago
Did you download the dictionaries with the get-data script?
I just ran run.sh. There is no get-data script in the PBSMT directory.
This is MUSE code, so the script is in the MUSE directory. See https://github.com/facebookresearch/MUSE#get-evaluation-datasets. The dictionaries are here: https://dl.fbaipublicfiles.com/arrival/dictionaries.tar.gz
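For anyone hitting this later, the evaluation data can be fetched either with the script shipped in the MUSE repo or manually. A minimal sketch, assuming the default layout MUSE expects (data/crosslingual/dictionaries/) and that the tarball unpacks into a dictionaries/ folder — verify after extracting:

```bash
# Run from the root of the MUSE checkout.
cd data

# Option 1: the script from the MUSE README, which fetches all evaluation datasets.
./get_evaluation.sh

# Option 2: grab just the bilingual dictionaries manually.
mkdir -p crosslingual
wget https://dl.fbaipublicfiles.com/arrival/dictionaries.tar.gz
tar -xzf dictionaries.tar.gz -C crosslingual/
```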
If I train on my own data, do I have to train the dictionary and the other resources myself?
What do you mean by "train your data"? MUSE is for aligning pretrained embeddings; the dictionaries are provided.
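The supervised alignment entry point and its flags can be read off the parameter dump in the log further down; a sketch of an equivalent standalone invocation, with embedding paths as placeholders:

```bash
# Align two sets of pretrained fastText embeddings with MUSE's supervised mode,
# seeding the training dictionary from identical character strings
# (the same settings that appear in the log below).
python supervised.py --src_lang en --tgt_lang fr \
    --src_emb data/embeddings/cc.en.300.vec \
    --tgt_emb data/embeddings/cc.fr.300.vec \
    --n_refinement 5 --dico_train identical_char \
    --exp_name alignments
```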
I want to train on my own data. How do I prepare the data for the two monolingual languages?
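For what it's worth, the pretrained vectors used in this pipeline (cc.en.300.vec / cc.fr.300.vec in the log below) are fastText embeddings, so preparing your own two monolingual corpora would roughly mean tokenizing each corpus and training one embedding per language. A minimal sketch with placeholder filenames:

```bash
# Train 300-dimensional skip-gram embeddings on each tokenized monolingual corpus.
./fasttext skipgram -input corpus.en.tok -output emb.en -dim 300
./fasttext skipgram -input corpus.fr.tok -output emb.fr -dim 300
# This produces emb.en.vec / emb.fr.vec, which can then be aligned with MUSE
# as sketched above.
```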
Aligning embeddings with MUSE... Impossible to import Faiss library!! Switching to standard nearest neighbors search implementation, this will be significantly slower.
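The Faiss warning only affects speed, but on full 200k-word vocabularies the fallback nearest-neighbor search is very slow. One way to install it, assuming a conda environment (the pytorch channel hosts the official packages):

```bash
# faiss-gpu requires CUDA; use faiss-cpu on machines without a GPU.
conda install -c pytorch faiss-gpu
```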
INFO - 03/18/19 09:55:37 - 0:00:00 - ============ Initialized logger ============
INFO - 03/18/19 09:55:37 - 0:00:00 - cuda: True
                                     dico_build: S2T&T2S
                                     dico_eval: default
                                     dico_max_rank: 10000
                                     dico_max_size: 0
                                     dico_method: csls_knn_10
                                     dico_min_size: 0
                                     dico_threshold: 0
                                     dico_train: identical_char
                                     emb_dim: 300
                                     exp_id: wiki-released-enfr-identical_char
                                     exp_name: alignments
                                     exp_path: /unsupervisedMT/PBSMT/MUSE/alignments/wiki-released-enfr-identical_char
                                     export: pth
                                     max_vocab: 200000
                                     n_refinement: 5
                                     normalize_embeddings:
                                     seed: -1
                                     src_emb: /unsupervisedMT/PBSMT/data/embeddings/cc.en.300.vec
                                     src_lang: en
                                     tgt_emb: /unsupervisedMT/PBSMT/data/embeddings/cc.fr.300.vec
                                     tgt_lang: fr
                                     verbose: 2
INFO - 03/18/19 09:55:37 - 0:00:00 - The experiment will be stored in /unsupervisedMT/PBSMT/MUSE/alignments/wiki-released-enfr-identical_char
INFO - 03/18/19 09:55:53 - 0:00:16 - Loaded 200000 pre-trained word embeddings.
INFO - 03/18/19 09:56:18 - 0:00:41 - Loaded 200000 pre-trained word embeddings.
INFO - 03/18/19 09:56:22 - 0:00:45 - Found 67029 pairs of identical character strings.
INFO - 03/18/19 09:56:24 - 0:00:46 - Validation metric: mean_cosine-csls_knn_10-S2T-10000
INFO - 03/18/19 09:56:24 - 0:00:46 - Starting iteration 0...
Traceback (most recent call last):
  File "/unsupervisedMT/PBSMT/MUSE/supervised.py", line 101, in <module>
    evaluator.all_eval(to_log)
  File "/unsupervisedMT/PBSMT/MUSE/src/evaluation/evaluator.py", line 217, in all_eval
    self.word_translation(to_log)
  File "/unsupervisedMT/PBSMT/MUSE/src/evaluation/evaluator.py", line 120, in word_translation
    dico_eval=self.params.dico_eval
  File "/unsupervisedMT/PBSMT/MUSE/src/evaluation/word_translation.py", line 92, in get_word_translation_accuracy
    dico = load_dictionary(path, word2id1, word2id2)
  File "/unsupervisedMT/PBSMT/MUSE/src/evaluation/word_translation.py", line 49, in load_dictionary
    assert os.path.isfile(path)
AssertionError
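The AssertionError means load_dictionary was handed a path to a file that does not exist. With dico_eval: default (see the log above), MUSE looks for the evaluation dictionary under data/crosslingual/dictionaries/ inside the MUSE checkout; the exact filename below is an assumption based on MUSE's defaults for en-fr:

```bash
# Check whether the default en-fr evaluation dictionary is in place;
# if this file is missing, download the dictionaries as described above.
ls /unsupervisedMT/PBSMT/MUSE/data/crosslingual/dictionaries/en-fr.5000-6500.txt
```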