facebookresearch / MUSE

A library for Multilingual Unsupervised or Supervised word Embeddings

Assertion error in supervised.py #138

Closed by postrou 5 years ago

postrou commented 5 years ago

Hi. I'm trying to align Kazakh and English fastText embeddings with a dictionary in a supervised way:

python3 supervised.py --src_lang kk --tgt_lang en --src_emb /data/kk_en/data/embeddings/fastText/cc.kk.300.vec --tgt_emb /data/kk_en/data/embeddings/fastText/cc.en.300.vec --n_refinement 4 --dico_train /data/kk_en/data/para/kk_en.dict.train --dico_eval /data/kk_en/data/para/kk_en.dict.val --exp_path /data/kk_en/data/embeddings/MUSE/experiments --exp_id full_words --export txt

And I keep getting this error:

Traceback (most recent call last):
  File "supervised.py", line 101, in <module>
    evaluator.all_eval(to_log)
  File "/data/github/MUSE/src/evaluation/evaluator.py", line 215, in all_eval
    self.monolingual_wordsim(to_log)
  File "/data/github/MUSE/src/evaluation/evaluator.py", line 49, in monolingual_wordsim
    ) if self.params.tgt_lang else None
  File "/data/github/MUSE/src/evaluation/wordsim.py", line 105, in get_wordsim_scores
    coeff, found, not_found = get_spearman_rho(word2id, embeddings, filepath, lower)
  File "/data/github/MUSE/src/evaluation/wordsim.py", line 69, in get_spearman_rho
    word_pairs = get_word_pairs(path)
  File "/data/github/MUSE/src/evaluation/wordsim.py", line 36, in get_word_pairs
    assert len(line) > 3
AssertionError

I've already seen issues #122 and #99, but it looks like the problem there was solved by downloading the data correctly. I built my data manually (except for the fastText embeddings), so this doesn't help me. Could you please explain the meaning of this assertion?

postrou commented 5 years ago

The problem was solved by deleting the crosslingual and monolingual directories.
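For anyone who would rather find the offending file than delete the directories: the traceback shows only that `assert len(line) > 3` failed while `get_word_pairs` was reading a file during the monolingual word-similarity evaluation, so one of the evaluation files presumably contained a line the parser did not expect. Below is a minimal diagnostic sketch, not part of MUSE. It assumes the evaluation files are plain text with whitespace-separated `word1 word2 score` lines, which is a guess about the format and is not confirmed in this thread. The helper name `find_malformed_lines` and the `min_fields` parameter are hypothetical.

```python
import io
import os


def find_malformed_lines(root, min_fields=3):
    """Yield (path, line_number, text) for every line with too few fields.

    Assumption (not confirmed by the thread): each valid line holds at
    least `min_fields` whitespace-separated fields, e.g. "word1 word2 score".
    Empty lines are reported too, since they have zero fields.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on badly encoded files
            with io.open(path, "r", encoding="utf-8", errors="replace") as f:
                for number, raw in enumerate(f, start=1):
                    if len(raw.split()) < min_fields:
                        yield path, number, raw.rstrip("\n")
```

Running `list(find_malformed_lines(some_dir))` on the directory that holds the evaluation files should list every suspicious line with its file and line number, which makes it easier to see whether a file is truncated, empty, or in a different format than expected.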