learnercat closed this issue 5 years ago.
Can you add print(line) just above the line that fails, to see what line[2] contains?
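A minimal sketch of that debugging step, assuming each line of an evaluation file looks like "word1 word2 score" separated by whitespace. read_word_pairs is a hypothetical stand-in, not MUSE's actual loader. It prints only the line whose score cannot be parsed, not every line:

```python
import io


def read_word_pairs(path, lower=True):
    """Hypothetical stand-in for a word-similarity loader.

    Returns (word1, word2, score) tuples from a file whose lines look like
    "word1 word2 score". When a score cannot be parsed, the offending line
    is printed before the error is re-raised.
    """
    word_pairs = []
    with io.open(path, "r", encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.rstrip()
            line = line.lower() if lower else line
            fields = line.split()
            if len(fields) != 3:
                continue  # assumption: skip blank lines and multi-word entries
            try:
                score = float(fields[2])
            except ValueError:
                # Equivalent of print(line) just above the failing line,
                # but only for the line that actually fails.
                print("%s:%d: %r" % (path, lineno, line))
                raise
            word_pairs.append((fields[0], fields[1], score))
    return word_pairs
```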
That's because one of the URLs (https://dl.fbaipublicfiles.com/arrival) in download_evaluation.sh returns "access denied", so the monolingual evaluation tasks can't be downloaded. You'll need to get them from somewhere else.
@afshinrahimi Hi, I'm getting the same error. Could you please give more details on how to fix it?
Some files in data/monolingual/en were not downloaded correctly: if you check them, you'll see that the file exists but its content is wrong (an access-denied message). You can download most of those files from https://github.com/benathi/word2gm/tree/master/evaluation_data/multiple_datasets or elsewhere and replace them. This fixes the problem; I checked it today.
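A minimal sketch for spotting such files, assuming that a failed download is either empty or starts with markup such as an XML or HTML error page (the ValueError reported in this thread mentions an XML-style encoding="utf-8"?> fragment). find_broken_files is a hypothetical helper, not part of MUSE:

```python
import os


def find_broken_files(root):
    """Return sorted paths under root that look like failed downloads.

    Assumption: a file whose first non-empty line starts with "<" (an XML or
    HTML error page) or that has no non-empty line at all is not real
    word-similarity data.
    """
    broken = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # errors="replace" keeps binary or mis-encoded files from crashing the scan.
            with open(path, "r", encoding="utf-8", errors="replace") as f:
                first = next((line.strip() for line in f if line.strip()), "")
            if not first or first.startswith("<"):
                broken.append(path)
    return sorted(broken)


if __name__ == "__main__":
    for path in find_broken_files(os.path.join("data", "monolingual")):
        print("re-download:", path)
```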
Sorry, we are facing some issues with our server: too many curl calls trigger a security mechanism. I'll compress the data and provide a link soon.
cd data/
wget https://dl.fbaipublicfiles.com/arrival/vectors.tar.gz
wget https://dl.fbaipublicfiles.com/arrival/wordsim.tar.gz
wget https://dl.fbaipublicfiles.com/arrival/dictionaries.tar.gz
This will provide all the data contained in the repo.
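For environments without wget, here is a rough Python equivalent of the commands above, assuming each archive should be unpacked inside data/ as the cd data/ step implies. download_and_extract and extract_archive are hypothetical helpers; it is still worth checking that the extracted folders end up where MUSE looks for them:

```python
import os
import tarfile
import urllib.request

BASE_URL = "https://dl.fbaipublicfiles.com/arrival/"
ARCHIVES = ["vectors.tar.gz", "wordsim.tar.gz", "dictionaries.tar.gz"]


def extract_archive(archive_path, dest):
    """Extract a .tar.gz archive into dest and return the member names."""
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter rejects absolute paths and
            # other unsafe members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
        return tar.getnames()


def download_and_extract(dest="data"):
    """Download each archive into dest (skipping existing ones), then extract it there."""
    os.makedirs(dest, exist_ok=True)
    for name in ARCHIVES:
        local = os.path.join(dest, name)
        if not os.path.exists(local):
            urllib.request.urlretrieve(BASE_URL + name, local)
        extract_archive(local, dest)
```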
Hi @glample, it seems that the files in the monolingual folder have not been uploaded, and they are the cause of the ValueError. So I think the issue should not be closed.
@afshinrahimi Hi, I tried the link you provided. However, it still doesn't work because data/monolingual/en/EN_SIMLEX-999.txt is not provided.
There are some files here as well: https://github.com/mfaruqui/eval-word-vectors/tree/master/data/word-sim. You just need to change a dash to an underscore in the filenames.
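A small sketch of that renaming, assuming the downloaded files are named like EN-SIMLEX-999.txt while the MUSE listing expects EN_SIMLEX-999.txt (underscore right after the two-letter language code). muse_name and rename_for_muse are hypothetical helpers. The pattern deliberately leaves names such as questions-words.txt untouched:

```python
import os
import re

# Assumption: only the dash right after an uppercase two-letter language
# code needs to become an underscore, e.g. EN-MEN-TR-3k.txt -> EN_MEN-TR-3k.txt.
PATTERN = re.compile(r"^([A-Z]{2})-(.+\.txt)$")


def muse_name(filename):
    """Return the MUSE-style name, or the input unchanged if it does not match."""
    return PATTERN.sub(r"\1_\2", filename)


def rename_for_muse(directory):
    """Rename matching files in directory and return their new names."""
    renamed = []
    for name in sorted(os.listdir(directory)):
        new = muse_name(name)
        if new != name:
            # os.replace overwrites an existing (e.g. broken) file of the same name.
            os.replace(os.path.join(directory, name), os.path.join(directory, new))
            renamed.append(new)
    return renamed
```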
@afshinrahimi Thank you very much. Now en works well, but es still doesn't, so supervised.py cannot run.
@1049451037 EN_SIMLEX-999.txt is in: https://dl.fbaipublicfiles.com/arrival/wordsim.tar.gz
This is what the archive contains:
de:
DE_GUR350.txt
DE_GUR65.txt
DE_SEMEVAL17.txt
DE_SIMLEX-999.txt
DE_WS-353.txt
DE_ZG222.txt
en:
EN_MC-30.txt
EN_MTurk-287.txt
EN_RG-65.txt
EN_SEMEVAL17.txt
EN_VERB-143.txt
EN_WS-353-REL.txt
EN_YP-130.txt
EN_MEN-TR-3k.txt
EN_MTurk-771.txt
EN_RW-STANFORD.txt
EN_SIMLEX-999.txt
EN_WS-353-ALL.txt
EN_WS-353-SIM.txt
questions-words.txt
es:
ES_MC-30.txt
ES_RG-65.txt
ES_SEMEVAL17.txt
ES_WS-353.txt
fa:
FA_SEMEVAL17.txt
fr:
FR_RG-65.txt
it:
IT_SEMEVAL17.txt
IT_SIMLEX-999.txt
IT_WS-353.txt
@glample Oh, thanks! The file name is a bit confusing because there is a wordsim folder in the crosslingual folder.
@glample, thank you so much. The data in monolingual and /crosslingual/wordsim was missing from the download. I got it with @glample's commands:
wget https://dl.fbaipublicfiles.com/arrival/vectors.tar.gz
wget https://dl.fbaipublicfiles.com/arrival/wordsim.tar.gz
wget https://dl.fbaipublicfiles.com/arrival/dictionaries.tar.gz
Finally, it works.
OK, I think the problem is the names of the directories that hold the necessary data. After downloading the three *.tar.gz files, you need to create these directories:
and you also need to move the downloaded and extracted files into them correctly.
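A sketch for checking the result, with the file list copied from the wordsim.tar.gz listing posted above. The data/monolingual/<lang>/ location is an assumption based on paths mentioned in this thread (e.g. data/monolingual/en/EN_SIMLEX-999.txt), and missing_files is a hypothetical helper:

```python
import os

# Copied from the archive listing earlier in the thread.
EXPECTED = {
    "de": ["DE_GUR350.txt", "DE_GUR65.txt", "DE_SEMEVAL17.txt",
           "DE_SIMLEX-999.txt", "DE_WS-353.txt", "DE_ZG222.txt"],
    "en": ["EN_MC-30.txt", "EN_MTurk-287.txt", "EN_RG-65.txt",
           "EN_SEMEVAL17.txt", "EN_VERB-143.txt", "EN_WS-353-REL.txt",
           "EN_YP-130.txt", "EN_MEN-TR-3k.txt", "EN_MTurk-771.txt",
           "EN_RW-STANFORD.txt", "EN_SIMLEX-999.txt", "EN_WS-353-ALL.txt",
           "EN_WS-353-SIM.txt", "questions-words.txt"],
    "es": ["ES_MC-30.txt", "ES_RG-65.txt", "ES_SEMEVAL17.txt", "ES_WS-353.txt"],
    "fa": ["FA_SEMEVAL17.txt"],
    "fr": ["FR_RG-65.txt"],
    "it": ["IT_SEMEVAL17.txt", "IT_SIMLEX-999.txt", "IT_WS-353.txt"],
}


def missing_files(root=os.path.join("data", "monolingual"), langs=None):
    """Return the expected file paths that do not exist under root."""
    missing = []
    for lang in langs or sorted(EXPECTED):
        for name in EXPECTED[lang]:
            path = os.path.join(root, lang, name)
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

For example, missing_files(langs=["en", "es"]) lists only the English and Spanish files that are still absent, which matches the en/es pair discussed above.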
Hi, I am a beginner with MUSE. I tried unsupervised training using Japanese and English pre-trained word vectors. For Japanese, I cleaned a collection of Japanese text with MeCab and embedded it with fastText (300d). For English, I used the fastText pre-trained word vectors crawl-300d-2M.vec.zip (2 million word vectors trained on Common Crawl, 600B tokens). Here is the command I used to train the model on GPUs:
CUDA_VISIBLE_DEVICES=1,2 python unsupervised.py --src_lang ja --tgt_lang en --src_emb /item_embdd/skipgram/allgenre_model.vec --tgt_emb /pretrained_vec/en/crawl-300d-2M.vec 2> error20190214a.txt
I got the error message below:
Traceback (most recent call last):
  File "unsupervised.py", line 139, in <module>
    evaluator.all_eval(to_log)
  File "/multi_embedd/MUSE/src/evaluation/evaluator.py", line 215, in all_eval
    self.monolingual_wordsim(to_log)
  File "/multi_embedd/MUSE/src/evaluation/evaluator.py", line 49, in monolingual_wordsim
    ) if self.params.tgt_lang else None
  File "/multi_embedd/MUSE/src/evaluation/wordsim.py", line 105, in get_wordsim_scores
    coeff, found, not_found = get_spearman_rho(word2id, embeddings, filepath, lower)
  File "/multi_embedd/MUSE/src/evaluation/wordsim.py", line 69, in get_spearman_rho
    word_pairs = get_word_pairs(path)
  File "/multi_embedd/MUSE/src/evaluation/wordsim.py", line 39, in get_word_pairs
    word_pairs.append((line[0], line[1], float(line[2])))
ValueError: could not convert string to float: 'encoding="utf-8"?>'
Could anyone give me advice or comment? Thanks in advance.
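The value in that error looks like the tail of an XML declaration, which fits the earlier diagnosis in this thread that some evaluation files contain an "access denied" page instead of word pairs. This quick illustration shows why such a line reaches float() at all, assuming the loader keeps lines that split into exactly three whitespace-separated fields. The header string is an assumed example of a failed download's first line, not taken from the user's file:

```python
# Assumed first line of a failed download: an XML declaration.
header = '<?xml version="1.0" encoding="utf-8"?>'

fields = header.split()
print(len(fields))   # 3, so a "three fields per line" check lets it through
print(fields[2])     # encoding="utf-8"?>

try:
    float(fields[2])
except ValueError as err:
    print(err)       # could not convert string to float: 'encoding="utf-8"?>'
```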