facebookresearch / MUSE

A library for Multilingual Unsupervised or Supervised word Embeddings

Issues downloading data with get_evaluation.sh #149

Open kellymarchisio opened 4 years ago

kellymarchisio commented 4 years ago

FYI, there are some issues downloading data with get_evaluation.sh.

First, it's easy to get rate-limited by https://dl.fbaipublicfiles.com/arrival. Furthermore, I get 'Access Denied' (not a rate limit) when trying to download the wordsim data. Second, http://alt.qcri.org/semeval2017/task2/data/uploads... is down, so you can't get the SemEval 2017 data that way. Third, the two ways of getting data mentioned in the README aren't equivalent: for instance, wget https://dl.fbaipublicfiles.com/arrival/dictionaries.tar.gz gives you different dictionaries (more of them, in fact) than running get_evaluation.sh.
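If you want to see exactly which extra dictionaries the tarball gives you, one rough way is to compare the file names in the two download locations. This is a minimal sketch; the function name `extra_dictionaries` and the directory arguments are hypothetical, and it assumes each method left its dictionaries as flat files in a single directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print dictionary file names that exist in dir_b
# but not in dir_a. For example, dir_a = dictionaries fetched by
# get_evaluation.sh, dir_b = dictionaries extracted from dictionaries.tar.gz.
extra_dictionaries() {
  local dir_a=$1 dir_b=$2
  local list_a list_b
  list_a=$(mktemp)
  list_b=$(mktemp)
  # Bare file names, sorted with the same collation comm will use.
  (cd "$dir_a" && ls -1 | sort) > "$list_a"
  (cd "$dir_b" && ls -1 | sort) > "$list_b"
  # comm -13 suppresses lines unique to list_a and lines common to both,
  # leaving only the names unique to list_b.
  comm -13 "$list_a" "$list_b"
  rm -f "$list_a" "$list_b"
}

# Usage (hypothetical paths):
#   extra_dictionaries data/crosslingual/dictionaries tarball/dictionaries
```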

This isn't currently causing me problems because I've figured it out, but it tripped me up for a while, so I thought you should know.

Crescentz commented 4 years ago

I encountered the same problem. Can you tell me how you solved it?

Crescentz commented 4 years ago

Also, do you know how to get ‘wiki.en-es.es.vec’?

kellymarchisio commented 4 years ago

Instead, I ran these commands from the README:

cd data/
wget https://dl.fbaipublicfiles.com/arrival/vectors.tar.gz
wget https://dl.fbaipublicfiles.com/arrival/wordsim.tar.gz
wget https://dl.fbaipublicfiles.com/arrival/dictionaries.tar.gz

The dictionaries were in dictionaries.tar.gz. I'm using this with the UnsupervisedMT repo, so I moved dictionaries/ to UnsupervisedMT/PBSMT/MUSE/data/crosslingual. I don't know where ‘wiki.en-es.es.vec’ is, sorry. Maybe you can find it here.
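For anyone repeating the "extract and move" step, here is a minimal sketch. It assumes, as described above, that dictionaries.tar.gz unpacks to a top-level dictionaries/ directory; the function name `install_dictionaries` is hypothetical, and the destination is whatever directory your setup expects (UnsupervisedMT/PBSMT/MUSE/data/crosslingual in my case).

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack dictionaries.tar.gz into a scratch directory
# and move the resulting dictionaries/ folder under the destination, giving
# <dest>/dictionaries/<pair>.txt.
install_dictionaries() {
  local tarball=$1 dest=$2
  local workdir
  if [ -e "$dest/dictionaries" ]; then
    echo "refusing to overwrite existing $dest/dictionaries" >&2
    return 1
  fi
  workdir=$(mktemp -d)
  tar -xzf "$tarball" -C "$workdir"
  mkdir -p "$dest"
  mv "$workdir/dictionaries" "$dest/"
  rm -rf "$workdir"   # discard anything else the archive contained
}

# Usage (after the wget commands above, run from data/):
#   install_dictionaries dictionaries.tar.gz \
#     UnsupervisedMT/PBSMT/MUSE/data/crosslingual
```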

Alternatively, you could probably add waits to get_evaluation.sh to slow down how fast you're pulling data from https://dl.fbaipublicfiles.com/arrival. The SemEval data is in wordsim.tar.gz. I wrote to the SemEval authors, though, and they seem to have put their site back up, so you may have fewer issues with get_evaluation.sh now.
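One way to "add waits" is a small retry wrapper that pauses longer after each failure. This is a sketch only: `retry_with_backoff` is a hypothetical name, get_evaluation.sh does not define it, and the attempt count and delays in the usage comment are guesses rather than known rate limits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, retrying up to max_tries times and
# doubling the pause after each failure (exponential backoff).
retry_with_backoff() {
  local max_tries=$1 delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if (( attempt >= max_tries )); then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; sleeping ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Usage: fetch the README tarballs with a pause between files as well.
#   for f in vectors wordsim dictionaries; do
#     retry_with_backoff 5 30 wget -c "https://dl.fbaipublicfiles.com/arrival/$f.tar.gz"
#     sleep 10
#   done
```

Note that retrying only helps with rate limiting; an 'Access Denied' response will fail on every attempt. wget also has its own pacing options (`--wait`, `--random-wait`, `--tries`, `--waitretry`), which may be simpler if you are only calling wget directly.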

hammedb197 commented 3 years ago

Run chmod u+x ./MUSE/data/get_evaluation.sh first to give the script execute permission; you can read more about how chmod works. I'll update the README to reflect this.
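A minimal sketch of the same step as a reusable check, so the permission is only changed when it is missing. The function name `ensure_executable` is hypothetical, and the usage lines assume the script is run from inside data/, as the README's instructions suggest.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the user execute bit only if the file lacks it.
ensure_executable() {
  local script=$1
  if [ ! -x "$script" ]; then
    chmod u+x "$script"
  fi
}

# Usage:
#   ensure_executable ./MUSE/data/get_evaluation.sh
#   (cd ./MUSE/data && ./get_evaluation.sh)
```

Alternatively, invoking the script through an interpreter, e.g. `(cd ./MUSE/data && bash get_evaluation.sh)`, does not need the execute bit at all (assuming the script is written for bash).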