nxphi47 closed this issue 5 years ago.
I got the same error. I tried adding --no-check-certificate to the wget call in download_data:
```bash
# Download data
download_data() {
    CORPORA=$1
    URL=$2
    if [ -f "$CORPORA" ]; then
        echo "$CORPORA already exists, skipping download"
    else
        echo "Downloading $URL"
        # --no-check-certificate makes wget skip TLS certificate verification;
        # on failure, remove whatever partial file wget left behind
        wget --no-check-certificate "$URL" -O "$CORPORA" || rm -f "$CORPORA"
        if [ -f "$CORPORA" ]; then
            echo "$URL successfully downloaded."
        else
            echo "$URL not successfully downloaded."
            rm -f "$CORPORA"
            exit 1
        fi
    fi
}
```
However, I still couldn't download the data correctly. I think the anoopk.in server has some problems.
The server seems to be back up. Please reopen this issue if you are still seeing the problem.
It seems the server has moved the directory where the data is stored.
I found this page: http://www.cfilt.iitb.ac.in/~moses/iitb_en_hi_parallel/dataset.html
It asks you to fill in a form before downloading the data. After submitting the form, the data can be downloaded, but the URL differs from the one download-data.sh assumes.
If we access the URL with ~anoopk rather than ~moses (https://www.cse.iitb.ac.in/~anoopk/share/iitb_en_hi_parallel/iitb_corpus_download/parallel.tgz), it redirects to https://anoopk.in/share/iitb_en_hi_parallel/iitb_corpus_download/parallel.tgz.
However, the anoopk.in server is unstable and I still can't download parallel.tgz from there. Even with the --no-check-certificate option, the downloaded file might not be the correct one.
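Since a redirected or failed download can leave behind a file that is not the real archive (for example an HTML error page), one option is to check the archive's integrity before trusting it. Below is a minimal sketch, assuming the correct parallel.tgz is a gzip-compressed tar file; is_valid_tgz is a hypothetical helper name, not something defined in download-data.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of download-data.sh): succeed only if the
# file exists, is non-empty, and reads cleanly as a gzip-compressed tar.
is_valid_tgz() {
    local archive="$1"
    # -s: file exists and has a size greater than zero
    [ -s "$archive" ] || return 1
    # gzip -t verifies the compressed stream (an HTML page fails here)
    gzip -t "$archive" 2>/dev/null || return 1
    # tar -tzf lists the members, which checks the tar structure itself
    tar -tzf "$archive" >/dev/null 2>&1
}
```

Inside download_data, the `[ -f "$CORPORA" ]` check after wget could then be swapped for `is_valid_tgz "$CORPORA"`, so a corrupt download is deleted instead of being reported as successful.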
Okay, I checked the code. https://github.com/facebookresearch/flores/blob/f9f84a239bb6fa9e0168e6faaead93921d56a85a/download-data.sh#L155
This new URL seems to work. Thanks!
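Because the download location changed during this thread (the ~anoopk URL redirecting to anoopk.in, then the updated URL in download-data.sh), a script could try several candidate URLs in order and keep the first one that works. A minimal sketch, assuming GNU wget is installed and treating a non-empty output file as success, which is only a weak signal; download_first is a hypothetical name, not part of download-data.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of download-data.sh): try each candidate URL
# in order and stop at the first one that yields a non-empty file.
# Usage: download_first DEST URL [URL ...]
download_first() {
    local dest="$1"
    shift
    local url
    for url in "$@"; do
        echo "Trying $url"
        # One attempt per URL with a timeout, so a dead host is skipped quickly.
        # wget creates DEST even when the download fails.
        if wget -q --tries=1 --timeout=30 "$url" -O "$dest" && [ -s "$dest" ]; then
            echo "Downloaded $url"
            return 0
        fi
        # Remove any empty or partial file before trying the next URL
        rm -f "$dest"
    done
    echo "All candidate URLs failed for $dest" >&2
    return 1
}
```

For example, DEST could be $DATA/en-hi.tgz, with the updated URL from download-data.sh listed first and the older ~anoopk URL as a fallback.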
Thank you for this project and the paper.
I have an issue running bash download-data.sh.
I think the error occurs at line 155, when it tries to download the file
https://anoopk.in/share/iitb_en_hi_parallel/iitb_corpus_download/parallel.tgz
In a web browser, the link appears to be dead.
The line in question:
```bash
download_data $DATA/en-hi.tgz "https://www.cse.iitb.ac.in/~anoopk/share/iitb_en_hi_parallel/iitb_corpus_download/parallel.tgz"
```
Thank you,