Maluuba / newsqa

Tools for using Maluuba's NewsQA Dataset (public version)
https://www.microsoft.com/en-us/research/project/newsqa-dataset/

NewsQA download failure #30

Closed rickiepark closed 5 years ago

rickiepark commented 5 years ago

The download link for the entire dataset at https://msropendata.com/datasets/939b1042-6402-4697-9c15-7a28de7e1321 returns the error message below. Please help.

AuthenticationFailed
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:17a4519b-101e-0057-0181-68563e000000
Time:2019-09-11T09:13:30.8395781Z
Signature did not match. String to sign used was rl 2019-09-11T09:08:23Z 2019-10-11T09:13:23Z /blob/msropendataset01/$root 2018-03-28

juharris commented 5 years ago

Did you attempt to go to the link in your browser? I'm talking with that team to try to make sure it's downloadable in the browser in the future. For now that page should give you the download instructions:

Dataset: NewsQA
Note that the dataset download link cannot be used directly in a browser
How do I use a download link for an entire dataset?
A download link for an entire dataset provides the location of the dataset in Azure as well as a special time-limited key that allows you to download the entire dataset. This link can be used with tools that can copy files from Azure, like the following:

AzCopy - a command-line tool for Windows or Linux that copies files to and from Azure.
Azure Storage Explorer - a utility that is used to manage Azure storage.
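
The "time-limited key" in such a link appears to be an Azure shared access signature (SAS), which carries its validity window in the `st` (start) and `se` (expiry) query parameters. A minimal sketch for ruling out an expired link before trying a copy tool is below. The function names are hypothetical, and it assumes the timestamps use the second-resolution UTC form seen in the error message above (for example `2019-10-11T09:13:23Z`); other ISO 8601 forms are not handled.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs


def sas_window(url):
    """Return (start, expiry) from a SAS URL's st/se parameters; either may be None."""
    params = parse_qs(urlparse(url).query)

    def parse(name):
        values = params.get(name)
        if not values:
            return None
        # Assumption: timestamps look like 2019-10-11T09:13:23Z (UTC, whole seconds).
        parsed = datetime.strptime(values[0], "%Y-%m-%dT%H:%M:%SZ")
        return parsed.replace(tzinfo=timezone.utc)

    return parse("st"), parse("se")


def is_link_usable(url, now=None):
    """True if `now` falls inside the link's validity window."""
    now = now or datetime.now(timezone.utc)
    start, expiry = sas_window(url)
    if expiry is None:
        return False  # no expiry parameter: probably not a SAS link at all
    return (start is None or start <= now) and now < expiry
```

For example, with a made-up link such as `https://exampleaccount.blob.core.windows.net/examplecontainer?sv=2018-03-28&sr=c&sig=PLACEHOLDER&st=2019-09-11T09%3A08%3A23Z&se=2019-10-11T09%3A13%3A23Z&sp=rl`, `is_link_usable(url)` returns `False` once the expiry date has passed. A link inside its window can still fail in a browser, which is the case this thread is about.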
<Then you get a URL for the dataset which will not work in the browser>

For now, to get the dataset, get AzCopy and run:

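# Quote the URL: its query string contains "&", which the shell would otherwise treat as a control operator.
azcopy cp --recursive "<put that URL from before here>" downloaded_newsqa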
cp downloaded_newsqa/newsqa/newsqa-data-v1.csv ~/workspace/newsqa/maluuba/newsqa
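
If you would rather script the second step, the copy can be sketched in Python with a check that AzCopy actually produced the file. The function name is hypothetical; the paths mirror the two commands above, and the repository location is whatever you pass in.

```python
import shutil
from pathlib import Path


def install_newsqa_csv(download_dir, repo_dir):
    """Copy newsqa-data-v1.csv from the AzCopy output into <repo>/maluuba/newsqa."""
    source = Path(download_dir) / "newsqa" / "newsqa-data-v1.csv"
    if not source.is_file():
        raise FileNotFoundError(f"Expected AzCopy output at {source}")
    target_dir = Path(repo_dir) / "maluuba" / "newsqa"
    if not target_dir.is_dir():
        # Fail loudly instead of creating the folder, so a wrong repo path is noticed.
        raise NotADirectoryError(f"Repository folder not found: {target_dir}")
    return Path(shutil.copy2(source, target_dir))
```

Calling `install_newsqa_csv("downloaded_newsqa", Path.home() / "workspace" / "newsqa")` would match the `cp` command above.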

The rest of the setup instructions should work (mostly) fine, except when using Docker. I'll update the setup instructions if the dataset download doesn't get changed soon. If you're using the Docker container:

# Notice that I am giving a specific command to not let the default command run because the current default command would delete newsqa-data-v1.csv.
docker run --rm -it -v ${PWD}:/usr/src/newsqa --name newsqa maluuba/newsqa /bin/bash --login -c "cp --no-clobber /usr/downloads/* maluuba/newsqa/ && python -m unittest discover ."
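
The `cp --no-clobber` in that command is what keeps the `newsqa-data-v1.csv` you already placed in `maluuba/newsqa`: files that exist at the destination are skipped rather than overwritten. As a rough illustration only (not what the container runs), the same behavior for regular files can be sketched in Python; the function name is hypothetical.

```python
import shutil
from pathlib import Path


def copy_no_clobber(src_dir, dst_dir):
    """Copy regular files from src_dir to dst_dir, skipping names that already exist."""
    copied = []
    for src in sorted(Path(src_dir).iterdir()):
        dst = Path(dst_dir) / src.name
        if src.is_file() and not dst.exists():
            shutil.copy2(src, dst)
            copied.append(src.name)
    return copied  # names of the files that were actually copied
```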

rickiepark commented 5 years ago

Thank you @juharris,

I misunderstood the instructions. :( I'll try again with azcopy.

Thank you so much for the quick response. 👍

juharris commented 5 years ago

No worries. I actually did the same thing and ignored their instructions the first time too =)

rickiepark commented 5 years ago

I successfully downloaded the dataset on an Azure VM. Thank you so much. :)