allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0

MSMARCO URLs moved to another domain #256

Closed TheMrSheldon closed 4 months ago

TheMrSheldon commented 5 months ago

Describe the bug The download URLs for MSMARCO seem to have been moved to a new domain. For example, the test queries for TREC DL '19 Passage were previously at https://msmarco.blob.core.windows.net/msmarcoranking/msmarco-test2019-queries.tsv.gz and can now be found at https://msmarco.z22.web.core.windows.net/msmarcoranking/msmarco-test2019-queries.tsv.gz (see also here: https://microsoft.github.io/msmarco/TREC-Deep-Learning-2019.html).

From a cursory glance, I believe that replacing https://msmarco.blob.core.windows.net/ with https://msmarco.z22.web.core.windows.net/ in ir_datasets/etc/downloads.json should fix this.
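As a rough illustration of that replacement, here is a minimal sketch that swaps the domain prefix in every string of a parsed JSON structure. I have not checked the actual layout of downloads.json, so the nested `sample` entry and the `rewrite_urls` helper below are hypothetical; only the two domain prefixes and the query file URL come from this report.

```python
import json

OLD_PREFIX = "https://msmarco.blob.core.windows.net/"
NEW_PREFIX = "https://msmarco.z22.web.core.windows.net/"


def rewrite_urls(node, old=OLD_PREFIX, new=NEW_PREFIX):
    """Return a copy of a parsed JSON value with the old URL prefix replaced."""
    if isinstance(node, dict):
        return {key: rewrite_urls(value, old, new) for key, value in node.items()}
    if isinstance(node, list):
        return [rewrite_urls(item, old, new) for item in node]
    if isinstance(node, str) and node.startswith(old):
        # Only strings that begin with the old domain are touched.
        return new + node[len(old):]
    return node


# Hypothetical entry; the real file's keys and nesting are an assumption.
sample = {
    "msmarco-passage": {
        "trec-dl-2019/queries": {
            "url": OLD_PREFIX + "msmarcoranking/msmarco-test2019-queries.tsv.gz",
        },
    },
    "other-dataset": {"url": "https://example.org/data.tsv.gz"},
}

updated = rewrite_urls(sample)
print(json.dumps(updated, indent=2))
```

To apply this to the real file, one would load it with `json.load`, pass the result through `rewrite_urls`, and write it back with `json.dump`. Any checksums stored alongside the URLs would still need to be verified against the files on the new domain.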

Affected dataset(s) At least msmarco-passage.

To Reproduce Access msmarco-passage/train.

seanmacavaney commented 5 months ago

Thanks for the report! It seems the links have changed very recently, per the tracking here: https://ir-datasets.com/downloads

seanmacavaney commented 5 months ago

Version 0.5.6 updates the URLs, so you should be good to go* after updating!

pip install --upgrade ir-datasets==0.5.6
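If you want to confirm the upgrade took effect, something like the sketch below should work. It reads the installed version through the standard library's `importlib.metadata`. The helper names are made up for illustration, the distribution name is assumed to match the pip command above, and the comparison only looks at the leading numeric parts of the version string.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.5.6' into (0, 5, 6); stops at the first non-numeric part."""
    parts = []
    for part in text.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def is_at_least(installed, required="0.5.6"):
    """True if the installed version is the required one or newer."""
    return parse_version(installed) >= parse_version(required)


def installed_version(dist_name="ir-datasets"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("ir-datasets does not appear to be installed")
elif is_at_least(version):
    print(f"ir-datasets {version} should include the updated URLs")
else:
    print(f"ir-datasets {version} predates 0.5.6; upgrade to pick up the new URLs")
```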

* Should work for everything except msmarco-qna, which seems to have been removed completely. I've opened an issue about it.

TheMrSheldon commented 5 months ago

Thank you very much for your hard work!

TheMrSheldon commented 4 months ago

I see that the msmarco-qna issue has been resolved as well, so I consider this resolved.