castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
http://pyserini.io/
Apache License 2.0

Add to onboarding reproduction logs #1905

Closed: bilet-13 closed this issue 1 month ago

bilet-13 commented 1 month ago

Environment:

MacBook Pro 2023, Python 3.12, Java 21, Anaconda

Status: Everything worked except downloading the MS MARCO passage dataset.

Error: ERROR 409: Public access is not permitted on this storage account.

Solution: Download the dataset from the mirror link instead:

wget https://www.dropbox.com/s/9f54jg2f71ray3b/collectionandqueries.tar.gz -P collections/msmarco-passage
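
For anyone who wants to script this workaround, here is a minimal sketch of a fallback download. It assumes `wget` is installed. The function name `fetch_first_available` and the `PRIMARY_URL` placeholder are made up for illustration and are not part of Pyserini. The helper tries each URL in order and stops at the first one that downloads.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pyserini): try each URL in turn and stop
# at the first one that wget downloads successfully.
# Usage: fetch_first_available DEST_DIR URL [URL...]
fetch_first_available() {
  local dest_dir="$1"
  shift
  mkdir -p "$dest_dir" || return 1
  local url
  for url in "$@"; do
    # -P saves the file under dest_dir, as in the command from this issue.
    if wget -P "$dest_dir" "$url"; then
      echo "downloaded: $url"
      return 0
    fi
    echo "failed: $url" >&2
  done
  # Every URL failed.
  return 1
}

# Example call (PRIMARY_URL is a placeholder for the link that returned ERROR 409):
# fetch_first_available collections/msmarco-passage "$PRIMARY_URL" \
#   https://www.dropbox.com/s/9f54jg2f71ray3b/collectionandqueries.tar.gz
```

The function returns a non-zero status if none of the URLs work, so a calling script can stop before it tries to extract a missing archive.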