castorini / pygaggle

a gaggle of deep neural architectures for text ranking and question answering, designed for Pyserini
http://pygaggle.ai/
Apache License 2.0

collections.tar.gz cannot be found via dropbox url #289

Closed aandyw closed 10 months ago

aandyw commented 2 years ago

I'm trying to download the collection.tar.gz file from Dropbox, but it seems to have been deleted.

wget https://www.dropbox.com/s/m1n2wf80l1lb9j1/collection.tar.gz

fails with an ERROR 404.

lintool commented 2 years ago

Hi @Pie31415, can you show us the landing page this URL comes from? I can't tell whose Dropbox account owns this file.

aandyw commented 2 years ago

The Dropbox URL is from the "Data Prep" section of the README: https://github.com/castorini/pygaggle/blob/master/docs/experiments-monot5-gpu.md

cd ${DATA_DIR}
wget https://storage.googleapis.com/duobert_git/run.bm25.dev.small.tsv
wget https://raw.githubusercontent.com/castorini/anserini/master/src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.txt
wget https://raw.githubusercontent.com/castorini/anserini/master/src/main/resources/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt
wget https://www.dropbox.com/s/m1n2wf80l1lb9j1/collection.tar.gz
tar -xvf collection.tar.gz
rm collection.tar.gz
mv run.bm25.dev.small.tsv run.dev.small.tsv
cd ../../

ahadda5 commented 1 year ago

@Pie31415 did you find it?

ahadda5 commented 1 year ago

You can get it from MS MARCO directly.
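
For anyone hitting the same 404 in the "Data Prep" step, one way to make the download less brittle is to try several sources in order and keep the first that works. The following is only a minimal sketch: `fetch_first` and the `FETCH` variable are hypothetical names made up for illustration, not part of PyGaggle or its docs, and no MS MARCO URL is given in this thread, so it appears here as a placeholder variable you would have to fill in yourself.

```shell
# Sketch only: fetch_first and FETCH are illustrative names, not PyGaggle tooling.
# Usage: fetch_first OUTPUT_FILE URL [URL...]
# Tries each URL in order and stops at the first successful download.
# FETCH is the download command (called as: $FETCH OUTPUT_FILE URL);
# it defaults to "wget -q -O".
fetch_first() {
  local out="$1"
  shift
  local url
  for url in "$@"; do
    if ${FETCH:-wget -q -O} "$out" "$url"; then
      echo "downloaded from $url"
      return 0
    fi
    # wget -O can leave an empty file behind after a failed request.
    rm -f "$out"
    echo "failed: $url" >&2
  done
  return 1
}

# Example call (commented out so that sourcing this file does no network I/O).
# MSMARCO_COLLECTION_URL is a placeholder: set it to the collection.tar.gz
# link published by MS MARCO.
# fetch_first collection.tar.gz \
#   "https://www.dropbox.com/s/m1n2wf80l1lb9j1/collection.tar.gz" \
#   "$MSMARCO_COLLECTION_URL" && tar -xvf collection.tar.gz
```

With the Dropbox link gone, the first attempt fails and the function falls through to the next URL, so the rest of the data prep commands can stay unchanged.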