castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
http://pyserini.io/
Apache License 2.0

Pyserini download index doesn't actually appear to check tarball size #1886

Open · lintool opened this issue 2 months ago

lintool commented 2 months ago

Currently it only checks the MD5 checksum. We already store the file size in the Dict, so it would be easy to check that too.
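Here is a minimal sketch of what the extra check could look like. It assumes the expected MD5 and expected size in bytes are read from the stored Dict and passed in by the caller. `verify_tarball` is a hypothetical helper for illustration, not Pyserini's actual download code.

```python
import hashlib
import os


def verify_tarball(path: str, expected_md5: str, expected_size: int) -> None:
    """Hypothetical helper: verify a downloaded tarball's size, then its MD5.

    Raises ValueError if either check fails.
    """
    # Size check first: a single stat call that catches truncated or partial
    # downloads before spending time hashing the whole file.
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        raise ValueError(
            f"{path}: size mismatch, expected {expected_size} bytes, got {actual_size}"
        )

    # MD5 check: hash in 1 MiB chunks so large index tarballs never need
    # to fit in memory at once.
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    actual_md5 = md5.hexdigest()
    if actual_md5 != expected_md5:
        raise ValueError(
            f"{path}: MD5 mismatch, expected {expected_md5}, got {actual_md5}"
        )
```

Running the size comparison before hashing is a design choice: it is nearly free and gives a clearer error message for the common failure mode of an interrupted download.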