[ ] Downloadable content (in ir_datasets/etc/downloads.json)
[ ] Download verification action (in .github/workflows/verify_downloads.yml). Only one needed per topid.
[ ] Any small public files from NIST (or other potentially troublesome files) mirrored in https://github.com/seanmacavaney/irds-mirror/. Mirrored status properly reflected in downloads.json.
Additional comments/concerns/ideas/etc.
The dataset is only available on request and after accepting a disclaimer, so it will be another semi-manual dataset, with instructions provided for obtaining access.
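For a semi-manual dataset, the loader cannot download the files itself; it can only check that the user has placed them where they are expected and, if not, point to the access instructions. The sketch below illustrates that check under stated assumptions: the function names, the instruction text and the relative file path are hypothetical, and the `IR_DATASETS_HOME` fallback to `~/.ir_datasets` is assumed to mirror ir_datasets' own convention. The optional MD5 check corresponds to the kind of verification recorded in downloads.json.

```python
import hashlib
import os
from pathlib import Path
from typing import Optional

# Hypothetical instruction text; the real wording would come from the dataset providers.
ACCESS_INSTRUCTIONS = (
    "DaReCzech is only available on request after accepting a disclaimer. "
    "Obtain the files from the providers and place them at: {path}"
)


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def locate_manual_file(home: Path, relative_path: str, expected_md5: Optional[str] = None) -> Path:
    """Return the path of a manually placed file, or raise with access instructions."""
    path = home / relative_path
    if not path.exists():
        # The file cannot be fetched automatically, so tell the user how to get it.
        raise FileNotFoundError(ACCESS_INSTRUCTIONS.format(path=path))
    if expected_md5 is not None:
        actual = md5_of(path)
        if actual != expected_md5:
            raise ValueError(f"MD5 mismatch for {path}: expected {expected_md5}, got {actual}")
    return path


def default_home() -> Path:
    # Assumption: follows the IR_DATASETS_HOME convention, defaulting to ~/.ir_datasets.
    return Path(os.environ.get("IR_DATASETS_HOME", Path.home() / ".ir_datasets"))
```

A call such as `locate_manual_file(default_home(), "dareczech/train.tsv")` (hypothetical file name) would either return the path or raise an error carrying the instructions, so users see how to request access instead of a bare "file not found".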
Dataset Information:
A large Czech-language dataset.
Links to Resources:
Dataset ID(s) & supported entities:
dareczech (docs)
dareczech/train (docs, queries, qrels)
dareczech/train/small (docs, queries, qrels)
dareczech/dev (docs, queries, qrels)
dareczech/test (docs, queries, qrels)
It appears to be a re-ranking dataset, so scoreddocs will likely also be provided.
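Because the data appears to consist of judged query-document pairs for re-ranking, each pair can feed both qrels (the relevance label) and scoreddocs (the candidate set to re-rank). The sketch below shows one possible split. It rests on assumptions: the row layout `(query_id, query_text, doc_id, label)` and integer labels are hypothetical, since the actual file format and label scale are not described here, and the local record types are only modeled on ir_datasets' `GenericQuery`, `TrecQrel` and `GenericScoredDoc`.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


# Local stand-ins modeled on ir_datasets' GenericQuery, TrecQrel and GenericScoredDoc.
class Query(NamedTuple):
    query_id: str
    text: str


class Qrel(NamedTuple):
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


class ScoredDoc(NamedTuple):
    query_id: str
    doc_id: str
    score: float


# Hypothetical row layout: (query_id, query_text, doc_id, label).
Row = Tuple[str, str, str, int]


def split_rerank_rows(rows: Iterable[Row]) -> Tuple[List[Query], List[Qrel], List[ScoredDoc]]:
    """Split judged query-document pairs into queries, qrels and scoreddocs."""
    queries: Dict[str, Query] = {}  # insertion order keeps first-seen query order
    qrels: List[Qrel] = []
    scoreddocs: List[ScoredDoc] = []
    position: Dict[str, int] = {}  # next candidate position per query
    for query_id, text, doc_id, label in rows:
        queries.setdefault(query_id, Query(query_id, text))
        qrels.append(Qrel(query_id, doc_id, label, "0"))
        # Placeholder score: the negated position keeps the original candidate
        # order when sorted by descending score (no first-stage score assumed).
        pos = position.get(query_id, 0)
        position[query_id] = pos + 1
        scoreddocs.append(ScoredDoc(query_id, doc_id, -float(pos)))
    return list(queries.values()), qrels, scoreddocs
```

The placeholder score is a design choice for this sketch only; if the released files include a first-stage ranking score, that value would be the natural one to expose instead.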
Checklist
Mark each task once completed. All should be checked prior to merging a new dataset.
[ ] ir_datasets/datasets/[topid].py
[ ] tests/integration/[topid].py
[ ] Metadata generated with the ir_datasets generate_metadata command; it should appear in ir_datasets/etc/metadata.json
[ ] ir_datasets/etc/[topid].yaml