WikiCLIR

Dataset Information:

A cross-language information retrieval (CLIR) dataset based on Wikipedia. It is a precursor to CLIRMatrix.

Links to Resources:

Dataset ID(s) & supported entities:

Checklist

Mark each task once completed. All should be checked prior to merging a new dataset.

- [x] Dataset definition (in ir_datasets/datasets/[topid].py)
- [x] Tests (in tests/integration/[topid].py)
- [x] Metadata generated (using ir_datasets generate_metadata command, should appear in ir_datasets/etc/metadata.json)
- [x] Documentation (in ir_datasets/etc/[topid].yaml)
  - [x] Documentation generated in https://github.com/seanmacavaney/ir-datasets.com/
- [x] Downloadable content (in ir_datasets/etc/downloads.json)
  - [x] Download verification action (in .github/workflows/verify_downloads.yml). Only one needed per topid.
  - [x] Any small public files from NIST (or other potentially troublesome files) mirrored in https://github.com/seanmacavaney/irds-mirror/. Mirrored status properly reflected in downloads.json.
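
For reference, the file locations named in the checklist can be checked against a local checkout with a small script. This is only a sketch and is not part of ir_datasets: the helper names `checklist_paths` and `missing_paths` are hypothetical, and the topid `wikiclir` in the usage note is assumed for illustration. It only confirms that the files exist, not that their contents are complete.

```python
from pathlib import Path


def checklist_paths(topid: str) -> list[str]:
    """Repository-relative paths named in the checklist for a given topid."""
    return [
        f"ir_datasets/datasets/{topid}.py",        # dataset definition
        f"tests/integration/{topid}.py",           # integration tests
        "ir_datasets/etc/metadata.json",           # generated metadata
        f"ir_datasets/etc/{topid}.yaml",           # documentation
        "ir_datasets/etc/downloads.json",          # downloadable content
        ".github/workflows/verify_downloads.yml",  # download verification action
    ]


def missing_paths(repo_root: str, topid: str) -> list[str]:
    """Return the checklist paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in checklist_paths(topid) if not (root / p).exists()]
```

Running `missing_paths(".", "wikiclir")` from the root of a checkout would list any of these files that are absent; an empty list means every path is present.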

Additional comments/concerns/ideas/etc.

allenai / ir_datasets