Open GollapudiBhanu opened 3 years ago
Hi, thanks for the suggestion. Can you provide more information about the dataset you are requesting? I'm having trouble finding information about it. For instance, is there a paper published about it? What document corpus does it use? etc.
@GollapudiBhanu bump
Hi team,
Could you please guide me on how to upload my data to ir_datasets? I have tried several times without success.
I am working with a TREC dataset. I have finished the initial retrieval step and now want to do reranking as a second step, but I am not able to upload my files to ir_datasets.
Thanks, Bhanu Prasad.G
Hi Bhanu,
Can you clarify your particular needs? If it's a standard benchmark (which task? which year?), we're happy to add it to the official library; see (a). If it's not a standard benchmark but you want it available from the package, you can write an extension package; see (b). Or, if you want a one-off object that shares the same interfaces and you are fine with converting your data to a standard format, see (c).
(a) Adding a standard benchmark dataset:
There's a guide on this here: https://github.com/allenai/ir_datasets/blob/master/examples/adding_datasets.ipynb
(b) Adding an extension dataset:
Similar to the above, but not added to the main repository. See example here: https://github.com/seanmacavaney/dummy-irds-ext/
(c) Creating a one-off dataset:
If you just need a one-off dataset instance for your benchmark, the queries/docs are in TSV format, and the qrels are in TREC format, you can use create_dataset as shown below:
import ir_datasets

dataset = ir_datasets.create_dataset(
    docs_tsv="path/to/docs.tsv",
    queries_tsv="path/to/queries.tsv",
    qrels_trec="path/to/qrels.trec"
)
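If your data is not in those formats yet, here is a minimal sketch of the conversion step using only the Python standard library. It assumes a two-column TSV layout (ID, then text) for docs and queries, and the usual four-column TREC qrels layout (query ID, iteration, doc ID, relevance). The helper names (clean, write_tsv, write_qrels), the sample records, and the output directory are all hypothetical, so adapt them to your own data:

```python
import tempfile
from pathlib import Path


def clean(text):
    # Collapse tabs and newlines so each record stays on a single line.
    return " ".join(text.split())


def write_tsv(path, rows):
    # Assumed layout: one "<id>\t<text>" record per line.
    with open(path, "w", encoding="utf-8") as f:
        for item_id, text in rows:
            f.write(f"{item_id}\t{clean(text)}\n")


def write_qrels(path, judgments):
    # TREC qrels columns: query_id, iteration (conventionally 0), doc_id, relevance.
    with open(path, "w", encoding="utf-8") as f:
        for query_id, doc_id, relevance in judgments:
            f.write(f"{query_id} 0 {doc_id} {relevance}\n")


# Hypothetical sample data standing in for your own collection.
docs = [
    ("d1", "Neural rerankers score query-document pairs."),
    ("d2", "BM25 is a common\tfirst-stage retriever."),
]
queries = [("q1", "what is a reranker")]
qrels = [("q1", "d1", 1), ("q1", "d2", 0)]

out_dir = Path(tempfile.mkdtemp())
write_tsv(out_dir / "docs.tsv", docs)
write_tsv(out_dir / "queries.tsv", queries)
write_qrels(out_dir / "qrels.trec", qrels)
```

The three resulting paths can then be passed as docs_tsv, queries_tsv, and qrels_trec in the call above.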
Hope this helps!
Hi @GollapudiBhanu,
Did the above help?
Dataset Information: