allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0

args.me corpus #127

Closed: janheinrichmerker closed this issue 2 years ago

janheinrichmerker commented 2 years ago

Dataset Information:

The args.me corpus comprises 387 740 arguments. They were crawled from the debate portals Debatewise (14 353 arguments), IDebate.org (13 522 arguments), Debatepedia (21 197 arguments), and Debate.org (338 620 arguments). In addition, the corpus contains 48 arguments from Canadian Parliament discussions. The arguments were extracted using heuristics designed specifically for each debate portal.
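As a quick consistency check, the per-source counts above add up to the stated total of 387 740 arguments across five sources. The snippet below is only an illustrative sketch; the `source_counts` dictionary is a hypothetical name, and the numbers are copied from the description.

```python
# Per-source argument counts, copied from the dataset description above.
# `source_counts` is a hypothetical name used only for this check.
source_counts = {
    "Debatewise": 14_353,
    "IDebate.org": 13_522,
    "Debatepedia": 21_197,
    "Debate.org": 338_620,
    "Canadian Parliament": 48,
}

# The sum should match the total stated for the whole corpus.
total = sum(source_counts.values())
print(f"{len(source_counts)} sources, {total:,} arguments")

# Share of the corpus contributed by each source.
for source, count in source_counts.items():
    print(f"{source}: {count / total:.1%}")
```

Debate.org alone accounts for the large majority of the arguments. This is one reason separate per-subset entries could be useful.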

Links to Resources:

Dataset ID(s):

argsme: Whole corpus

It might also be worth adding entries for the five subsets from which the corpus was crawled.

Supported Entities

Additional comments/concerns/ideas/etc.

The corpus (or variants of it) is also used in the Touché shared task series:

seanmacavaney commented 2 years ago

Thanks @heinrichreimer! Touché was already on my to-do list, so this fits in nicely with that. It's currently in progress.

janheinrichmerker commented 2 years ago

I'd be glad to help with that! Maybe I could even try implementing it myself and ask you if I have any questions.

seanmacavaney commented 2 years ago

That would be awesome! Let me know if you have any questions.

I think we'll need a third version of the corpus for Touché 2022, because that task is based on a different version of the corpus that uses pairs of sentences.