EleutherAI / the-pile

Debate notes #56

Closed: Hellisotherpeople closed this issue 3 years ago

Hellisotherpeople commented 3 years ago

I have a dataset consisting of a large number of documents compiled by competitive debaters as evidence, alongside extractive and abstractive summaries. Can the documents portion be included in the Pile? I have ~180K documents.

https://github.com/Hellisotherpeople/DebateSum

StellaAthena commented 3 years ago

Hi! Approximately how large is your dataset (in GB, not documents)? And what language(s) is it in?

Hellisotherpeople commented 3 years ago

Slightly less than 1GB, English

StellaAthena commented 3 years ago

That sounds great! Thanks for bringing it to our attention. If you’d like to contribute it, feel free to submit a PR. Otherwise I’ll put it on the list of things we need to get around to doing.

Hellisotherpeople commented 3 years ago

Is this being included in a future release of the Pile? I haven't had a lot of time to spend on this recently, but I can try to get it in very soon if there is some kind of time limit...

StellaAthena commented 3 years ago

We are not currently working on a Pile V2 or similar. So no.