EleutherAI / the-pile

MIT License
1.51k stars 128 forks

ConvoKit datasets #104

Closed (upintheairsheep closed 1 year ago)

upintheairsheep commented 1 year ago

Can you integrate the ConvoKit datasets, especially the giant Reddit dataset, into the Pile or a future version of the Pile? I would really like to bring AI further for all of humanity, not for the purpose of feeding the pigs (corporations). https://zissou.infosci.cornell.edu/convokit/datasets/ See https://convokit.cornell.edu/documentation/datasets.html

upintheairsheep commented 1 year ago

http://cairo.lti.cs.cmu.edu/~hector/ - A similar site hosting ~0.5 GB of tweets, ~0.3 GB of DBpedia data, and an unknown number of wikiHow XML files

upintheairsheep commented 1 year ago

pile v2