EleutherAI / pilev2

MIT License

ConvoKit Datasets #14

Open upintheairsheep opened 1 year ago

upintheairsheep commented 1 year ago

https://zissou.infosci.cornell.edu/convokit/datasets/

ConvoKit provides tons of datasets:

casino-corpus/ 10-Sep-2021 04:27 -
chromium-corpus/ 13-Nov-2019 21:27 -
conversations-gone-awry-cmv-corpus/ 10-Aug-2020 14:33 -
conversations-gone-awry-corpus/ 31-Jul-2020 16:43 -
diplomacy-corpus/ 11-May-2020 16:39 -
friends-corpus/ 22-Jul-2020 09:56 -
gap-corpus/ 04-Jul-2020 10:14 -
iq2-corpus/ 21-Oct-2019 19:31 -
movie-corpus/ 11-Mar-2020 03:02 -
parliament-corpus/ 16-Dec-2019 06:58 -
persuasionforgood-corpus/ 17-Oct-2019 20:50 -
reddit-coarse-discourse-corpus/ 17-Oct-2019 17:41 -
spolin-corpus/ 21-Jul-2022 13:04 -
stack-exchange-politeness-corpus/ 23-Apr-2020 23:41 -
subreddit-corpus/ 13-Nov-2019 21:35 -
supreme-corpus/ 20-Dec-2020 01:56 -
supreme-corpus-deprecated/ 06-Nov-2019 20:56 -
switchboard-corpus/ 26-Jun-2021 22:40 -
tennis-corpus/ 06-Nov-2019 20:56 -
wiki-articles-for-deletion-corpus/ 19-Feb-2021 02:17 -
wiki-corpus/ 06-Nov-2019 20:57 -
wiki-politeness-annotated-corpus/ 16-Dec-2019 05:54 -
wiki-sampled-en-corpus/ 31-Dec-2021 02:54 -
wiki-sampled-zh-corpus/ 31-Dec-2021 02:53 -
wikiconv-corpus/ 30-Jun-2019 03:39 -
wikipedia-politeness-corpus/ 23-Apr-2020 23:41 -
winning-args-corpus/

All of these corpora, especially the Subreddit one (which probably includes the entirety of Reddit), would be great additions to the site. We should probably keep the Subreddit corpus separate from the rest due to its VAST size.
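
To make the suggested split concrete, here is a minimal sketch (not part of pilev2 or ConvoKit) that parses a few entries from the listing in this issue and sets the Subreddit corpus apart from the rest. The function names `parse_listing` and `split_subreddit` are hypothetical, the sample covers only four of the entries, and the date parsing assumes an English locale for month abbreviations.

```python
from datetime import datetime

# A few entries copied from the listing in this issue ("name/ date time -").
LISTING = """\
casino-corpus/ 10-Sep-2021 04:27 -
friends-corpus/ 22-Jul-2020 09:56 -
subreddit-corpus/ 13-Nov-2019 21:35 -
wiki-corpus/ 06-Nov-2019 20:57 -
"""


def parse_listing(text):
    """Return (name, last_modified) pairs from directory-listing lines."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank lines and entries without a date
        name = parts[0].rstrip("/")
        modified = datetime.strptime(parts[1] + " " + parts[2], "%d-%b-%Y %H:%M")
        entries.append((name, modified))
    return entries


def split_subreddit(entries):
    """Separate the Subreddit corpus from the others (hypothetical grouping)."""
    large = [name for name, _ in entries if name == "subreddit-corpus"]
    rest = [name for name, _ in entries if name != "subreddit-corpus"]
    return large, rest


large, rest = split_subreddit(parse_listing(LISTING))
print(large)
print(rest)
```

Entries that lack a date, such as the last one in the listing, are skipped by this sketch rather than guessed at.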