Closed by StellaAthena 4 years ago
I think we decided not to use the dialogpt script, since they do a bunch of processing we don't need
I must have missed that memo, thanks for the correction.
If interested: massive data dumps are at https://pushshift.io/api-parameters/ . The guy who runs it is really into cool research projects, so he'd be able to put together any kind of subset we might want.
I believe we talked about using pushshift but it has severe throttling? @leogao2 @Mistobaan what was the deal with that?
I haven't been involved in it.
@anishthite has uploaded the processing code here: https://github.com/EleutherAI/reddit_comment_processing
Once he provides a link to where the data is hosted, this will be moved to “ready to merge.”
priority: medium, shouldn't be too hard since we can use their code