dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

STORIES corpus #598

Closed: szha closed this issue 5 years ago

szha commented 5 years ago

Here's a cleaned-up corpus of stories from Trinh et al.: https://console.cloud.google.com/storage/browser/commonsense-reasoning/reproduce/stories_corpus?pli=1 (paper: https://arxiv.org/abs/1806.02847)

Given how effective large, clean corpora have proven with deep transformer models, as shown by BERT and GPT-2, this might be useful to others. Should we offer automatic download of this corpus in GluonNLP?

eric-haibin-lin commented 5 years ago

+1

szha commented 5 years ago

Tracked in the above issue.