boudinfl / ake-datasets

Large, curated set of benchmark datasets for evaluating automatic keyphrase extraction algorithms.
Apache License 2.0
142 stars, 29 forks

Adding KP20k and KPTimes #4

Closed ygorg closed 4 years ago

ygorg commented 4 years ago

Updated the README. The KP20k ids were updated so that the test, valid and train ids no longer overlap. References were added or updated. The train reference is too large for GitHub; it is created with _preprocess.sh.
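A quick way to sanity-check the non-overlap property mentioned above could look like the following minimal sketch. It is not code from this repository: the function name `find_overlapping_ids` and the example ids are hypothetical, and it assumes the ids of each split have already been loaded into plain Python collections.

```python
from itertools import combinations


def find_overlapping_ids(splits):
    """Return {(split_a, split_b): shared_ids} for each pair of splits sharing ids.

    `splits` maps a split name to an iterable of document ids.
    An empty result means all splits are pairwise disjoint.
    """
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(splits.items(), 2):
        shared = set(ids_a) & set(ids_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Hypothetical ids, for illustration only (not the real KP20k ids).
example_splits = {
    "train": ["train_0", "train_1"],
    "valid": ["valid_0"],
    "test": ["test_0", "test_1"],
}

# No pair of splits shares an id, so nothing is reported.
assert find_overlapping_ids(example_splits) == {}
```

Comparing sets pairwise keeps the check independent of how the ids are formatted, as long as the same id is represented identically in every split.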