Closed BieniekAlexander closed 5 years ago
As you guys know, I used the arXiv API to collect a paper dataset for this project. I'm going to rerun the collection now to get more papers. I also just found this page on bulk access to arXiv, but I don't have time to read into it right now. I might use it soon if our project timeline allows. We're probably fine with the data from the Deep Keyphrase Generation paper, though.
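For reference, here is a rough sketch of what a collection pass against the arXiv API can look like. It is not the script used for this project. It assumes the public query endpoint at `http://export.arxiv.org/api/query` with its `search_query`, `start`, and `max_results` parameters, which returns an Atom feed. The function names and the `cat:cs.CL` query are made up for illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Public arXiv query endpoint; responses are Atom XML.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"


def build_query_url(search_query, start=0, max_results=100):
    """Build a paged query URL (hypothetical helper, not part of any library)."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)


def parse_feed(xml_data):
    """Extract title, abstract, and category terms from an Atom feed (str or bytes)."""
    root = ET.fromstring(xml_data)
    papers = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        summary = entry.findtext(ATOM + "summary", default="")
        subjects = [c.get("term") for c in entry.findall(ATOM + "category")]
        papers.append({
            # Collapse the line breaks and indentation the feed puts inside text fields.
            "title": " ".join(title.split()),
            "abstract": " ".join(summary.split()),
            "subjects": subjects,
        })
    return papers


def fetch_papers(search_query, start=0, max_results=100):
    """Fetch one page of results over the network and parse it."""
    url = build_query_url(search_query, start, max_results)
    with urllib.request.urlopen(url) as resp:
        return parse_feed(resp.read())
```

Calling something like `fetch_papers("cat:cs.CL", start=0, max_results=100)` in a loop with an increasing `start` would page through results. A pause between requests is a good idea to stay polite to the server.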
Deep Keyphrase Generation aims to do almost exactly what we want. The authors share the dataset they use, which is linked in the README of their GitHub repo.
The data includes paper titles, abstracts, and keyphrases. Note that it doesn't include the papers' subjects, and it contains almost exclusively CS papers.
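As a quick illustration of working with records shaped like that, here is a minimal loading sketch. The on-disk format is an assumption: it supposes one JSON object per line, with hypothetical field names `title`, `abstract`, and `keywords`, where `keywords` is either a list or a `;`-separated string. Check the actual files before relying on any of this.

```python
import json


def load_records(lines):
    """Parse JSON-lines text into dicts with title, abstract, and keyphrases.

    The field names and the ';' separator are assumptions for illustration,
    not the confirmed format of the shared dataset.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        raw = json.loads(line)
        keywords = raw.get("keywords", [])
        if isinstance(keywords, str):
            keywords = keywords.split(";")
        records.append({
            "title": raw.get("title", ""),
            "abstract": raw.get("abstract", ""),
            # Drop empty entries and surrounding whitespace.
            "keyphrases": [k.strip() for k in keywords if k.strip()],
        })
    return records
```

With a real file this would be called as `load_records(open(path, encoding="utf-8"))`, where `path` points at wherever the data was downloaded.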