Data and code for "A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications" by Dongyeop Kang, Waleed Ammar, Bhavana Dalvi, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy and Roy Schwartz, NAACL 2018
PeerRead is a dataset of scientific peer reviews, made available to help researchers study this important artifact. The dataset consists of over 14K paper drafts and the corresponding accept/reject decisions in top-tier venues including ACL, NIPS and ICLR, as well as over 10K textual peer reviews written by experts for a subset of the papers.
We structured the dataset into sections, each corresponding to a venue or an arXiv category, e.g., ./data/acl_2017 and ./data/arxiv.cs.cl_2007-2017. Each section is further divided into train/dev/test splits (the same splits used in the paper). Due to licensing constraints, for some sections we provide instructions for downloading the data instead of including it in this repository, e.g., ./data/nips_2013-2017/README.md.
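As a rough illustration of the layout described above, the following sketch lists each section under ./data and counts the files in its splits. It assumes that the splits are stored as subdirectories named train, dev and test inside each section; the function name `summarize_sections` is hypothetical and not part of this repository. Sections whose data must be downloaded separately may show no splits until you have fetched them.

```python
from pathlib import Path

# Assumption: each section stores its splits in subdirectories with these names.
SPLITS = ("train", "dev", "test")


def summarize_sections(data_root):
    """Return {section_name: {split_name: file_count}} for sections under data_root.

    Hypothetical helper for illustration only. Sections that contain none of
    the assumed split directories are skipped.
    """
    summary = {}
    root = Path(data_root)
    for section in sorted(p for p in root.iterdir() if p.is_dir()):
        counts = {}
        for split in SPLITS:
            split_dir = section / split
            if split_dir.is_dir():
                # Count every regular file below the split directory.
                counts[split] = sum(1 for f in split_dir.rglob("*") if f.is_file())
        if counts:
            summary[section.name] = counts
    return summary


if __name__ == "__main__":
    data_root = Path("./data")
    if data_root.is_dir():
        for section, counts in summarize_sections(data_root).items():
            print(section, counts)
```

Run it from the root of this repository. The counts include every file below each split directory, so they are only a quick sanity check on what has been downloaded, not an exact count of papers or reviews.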
To experiment with (and hopefully improve) our models for aspect prediction and for predicting whether a paper will be accepted, see ./code/README.md.
Run ./setup.sh at the root of this repository to install dependencies and download some of the larger data files not included in this repo.
@inproceedings{kang18naacl,
title = {A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications},
author = {Dongyeop Kang and Waleed Ammar and Bhavana Dalvi and Madeleine van Zuylen and Sebastian Kohlmeier and Eduard Hovy and Roy Schwartz},
booktitle = {Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL)},
address = {New Orleans, USA},
month = {June},
url = {https://arxiv.org/abs/1804.09635},
year = {2018}
}