Data and code for the ACL 2019 paper Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model.
Available versions of the data:
- Preprocessed, but not truncated, data
- Preprocessed, truncated data
- Raw data (only replaced \n with "NEWLINE_CHAR" and appended "|||||" to the end of each story)
- Raw data, bad retrievals removed: removes documents retrieved with the error noted in this issue, and removes the "|||||" at the end of each example
- Raw data, zipped
- Tensorflow datasets
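In the raw data, each example packs several source stories into one string. The only changes are that newlines became "NEWLINE_CHAR" and each story ends with "|||||". The sketch below shows one way to recover the individual stories. It assumes one example per input line and treats any whitespace around the markers as padding. The function name `parse_raw_example` is hypothetical and is not part of this repository's code.

```python
from typing import List


def parse_raw_example(line: str) -> List[str]:
    """Split one raw example into its source stories (hypothetical helper).

    Assumes stories are terminated by "|||||" and that original newlines
    were replaced by "NEWLINE_CHAR", as described for the raw data.
    """
    stories = []
    # Each story ends with "|||||"; splitting leaves an empty trailing piece,
    # which is skipped. This also works for the "bad retrievals removed"
    # variant, where the final "|||||" has been dropped.
    for chunk in line.split("|||||"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Restore real newlines, trimming padding around each marker.
        parts = chunk.split("NEWLINE_CHAR")
        stories.append("\n".join(part.strip() for part in parts))
    return stories


if __name__ == "__main__":
    example = "First story. NEWLINE_CHAR Second line. ||||| Another story. |||||"
    for story in parse_raw_example(example):
        print(repr(story))
```

Stripping whitespace around each marker is a design choice made for readability. If exact spacing matters, drop the `strip()` calls and split on the bare markers instead.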