Open Oscar860601 opened 6 years ago
@Oscar860601
The original data is from here: https://github.com/danqi/rc-cnn-dailymail
The code to download them is here: https://github.com/deepmind/rc-data
Oh, I meant anonymized summarization data. Only non-anonymized summarization data and anonymized QA data are available for cnn-dailymail. I was just wondering whether there is open-source code to obtain the anonymized summarization data, since it's widely used. Still, thanks a lot.
The same dataset for QA was repurposed for summarization. If you look at generate_questions.py it should get you most of the way there.
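To illustrate the general idea only, here is a rough sketch of entity anonymization applied to an article and its summary together. It is not taken from generate_questions.py: the function name `anonymize`, the hand-supplied `entities` list, and the simple string matching are all assumptions made for illustration. The `@entityN` marker style follows the anonymized QA data. A real pipeline would rely on entity and coreference annotations, so that "Obama" and "Barack Obama" share one marker, which this sketch does not do.

```python
import re


def anonymize(article, summary, entities):
    """Replace each entity mention with an @entityN marker shared by both texts.

    `entities` is a hypothetical, hand-supplied list of surface strings.
    """
    # Longest names first, so "Barack Obama" is tried before "Obama".
    ordered = sorted(set(entities), key=lambda e: (-len(e), e))
    mapping = {name: f"@entity{i}" for i, name in enumerate(ordered)}
    if not mapping:
        return article, summary, mapping

    # Match whole mentions only (not substrings inside longer words).
    alternatives = "|".join(re.escape(name) for name in ordered)
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)")

    def replace(text):
        return pattern.sub(lambda m: mapping[m.group(1)], text)

    # The same mapping is used for article and summary so markers stay consistent.
    return replace(article), replace(summary), mapping


if __name__ == "__main__":
    art, summ, table = anonymize(
        "Barack Obama met Angela Merkel in Berlin. Obama spoke first.",
        "Obama visits Berlin",
        ["Barack Obama", "Obama", "Angela Merkel", "Berlin"],
    )
    print(art)
    print(summ)
    print(table)
```

The key point is that one mapping is built per document and applied to both the article and the summary, so a marker in the summary refers to the same entity as in the article.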
@AlJohri Thanks! I will try to modify this code.
@abisee Did you write code for generating an anonymized version of the cnn-dailymail summarization dataset?