abisee / cnn-dailymail

Code to obtain the CNN / Daily Mail dataset (non-anonymized) for summarization
MIT License

New Test #29

Open · quanghuynguyen1902 opened this issue 5 years ago

quanghuynguyen1902 commented 5 years ago

If I have article content that does not come from CNN or Daily Mail, how should I process the data?

JafferWilson commented 5 years ago

You need to format your data the same way as the CNN / Daily Mail dataset; then it will work. Otherwise, modify the tokenizing script to suit your data. That's the solution. This is not an issue with the repository either.
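A minimal sketch of what "format your data like CNN / Daily Mail" can mean in practice. It assumes the `.story` layout used by the downloaded stories: article paragraphs first, then each summary sentence preceded by its own `@highlight` line, with blocks separated by blank lines. The helper names `write_story` and `read_story` are hypothetical. `read_story` applies the splitting rule the repo's preprocessing appears to use: every non-empty line after the first `@highlight` is treated as summary text. It is a sketch of that idea, not the repo's own code.

```python
import os
import tempfile


def write_story(path, article_paragraphs, summary_sentences):
    """Write one article and its summary in a CNN/DM-style .story layout (assumed)."""
    blocks = list(article_paragraphs)
    for sentence in summary_sentences:
        # Hypothetical helper: each summary sentence gets its own @highlight marker.
        blocks.append("@highlight")
        blocks.append(sentence)
    with open(path, "w", encoding="utf-8") as f:
        # Blank lines separate paragraphs, markers and highlights.
        f.write("\n\n".join(blocks) + "\n")


def read_story(path):
    """Split a .story file into (article_lines, highlight_lines)."""
    article, highlights = [], []
    seen_highlight = False
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank separator lines
            if line.startswith("@highlight"):
                seen_highlight = True  # every later non-marker line is summary text
            elif seen_highlight:
                highlights.append(line)
            else:
                article.append(line)
    return article, highlights


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "my_article.story")
        write_story(
            path,
            ["First paragraph of my own article.", "Second paragraph."],
            ["A one-line summary."],
        )
        print(read_story(path))
```

As far as I can tell, the repo's `make_datafiles.py` also locates story files through the URL lists (file names are SHA1 hashes of the URLs). So custom stories probably also need that lookup, or the lists, adapted rather than just dropped into the stories directory.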

Santosh-Gupta commented 5 years ago

It looks like we just need to put the source followed by its summary on each line, separated by a unique token?
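This thread does not confirm whether a one-pair-per-line file is accepted directly. The repo's preprocessing reads individual story files. If your data is already in the line-per-pair layout described in this comment, here is a hedged sketch for converting it into story-style files. The tab separator, the `pair_NNNNNN.story` naming, and the `lines_to_stories` helper are all hypothetical choices. The whole summary becomes a single highlight.

```python
import os
import tempfile

SEPARATOR = "\t"  # hypothetical "unique token" between source and summary


def lines_to_stories(pairs_path, out_dir, separator=SEPARATOR):
    """Turn 'source<SEP>summary' lines into one .story file per pair.

    Returns the written paths; lines without the separator are skipped.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with open(pairs_path, encoding="utf-8") as f:
        for i, raw in enumerate(f):
            line = raw.rstrip("\r\n")
            if separator not in line:
                continue  # not a source/summary pair
            source, summary = line.split(separator, 1)  # split on the first separator only
            body = f"{source.strip()}\n\n@highlight\n\n{summary.strip()}\n"
            # File is named after the input line number (hypothetical naming scheme).
            path = os.path.join(out_dir, f"pair_{i:06d}.story")
            with open(path, "w", encoding="utf-8") as out:
                out.write(body)
            written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "pairs.tsv")
        with open(src, "w", encoding="utf-8") as f:
            f.write("Text of article one.\tSummary one.\n")
            f.write("Text of article two.\tSummary two.\n")
        for p in lines_to_stories(src, os.path.join(tmp, "stories")):
            print(p)
```

Splitting only on the first separator keeps any later occurrence inside the summary text intact. Choose a separator that never appears in the article text.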