abisee / cnn-dailymail

Code to obtain the CNN / Daily Mail dataset (non-anonymized) for summarization
MIT License

Titles of articles #21

Open aburkov opened 6 years ago

aburkov commented 6 years ago

Hi,

In the *.story files the titles of the news articles are absent. Is there a way to get the titles?

the-black-knight-01 commented 5 years ago

url_list contains all the original links, so you can get every link from there. The hash code in each *.story filename is generated from the article's URL. Example: https://www.browserling.com/tools/text-to-hex
becomes this: 000efdbb001fd19666b37456e239c78c52908655

JafferWilson commented 5 years ago

Try my repository and make it run: https://github.com/abisee/cnn-dailymail#option-1-download-the-processed-data