codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library in Python 3. Advanced docs:
https://goo.gl/VX41yK
MIT License
14.06k stars 2.11k forks

Good news links or good ways of getting more than a year of news #593

Open nanaya07 opened 6 years ago

nanaya07 commented 6 years ago

Hi, I am currently working on a machine learning project. I decided to use the newspaper3k library to get articles by date. I use cnn.com, nytimes.com, and fox.com as sources. However, they usually provide only the most recent few months of news.

Is there a good way to get more than one year of news?

agnelvishal commented 5 years ago

You can use the http://commoncrawl.org/ API. An example is available at https://github.com/agnelvishal/Condense.press/tree/master/backend/cdx-index-client-master
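For anyone landing here later, this is a rough sketch of what a query against the Common Crawl index could look like. It is not taken from the linked repository. The host `https://index.commoncrawl.org`, the collection name `CC-MAIN-2018-17`, the `limit` parameter, and the record fields (`url`, `timestamp`) are assumptions based on the public CDX index server, and all function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Common Crawl's public CDX index server, one collection per crawl.
INDEX_HOST = "https://index.commoncrawl.org"


def build_query_url(collection, url_pattern, limit=None):
    """Build an index query URL for one crawl collection (hypothetical helper)."""
    params = {"url": url_pattern, "output": "json"}
    if limit is not None:
        params["limit"] = limit
    return f"{INDEX_HOST}/{collection}-index?{urlencode(params)}"


def parse_records(body):
    """Parse a newline-delimited JSON body, one capture record per line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def filter_by_year(records, year):
    """Keep records whose timestamp (assumed YYYYMMDDhhmmss) starts with the year."""
    return [r for r in records if r.get("timestamp", "").startswith(str(year))]


def fetch_records(collection, url_pattern, limit=None):
    """Query the index over the network and return the parsed records."""
    with urlopen(build_query_url(collection, url_pattern, limit)) as resp:
        return parse_records(resp.read().decode("utf-8"))
```

Something like `fetch_records("CC-MAIN-2018-17", "cnn.com/*", limit=5)` should then return a handful of capture records for that crawl, if the collection name is valid. Repeating the query over several collections is one way to cover more than a year. The returned URLs can be handed to newspaper3k for parsing, although old pages may no longer be live on the original site.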

agnelvishal commented 5 years ago

I already have the dataset and am in need of an intern. Can you help?

nanaya07 commented 5 years ago

I am already done with the project. I just used more news websites. Thank you.