AndyTheFactory / newspaper4k

📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
MIT License

Good news links or good ways for getting more than a year news #225

Open AndyTheFactory opened 10 months ago

AndyTheFactory commented 10 months ago

Issue by nanaya07 Sat Jul 7 19:47:46 2018 Originally opened as https://github.com/codelucas/newspaper/issues/593


Hi, I am currently working on a machine learning project. I decided to use the newspaper3k library to get articles by date. I use cnn.com, nytimes.com, and fox.com to get articles. However, they usually provide only the most recent few months of news.

Is there a good way to get more than one year of news?

AndyTheFactory commented 10 months ago

Comment by agnelvishal Wed Nov 21 16:07:25 2018


You can use the Common Crawl API (http://commoncrawl.org/). An example is available at https://github.com/agnelvishal/Condense.press/tree/master/backend/cdx-index-client-master
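
For anyone landing here later, a minimal sketch of what querying the Common Crawl URL index might look like, using only the Python standard library. This is not the code from the linked repository. The index server address, the crawl ID `CC-MAIN-2018-43`, and the `limit` parameter are assumptions for illustration (the available crawl IDs are listed at https://index.commoncrawl.org/collinfo.json), and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public Common Crawl CDX index server. Each crawl has its own
# index, named "<crawl id>-index"; one crawl covers roughly one month.
INDEX_SERVER = "https://index.commoncrawl.org"


def build_query_url(crawl_id, url_pattern, limit=None):
    """Build an index query URL for one crawl and one URL pattern."""
    params = {"url": url_pattern, "output": "json"}
    if limit is not None:
        # Assumption: the server accepts a "limit" query parameter.
        params["limit"] = limit
    return f"{INDEX_SERVER}/{crawl_id}-index?{urlencode(params)}"


def parse_records(body):
    """Parse a response body that holds one JSON object per non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def filter_by_year(records, year):
    """Keep records whose capture timestamp (YYYYMMDDhhmmss) is in `year`."""
    return [r for r in records if r.get("timestamp", "").startswith(str(year))]


def fetch_records(crawl_id, url_pattern, limit=50):
    """Query the index over the network and return the parsed records."""
    with urlopen(build_query_url(crawl_id, url_pattern, limit)) as resp:
        return parse_records(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Example crawl ID only; loop over several crawl IDs to cover a full year.
    for rec in fetch_records("CC-MAIN-2018-43", "cnn.com/*", limit=5):
        print(rec.get("timestamp"), rec.get("url"))
```

Each record should also carry the archive file name, offset, and length of the capture, which is what you would use to pull the stored HTML before handing it to newspaper for parsing. Looping over several crawl IDs is how you would get past the few months that the live sites expose.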

AndyTheFactory commented 10 months ago

Comment by agnelvishal Wed Nov 21 16:09:03 2018


I already have the dataset and am in need of an intern. Can you help?

AndyTheFactory commented 10 months ago

Comment by nanaya07 Thu Nov 22 03:14:35 2018


I have already finished the project. I just used more news websites. Thank you.