Obtaining -new- news each day

AndyTheFactory / newspaper4k

📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.

MIT License

429 stars, 37 forks

Issue by durakkerem Tue May 8 20:34:27 2018 Originally opened as https://github.com/codelucas/newspaper/issues/563

So I know that building a news source crawls over all the available news on the website:

cnn_paper = newspaper.build('https://cnn.com')

But what if I only want the newest articles? In my case the default caching architecture of Newspaper3k does not work, because I run a never-ending crawler and the cache will eventually overflow.

Another problem is that I can only obtain the publish_date of an article after downloading and parsing it.

Is there a workaround for this case? I think it will be very useful for many users too.
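
To make the request concrete, here is a rough sketch of the kind of workaround I have in mind, not a confirmed library feature. It assumes the built-in cache is turned off with `memoize_articles=False` and replaced by a small bounded set of already-seen URLs, so memory stays fixed however long the crawler runs. `BoundedSeenSet`, `is_recent` and `fetch_new_articles` are hypothetical names I made up for illustration, and the one-day cutoff and 10,000-URL limit are arbitrary. It still has to download and parse each unseen article to read its publish_date.

```python
from collections import OrderedDict
from datetime import datetime, timedelta


class BoundedSeenSet:
    """Remember at most `max_size` URLs, evicting the least recently seen."""

    def __init__(self, max_size=10_000):
        self.max_size = max_size
        self._urls = OrderedDict()

    def add_if_new(self, url):
        """Record `url`; return True only if it was not already remembered."""
        if url in self._urls:
            self._urls.move_to_end(url)  # refresh URLs that are still listed
            return False
        self._urls[url] = None
        if len(self._urls) > self.max_size:
            self._urls.popitem(last=False)  # drop the oldest entry
        return True


def is_recent(publish_date, max_age=timedelta(days=1), now=None):
    """Return True if `publish_date` is within `max_age` of `now`."""
    if publish_date is None:  # the date could not be extracted
        return False
    if now is None:
        # Match the timezone awareness of the extracted date.
        now = datetime.now(publish_date.tzinfo)
    return now - publish_date <= max_age


def fetch_new_articles(source_url, seen):
    """Hypothetical helper: return unseen, recently published articles."""
    import newspaper  # imported here so the helpers above work without it

    # Disable the built-in cache; `seen` takes over deduplication.
    paper = newspaper.build(source_url, memoize_articles=False)
    fresh = []
    for article in paper.articles:
        if not seen.add_if_new(article.url):
            continue  # already handled in an earlier pass
        try:
            article.download()
            article.parse()
        except Exception:
            continue  # skip articles that fail to download or parse
        if is_recent(article.publish_date):
            fresh.append(article)
    return fresh
```

The crawler would create a single `BoundedSeenSet()` at startup and call `fetch_new_articles('https://cnn.com', seen)` on each pass. A URL that was evicted from the set can be fetched again later, but the publish_date check should still filter out old articles.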


Obtaining -new- news each day #206