codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
https://goo.gl/VX41yK
MIT License
14.14k stars, 2.12k forks

Obtaining -new- news each day #563

Status: Open. durakkerem opened this issue 6 years ago

durakkerem commented 6 years ago

I know that building a news source crawls all of the articles currently available on the website:

import newspaper
cnn_paper = newspaper.build('https://cnn.com')

But what if I only want the newest articles? In my case, Newspaper3k's default caching does not work, because I run a never-ending crawler and the cache will eventually grow without bound.
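One possible workaround for the unbounded cache, sketched under assumptions: turn off newspaper3k's built-in memoization and keep your own seen-URL cache with a fixed size cap, evicting the least recently seen URLs first. `BoundedSeenCache`, `filter_new`, and `max_size` are hypothetical names, not part of newspaper3k, and the default cap of 50,000 is an arbitrary placeholder.

```python
from collections import OrderedDict
from typing import Iterable, List


class BoundedSeenCache:
    """Hypothetical helper (not part of newspaper3k): remembers at most `max_size` URLs.

    When the cap is exceeded, the least recently seen URLs are evicted first,
    so memory stays bounded no matter how long the crawler runs.
    """

    def __init__(self, max_size: int = 50_000) -> None:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self.max_size = max_size
        self._urls: "OrderedDict[str, None]" = OrderedDict()

    def __contains__(self, url: str) -> bool:
        return url in self._urls

    def __len__(self) -> int:
        return len(self._urls)

    def add(self, url: str) -> None:
        # Re-adding a URL moves it to the "most recent" end, refreshing it.
        self._urls[url] = None
        self._urls.move_to_end(url)
        # Drop the oldest entries once the cap is exceeded.
        while len(self._urls) > self.max_size:
            self._urls.popitem(last=False)

    def filter_new(self, urls: Iterable[str]) -> List[str]:
        """Return URLs not seen before (in input order) and mark all of them as seen."""
        new_urls = []
        for url in urls:
            if url not in self._urls:
                new_urls.append(url)
            # Adding immediately also catches duplicates within the same batch.
            self.add(url)
        return new_urls
```

In a crawl loop, something like `paper = newspaper.build('https://cnn.com', memoize_articles=False)` followed by `cache.filter_new(a.url for a in paper.articles)` should yield only URLs that were not seen recently; `memoize_articles=False` is the documented switch for disabling newspaper3k's own cache. Because every sighting refreshes a URL, articles that stay linked on the front page are not downloaded again, while ones that drop off eventually age out.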

Another problem is that I can only obtain an article's publish_date after downloading and parsing it.
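For the publish_date problem, a cheap pre-filter is to guess the date from the URL before downloading anything, since many news sites embed it in the path (e.g. `/2019/06/14/`). This is a heuristic that assumes a particular URL layout, not a newspaper3k feature; `date_from_url`, `looks_recent`, and `max_age_days` are hypothetical names. URLs without a recognizable date are kept, so they can still be checked via `article.publish_date` after `download()` and `parse()`.

```python
import re
from datetime import date, datetime, timedelta, timezone
from typing import Optional

# Heuristic, not a newspaper3k API: matches paths like /2019/06/14/ or /2019-06-14-slug.
_URL_DATE = re.compile(r"/(\d{4})[/-](\d{1,2})[/-](\d{1,2})(?:[/-]|$)")


def date_from_url(url: str) -> Optional[date]:
    """Guess a publication date from the URL path; None if no valid date is found."""
    match = _URL_DATE.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. /2019/13/45/ looks like a date but is not one
        return None


def looks_recent(url: str, max_age_days: int = 2, today: Optional[date] = None) -> bool:
    """Decide, before downloading, whether an article might be new enough to fetch.

    Returns True for URLs without a recognizable date, because only
    download-and-parse (article.publish_date) can tell for those.
    """
    guessed = date_from_url(url)
    if guessed is None:
        return True
    if today is None:
        today = datetime.now(timezone.utc).date()
    return today - guessed <= timedelta(days=max_age_days)
```

Sites that do not put dates in their URLs get no benefit from this, so it reduces the number of downloads rather than replacing the check on `publish_date`.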

Is there a workaround for this? I think it would be useful for many other users too.

jonkislin commented 5 years ago

@codelucas I have the same issue and am also interested in this functionality. Is there something @durakkerem and I might be missing?

Awesome library btw!