Python 3: CacheControl's FileCache not working

debugger-zz commented 4 years ago

I've added support in scrape.py for caching URLs fetched via the scrape.request_url function.

Sadly, HTTP responses are saved to landkreise/data/.webcache/ for only a limited number of URLs.

In case of problems this cache might have to be cleared!

Ultimately, every call to request_url should use and fill the cache, to reduce load on the scraped websites.

If you start a scraper with debug output, you cannot see whether the cache was hit:

SCRAPER_DEBUG=yes get-somekreis.py

I've started working on CacheControl to add debug output and fix the problem. If someone knows a better replacement, please speak up.

debugger-zz commented 4 years ago

I think I've improved or fixed this, by adding a caching heuristic.

dadosch commented 4 years ago

> I think I've improved or fixed this, by adding a caching heuristic.

Does it use the Last-Modified header from the server, or does it cache the response so that no request is made at all?
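
To make the difference concrete: a fixed-lifetime heuristic ignores Last-Modified and skips the request until the lifetime is over, while a Last-Modified based heuristic derives the lifetime from the server's headers. A common rule, suggested in RFC 7234 and, as far as I understand, the basis of CacheControl's LastModified heuristic, treats a response as fresh for 10% of the time since it was last modified. The function below is a standard-library sketch of that rule, not CacheControl's actual code; heuristic_expiry and the 24-hour cap are assumptions for illustration.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def heuristic_expiry(date_header, last_modified_header, cap=timedelta(hours=24)):
    """Return the time until which a response may be reused without a request.

    The freshness lifetime is 10% of the interval between Last-Modified
    and Date, never negative and limited to `cap` (an assumed upper bound).
    """
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    unchanged_for = date - last_modified
    lifetime = max(timedelta(0), min(unchanged_for / 10, cap))
    return date + lifetime
```

In both cases no request is made while the cached response is still considered fresh; the Last-Modified value only influences how long that period lasts.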

corona-zahlen-landkreis / corona_landkreis_fallzahlen_scraping