Open debugger-zz opened 4 years ago
I think I've improved or fixed this, by adding a caching heuristic.
Does it use the last-modified tag from the server or does it cache it so that no request is even made?
I've added support in `scrape.py` to cache URLs fetched via the `scrape.request_url` function. Sadly, HTTP requests are only saved for a limited number of URLs to `landkreise/data/.webcache/`. In case of problems this cache might have to be cleared!
In the end it would be good if the cache were used and filled by every call to `request_url`, to reduce load on the scraped websites.
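For anyone who wants to picture the idea, here is a minimal stdlib-only sketch of a file cache in front of a URL fetch. It is not the code in `scrape.py`: the names `cached_request`, `clear_cache`, and `MAX_AGE_SECONDS`, the one-hour lifetime, and the SHA-256 file naming are all assumptions made for illustration. Only the cache directory is taken from the comment above.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable
from urllib.request import urlopen

CACHE_DIR = Path("landkreise/data/.webcache")  # directory named in the thread
MAX_AGE_SECONDS = 3600  # assumption: treat entries older than one hour as stale


def _fetch(url: str) -> bytes:
    """Default fetcher: a plain HTTP GET via the standard library."""
    with urlopen(url) as response:
        return response.read()


def cached_request(
    url: str,
    cache_dir: Path = CACHE_DIR,
    max_age: float = MAX_AGE_SECONDS,
    fetch: Callable[[str], bytes] = _fetch,
) -> bytes:
    """Return the body for url, making a request only if no fresh copy exists."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # One file per URL, named by the hash of the URL (hypothetical scheme).
    entry = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if entry.exists() and time.time() - entry.stat().st_mtime < max_age:
        return entry.read_bytes()  # fresh entry: no request is made at all
    body = fetch(url)
    entry.write_bytes(body)
    return body


def clear_cache(cache_dir: Path = CACHE_DIR) -> None:
    """Delete all cached entries, e.g. when stale data causes problems."""
    if cache_dir.is_dir():
        for entry in cache_dir.iterdir():
            if entry.is_file():
                entry.unlink()
```

With this shape, a fresh entry means the server is never contacted. A variant that revalidates with the server (for example via `Last-Modified`) would still send a request on every call.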
If you start a scraper with debug output, you cannot see if the cache was hit:
I've started working on CacheControl to add debug output and fix the problem. If someone knows a better replacement, please speak up.
My patched file_cache.py with debug output: https://github.com/corona-zahlen-landkreis/corona_landkreis_fallzahlen_scraping/blob/master-anaylse-cachecontrol-bug/landkreise/file_cache.py
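To illustrate the kind of debug output I mean, here is a rough sketch of a wrapper that logs cache hits and misses. It is not the patched `file_cache.py` linked above. It assumes a backend exposing `get`/`set`/`delete` methods, which is, as far as I know, the general shape of CacheControl's cache backends. `LoggingCache`, `DictCache`, and the logger name `webcache` are hypothetical names used only here.

```python
import logging

logger = logging.getLogger("webcache")  # hypothetical logger name


class DictCache:
    """Hypothetical in-memory stand-in for a real cache backend."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, **kwargs):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class LoggingCache:
    """Wrap a backend and emit a DEBUG line for every cache operation."""

    def __init__(self, backend):
        self._backend = backend

    def get(self, key):
        value = self._backend.get(key)
        # A missing entry is assumed to be reported as None by the backend.
        logger.debug("cache %s: %s", "HIT" if value is not None else "MISS", key)
        return value

    def set(self, key, value, **kwargs):
        logger.debug("cache STORE: %s", key)
        self._backend.set(key, value, **kwargs)

    def delete(self, key):
        logger.debug("cache DELETE: %s", key)
        self._backend.delete(key)
```

Running a scraper with the `webcache` logger set to `DEBUG` would then show for each lookup whether the cache was hit, without touching the backend itself.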