Closed oh2fih closed 7 months ago
I was thinking of a rather simple solution, if you don't find it too quick and dirty:
`Last-Modified` and `Content-Length` headers are the same. No need for actually comparing the values even.
`True` for "the file probably has changes" & let the current logic try the downloads and catch any errors. The same approach can cover both internal errors with the cache file loads, writes & contents as well as external errors with the HTTP communications.
Note to self:
This would be a nice addition! Regarding saving the headers to a file: we could save them to the `info` collection of the database instead, which saves the hassle of dealing with files.
Thanks for the hint. Because the `info` collection already had this information cached, we didn't even need additional caches, but could simply compare the value in there. That made adding the new functionality rather straightforward. Please review the pull request.
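For anyone reading along, the comparison could look roughly like the sketch below. This is not the code from the pull request: `source_probably_changed`, the record layout, and the `get_current_headers` callable are hypothetical names, and a plain dict stands in for a document from the `info` collection. It also folds in the earlier idea of answering `True` on any error so that the existing download logic deals with the failure.

```python
TRACKED_HEADERS = ("Last-Modified", "Content-Length")


def source_probably_changed(info_record, get_current_headers):
    """Return False only when both tracked headers match the cached record.

    info_record: dict-like cached values (hypothetical layout), or None.
    get_current_headers: callable returning the headers of a fresh HEAD request.
    """
    try:
        current = get_current_headers()
        cached = info_record or {}
        for name in TRACKED_HEADERS:
            value = current.get(name)
            # A missing header or a differing value both count as "changed".
            if value is None or value != cached.get(name):
                return True
        return False
    except Exception:
        # Any failure (cache or HTTP) means "the file probably has changes";
        # the regular download path then runs and reports the real error.
        return True
```

Passing the header lookup in as a callable keeps the check independent of the HTTP library and makes the error fallback easy to exercise.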
After this I can simply lower the update interval on CVE-Search's systemd timer from 2 hours to 1 hour, which would also be the same as the sleep time in `db_updater.py -v -l` (loop mode). :+1:
The `nist.nvd_nist_api` part is now brilliant for updating regularly with minimal steps, just downloading the new CPEs & CVEs from the API. On the other hand, for other sources that are not providing an API, we are still downloading an entire file every time the database is updated, even though the data does not change that often (`X's are not modified since the last update`).
`Last-Modified`
`Content-Length`
=> `location: epss_scores-2024-04-08.csv.gz`
I'm suggesting saving the `Last-Modified` & `Content-Length` headers and first sending just an HTTP `HEAD` request for the file (at the final destination of any redirects; the Python equivalent of `curl -I -L`). The update for that source should only start if either of these values has changed since the previous update. That would allow shortening the update intervals without increasing load & traffic on the source servers.
Any thoughts on this approach & how the cached headers should be saved?
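To make the proposal concrete, here is a minimal standard-library sketch of the `HEAD` check. The names `KeepHeadOnRedirect`, `fetch_headers` and `has_changed` are made up for illustration and are not part of CVE-Search. The redirect handler subclass is an assumption-driven precaution: as far as I know, some `urllib` versions reissue a redirected `HEAD` as `GET`, so the method is set again explicitly.

```python
import urllib.request

TRACKED_HEADERS = ("Last-Modified", "Content-Length")


class KeepHeadOnRedirect(urllib.request.HTTPRedirectHandler):
    """Follow redirects while keeping the HEAD method (like `curl -I -L`)."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None and req.get_method() == "HEAD":
            new_req.method = "HEAD"
        return new_req


def fetch_headers(url, timeout=30):
    """Send a HEAD request and return the tracked headers of the final response."""
    opener = urllib.request.build_opener(KeepHeadOnRedirect())
    request = urllib.request.Request(url, method="HEAD")
    with opener.open(request, timeout=timeout) as response:
        return {name: response.headers.get(name) for name in TRACKED_HEADERS}


def has_changed(cached, current):
    """True if either tracked header differs from the previously saved value."""
    cached = cached or {}
    return any(cached.get(name) != current.get(name) for name in TRACKED_HEADERS)
```

An updater would call `fetch_headers(source_url)`, compare the result with the values saved after the previous run via `has_changed(saved, current)`, and only start the full download when that returns `True`. With no saved values at all (first run), the comparison reports a change.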