covidatlas / li

Next-generation serverless crawler for COVID-19 data

Calculate MD5 hash of each fetched page and ensure that the content has changed from day to day #388

Closed: jzohrab closed this issue 4 years ago

jzohrab commented 4 years ago

Original issue https://github.com/covidatlas/coronadatascraper/issues/159, transferred here on Thursday Mar 19, 2020 at 20:10 GMT


We should report when a fetched page's content has not changed at all since the previous day. This would catch errors like the NJ dataset moving to a new URL while the old one stays accessible.
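
A minimal sketch of that check, assuming Node's built-in `crypto` module and that the previous day's hash is stored somewhere the crawler can read it. `md5Hex` and `checkForUpdate` are hypothetical names for illustration, not functions from this repo:

```javascript
const crypto = require('crypto')

// Hex MD5 digest of a string or Buffer.
function md5Hex (content) {
  return crypto.createHash('md5').update(content).digest('hex')
}

// Hypothetical helper: hashes today's page and compares it with the hash
// recorded for the previous fetch. `previousHash` is null on the first fetch.
function checkForUpdate (previousHash, content) {
  const hash = md5Hex(content)
  return { hash, unchanged: previousHash === hash }
}

// Usage: keep `hash` for tomorrow's comparison, report when `unchanged` is true.
const yesterday = checkForUpdate(null, 'county,cases\nBergen,10\n')
const today = checkForUpdate(yesterday.hash, 'county,cases\nBergen,10\n')
if (today.unchanged) {
  console.warn('Page content is identical to the previous fetch')
}
```

An unchanged hash is only a signal to report, not proof of an error: a source may legitimately publish no new data on a given day.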

jzohrab commented 4 years ago

(Transferred comment)

Maybe filename should be: {md5 of url.substr(8)}-{md5 of contents.substr(8)}-{8601Z timestamp}.ext, wdyt?
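
One possible reading of that naming scheme, as a rough sketch. It assumes "substr(8)" means keeping the first 8 hex characters of each MD5 digest, and that the timestamp is an ISO 8601 UTC string. `cacheFilename` is a hypothetical name, and swapping colons for underscores is an added assumption, since colons are not valid in filenames on some systems:

```javascript
const crypto = require('crypto')

// Hex MD5 digest of a string or Buffer.
function md5Hex (content) {
  return crypto.createHash('md5').update(content).digest('hex')
}

// Hypothetical helper: builds {url hash}-{content hash}-{timestamp}.ext,
// truncating each digest to its first 8 hex characters (assumed reading).
function cacheFilename (url, contents, date, ext) {
  const urlPart = md5Hex(url).slice(0, 8)
  const contentPart = md5Hex(contents).slice(0, 8)
  // ISO 8601 in UTC, e.g. 2020-03-19T20:10:00.000Z, with ":" replaced by "_".
  const stamp = date.toISOString().replace(/:/g, '_')
  return `${urlPart}-${contentPart}-${stamp}.${ext}`
}

const name = cacheFilename(
  'https://example.com/data.csv', // placeholder URL
  'county,cases\nBergen,10\n',
  new Date('2020-03-19T20:10:00Z'),
  'csv'
)
console.log(name)
```

With this layout, two fetches of the same URL share the first segment, and an unchanged page also repeats the second segment, so a stale source shows up directly in a directory listing.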