GoogleCloudPlatform / covid-19-open-data

Datasets of daily time-series data related to COVID-19 for over 20,000 distinct locations around the world.
Apache License 2.0

Dates of updates #505

Closed dugwood closed 2 years ago

dugwood commented 2 years ago

First of all: thanks for your great work.

My issue is the delay in updates. For example, with vaccinations: today is the 21st of September, but the vaccination data doesn't go beyond the 9th of September.

Is there a problem, or is this the normal process? If it's normal, it would be helpful to fix the Last-Modified header, since it reports a recent date:

https://storage.googleapis.com/covid19-open-data/v3/vaccinations.csv
Last-Modified: Tue, 21 Sep 2021 16:43:20 GMT

Perhaps the file shouldn't be rewritten if its contents haven't changed?

owahltinez commented 2 years ago

Hi @dugwood, thanks for the kind words and for filing this issue.

The vaccination data is now up to date for all regions except for county locations in Brazil (which was the cause of the issue). I'm closing this issue but please feel free to continue the conversation if you have any questions or reopen it if you are still seeing stale data.
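To see which locations are lagging (like the Brazilian county locations mentioned above), you can look at the newest `date` per location in the CSV itself. Below is a minimal sketch using only the standard library. It assumes the file has columns named `date` and `location_key` and that dates are ISO-8601 (`YYYY-MM-DD`), so they compare correctly as strings. The function names and the sample rows (`LOC_A`, `LOC_B`) are hypothetical and made up for illustration.

```python
import csv
import io
from typing import Dict, TextIO


def latest_date_per_location(stream: TextIO) -> Dict[str, str]:
    """Return the most recent `date` seen for each `location_key`.

    Assumes ISO-8601 dates (YYYY-MM-DD), which sort correctly as strings.
    """
    latest: Dict[str, str] = {}
    for row in csv.DictReader(stream):
        key, date = row["location_key"], row["date"]
        if date and (key not in latest or date > latest[key]):
            latest[key] = date
    return latest


def stale_locations(latest: Dict[str, str], cutoff: str) -> Dict[str, str]:
    """Keep only locations whose newest row is older than `cutoff` (YYYY-MM-DD)."""
    return {key: date for key, date in latest.items() if date < cutoff}


if __name__ == "__main__":
    # Made-up sample rows; a real run would read the downloaded vaccinations.csv.
    sample = io.StringIO(
        "date,location_key,new_persons_vaccinated\n"
        "2021-09-08,LOC_A,100\n"
        "2021-09-09,LOC_A,120\n"
        "2021-09-20,LOC_B,5000\n"
    )
    latest = latest_date_per_location(sample)
    print(stale_locations(latest, "2021-09-15"))  # {'LOC_A': '2021-09-09'}
```

For the real file, the bytes returned by `urllib.request.urlopen` can be wrapped in `io.TextIOWrapper(response, encoding="utf-8")` and passed to `latest_date_per_location`.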

Unfortunately, because of how the data is processed in different "layers", it would be pretty difficult for us to detect whether anything has changed before publishing the vaccinations.csv file. We just grab the latest data from all the data sources and aggregate it into a file.

I recommend not using the last-modified metadata as a signal for anything other than determining whether the data pipelines are running.
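If you need to know whether the data itself changed, one client-side option is to hash each downloaded copy and compare it with the hash of the previous one. This is a sketch under assumptions, not something the project provides: the function name `content_changed` is hypothetical, and the caller is responsible for persisting the returned digest between runs (for example in a local file).

```python
import hashlib
from typing import Optional, Tuple


def content_changed(data: bytes, previous_digest: Optional[str]) -> Tuple[bool, str]:
    """Compare downloaded bytes against the digest of the previous download.

    Returns (changed, new_digest). `previous_digest` is None on the first run,
    in which case the data always counts as changed. The caller should store
    `new_digest` for the next comparison.
    """
    new_digest = hashlib.sha256(data).hexdigest()
    return new_digest != previous_digest, new_digest


if __name__ == "__main__":
    first = b"date,location_key\n2021-09-09,LOC_A\n"
    changed, digest = content_changed(first, None)
    print(changed)  # True: nothing to compare against yet
    changed, digest = content_changed(first, digest)
    print(changed)  # False: same bytes, even if Last-Modified moved forward
```

Because the comparison is on the bytes themselves, a file that was republished with identical contents is reported as unchanged, regardless of what Last-Modified says.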

dugwood commented 2 years ago

Thanks @owahltinez for the fix, that worked well.

Currently I'm using If-Modified-Since with the Last-Modified value to avoid re-downloading the data, so that's a start. If you can't change it, that's fine by me. I get the actual per-country dates from the CSV itself.
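A conditional download like the one described above could look roughly like this with the standard library. It is a minimal sketch, assuming the server answers `304 Not Modified` when the file hasn't been republished; with `urllib`, a 304 surfaces as an `HTTPError`. The `opener` parameter is a hypothetical hook added only so the function can be exercised without network access. Keep in mind the caveat above: a 200 response only means the file was regenerated, not that the data changed.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

URL = "https://storage.googleapis.com/covid19-open-data/v3/vaccinations.csv"


def conditional_fetch(
    url: str,
    last_modified: Optional[str],
    opener: Callable = urllib.request.urlopen,
) -> Tuple[Optional[bytes], Optional[str]]:
    """Download `url` only if the server says it changed since `last_modified`.

    `last_modified` is the raw Last-Modified header saved from the previous
    successful download (None on the first run). Returns (body, header);
    `body` is None when the server answers 304 Not Modified.
    """
    request = urllib.request.Request(url)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    try:
        with opener(request) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: keep using the previously downloaded copy.
            return None, last_modified
        raise
```

A typical run calls `conditional_fetch(URL, saved_header)`, stores the returned header for next time, and only parses the CSV when `body` is not None.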