CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

Which is the "authoritative" data between "daily_reports" and "time_series" #1509

Closed by cipriancraciun 4 years ago

cipriancraciun commented 4 years ago

(In this issue, by "authoritative" I don't mean "official", but instead "the main data file" used by JHU for building any other derived files.)

Looking at the latest versions of time_series, I see that only the deaths were updated, not the confirmed cases; meanwhile, the daily_reports are properly updated.

Therefore I am asking (given that you know the workflow of collecting and reporting data): which file is actually the "authoritative" source, from which the other is created?

Should I assume that daily_reports is the authoritative one?

peterdrier commented 4 years ago

The time series is a pivot table across the daily report data, so the daily reports are the more accurate of the two. That said, the way cities, countries, etc. are reported has changed over time, so please don't complain about that; a few hundred issues have already been wastefully registered in that regard.
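To illustrate what "a pivot table across the daily report data" could look like, here is a minimal sketch in plain Python. It is not JHU's actual build script: the function name `pivot_daily_reports`, the column names `Country/Region` and `Confirmed`, and the sample numbers are all assumptions made for the example.

```python
from collections import defaultdict


def pivot_daily_reports(daily_reports, field="Confirmed"):
    """Pivot {date: [row, ...]} into {region: {date: total}}.

    Each row is a dict as csv.DictReader would produce. The column names
    used here are assumptions for illustration, not a guaranteed schema.
    """
    series = defaultdict(dict)
    for date, rows in daily_reports.items():
        for row in rows:
            region = row["Country/Region"]
            # Sum province-level rows into one figure per region and date.
            series[region][date] = series[region].get(date, 0) + int(row[field])
    return {region: dict(by_date) for region, by_date in series.items()}


# Made-up sample rows, not real JHU figures.
sample_reports = {
    "2020-03-21": [
        {"Country/Region": "Italy", "Confirmed": "100"},
        {"Country/Region": "China", "Confirmed": "50"},
        {"Country/Region": "China", "Confirmed": "5"},
    ],
    "2020-03-22": [
        {"Country/Region": "Italy", "Confirmed": "120"},
        {"Country/Region": "China", "Confirmed": "51"},
        {"Country/Region": "China", "Confirmed": "5"},
    ],
}

time_series = pivot_daily_reports(sample_reports)
```

Because each time series value is derived from the daily rows, a time series file that lags behind (as described in the question) can in principle be rebuilt from the daily reports, but not the other way around.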

cipriancraciun commented 4 years ago

I thought so; I'll adapt my scripts. Thanks.

I won't complain about country names, as I'm already using a "dictionary" with alternative spellings, names, translations, etc., and where needed I manually added the values that appear in the JHU dataset. For reference:

(If needed, a similar approach can be taken for counties / provinces / cities.)
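The "dictionary" of alternative spellings mentioned above might be sketched roughly as follows. This is not the poster's actual code: the `ALIASES` table, its entries, and the `canonical_country` helper are hypothetical and only show the general idea of mapping variant names to one canonical name.

```python
# Hypothetical alias table: lowercase variant -> canonical name.
# The entries are illustrative; a real table would be much larger.
ALIASES = {
    "mainland china": "China",
    "korea, south": "South Korea",
    "republic of korea": "South Korea",
    "us": "United States",
}


def canonical_country(name, aliases=ALIASES):
    """Return the canonical name for a country label.

    Whitespace is collapsed and the lookup is case-insensitive. Names
    with no alias entry are returned with only whitespace cleaned up.
    """
    cleaned = " ".join(name.split())
    return aliases.get(cleaned.casefold(), cleaned)
```

For example, `canonical_country("Mainland China")` and `canonical_country("China")` both give `"China"`, so rows reported under either label end up in the same series. The same lookup could be keyed on a (country, province) pair for lower-level regions.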