CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

Zeros and missing data #1146

Open dhmacq opened 4 years ago

dhmacq commented 4 years ago

As noted in quite a few other questions, US county- and city-level data was discontinued starting 2020-03-10 and replaced by state-level data (in California, at least; I haven't looked at other states, and the same may be true in other countries).

However, the county and city locations report zero for dates since then. These are not true zeros; they are missing data, or NULLs in relational database parlance.

Worse yet, now that we know that zeros at least sometimes represent missing data, what about other zeros, especially earlier ones? How do we know whether a zero is a true zero, or indicates missing data?
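One partial check a data consumer could run, sketched here under the assumption that each row is a cumulative count (so it should never fall back to zero once it has been positive): flag any zero that follows a positive value as probably missing. The function name `flag_suspect_zeros` and the sample series are made up for illustration. This cannot settle the question for earlier zeros, since a zero before the first positive value looks identical whether it is real or missing.

```python
def flag_suspect_zeros(counts):
    """Return indices of zeros that appear after an earlier positive count.

    Assumes `counts` is a cumulative series, so such zeros are more likely
    missing data than real values. Leading zeros are NOT flagged, because
    this check cannot tell them apart from true zeros.
    """
    suspect = []
    seen_positive = False
    for i, value in enumerate(counts):
        if value > 0:
            seen_positive = True
        elif value == 0 and seen_positive:
            suspect.append(i)
    return suspect


# Made-up series for illustration, not real data from the repository.
series = [0, 0, 2, 5, 0, 0]
suspect = flag_suspect_zeros(series)  # indices 4 and 5 are flagged
```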

Please, could a missing data code be introduced? It is essential to distinguish between real data and missing data.

For my own applications, simply omitting the zeros between commas in the CSV files would work, or a code such as -9999 could be used instead of zero.
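To illustrate how either convention could be consumed, here is a minimal sketch using Python's standard `csv` module. The column layout, the sample rows, and the helper `parse_count` are hypothetical, and `-9999` is only the sentinel proposed above, not something the dataset currently uses.

```python
import csv
import io

MISSING_CODE = "-9999"  # hypothetical sentinel, as proposed in this issue


def parse_count(field):
    """Return an int, or None when the field is empty or holds the sentinel."""
    text = field.strip()
    if text == "" or text == MISSING_CODE:
        return None
    return int(text)


# Made-up sample in a simplified layout, not real data from the repository.
sample = io.StringIO(
    "location,2020-03-09,2020-03-10,2020-03-11\n"
    "County A,3,,\n"            # missing values as empty fields
    "County B,0,-9999,-9999\n"  # missing values as the sentinel code
)

reader = csv.reader(sample)
header = next(reader)
rows = {row[0]: [parse_count(f) for f in row[1:]] for row in reader}
```

With either encoding, the true zero for County B on 2020-03-09 stays `0`, while the unreported dates become `None` and can be excluded from sums or plots.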

PeterBloomingdale commented 4 years ago

This is an issue. It would be nice to update the dataset for US cases prior to 3/10.