CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

NYC COVID data for 4/23 looks incorrect #2380

Open MNRKCM opened 4 years ago

MNRKCM commented 4 years ago

Hi – the NYC data for 4/23 confirmed COVID cases looks incorrect –

4/22 = 147297
4/23 = 145855
4/24 = 150473

The cumulative number of confirmed cases can’t decrease from one day to the next. There can be zero new cases (no change in the level) – but the level can’t go lower.

Is there a way to correct this file?

https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv
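The check described above (a cumulative count should never fall from one day to the next) can be sketched in a few lines of Python. This is only an illustration: the function name `find_decreases` is hypothetical, and the three values are the ones quoted in this post rather than read from the CSV file.

```python
def find_decreases(series):
    """Return (previous_date, date, change) for each day-over-day drop
    in a cumulative series given as a list of (date, count) pairs."""
    drops = []
    for (prev_date, prev_count), (date, count) in zip(series, series[1:]):
        if count < prev_count:
            drops.append((prev_date, date, count - prev_count))
    return drops


# Values quoted in this post for NYC (JHU time series), not loaded from the file.
jhu_nyc = [("4/22", 147297), ("4/23", 145855), ("4/24", 150473)]

print(find_decreases(jhu_nyc))  # [('4/22', '4/23', -1442)]
```

A non-empty result only flags a date worth investigating; it does not say whether the earlier or the later value is the one that needs correcting.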

econ-erik commented 4 years ago

Hey @MNRKCM, with the speed at which reporting on these data evolves, it's entirely possible that "probable" cases, i.e., cases counted before a confirmed laboratory test, turn out to be negative. For instance, the NYTimes series for NYC has 4/22 at 142,442, 4/23 at 145,855, and 4/24 is not in their GitHub repo yet (they're being a bit more patient and only counting lab-confirmed cases, even though some states have adopted the CDC's move to also include probable cases). It's certainly valuable to puzzle over the negative changes, but they may not necessarily be wrong because of their sign. Bear in mind that all of these numbers are massive undercounts of the true number of cases anyway.

MNRKCM commented 4 years ago

Thanks for your reply. I appreciate it. But I disagree. JHU is calling this confirmed cases (not confirmed + probable). The number of confirmed cases cannot go down. For example, just using made-up numbers, NYC could have 100 confirmed COVID cases today. But it can’t have 90 confirmed cases tomorrow. It could still have 100 confirmed cases tomorrow, i.e., no new cases, but it can’t have a decrease in the level. Confirmed cases don't go away. So there's a discrepancy that calls into question the accuracy of the data.

econ-erik commented 4 years ago

They can call it what they like but their definition of confirmed cases includes probable for a number of countries. Canada is a great example as earlier in April they followed the CDC and started reporting "total cases" as "confirmed + probable" and JHU is bringing in that "total cases" measure as "confirmed cases". Laboratory tests won't necessarily line up with those probable cases, and hence "total cases" may fall in some locations. For instance, last night the GoC had Ontario with 13,519 "confirmed cases", but JHU has 14,550. That higher number also includes Ontario's "probable cases". I think it comes down to the number one rule of data science -- know your data (which includes users just as much as producers/compilers).

Again, let me stress that all we're observing here is a noisy signal of true cases. The confirmed-cases series has the least time lag (compared to, say, deaths or active cases, which are likely measured with less uncertainty), so it's more useful when trying to track the outbreak in real time. The NYT GitHub repo has a great readme going over their process for tracking cases in the US if you want to read more.

MNRKCM commented 4 years ago

What's the least noisy data set for tracking daily NYC/NY/US data -- confirmed cases? Is NYT now collecting the data themselves? Are they better than JHU? At one point they were referencing JHU as their source.

econ-erik commented 4 years ago

At the county level, the NYT is likely the best-tracked data, since they're pulling in the data straight from local public health authorities and, for now, they only include laboratory-confirmed cases (their readme mentions they're testing the inclusion of probable cases). If you look at JHU's errata for the US, the adjustments are often made in response to the NYT's data. The caveat to using the NYT data is that it's slower. As of 12:25 EDT, their 4/24/2020 update has not been posted to GitHub. The ECDC is another option, but it has a similar caveat in that it will be behind the JHU data each day because they report much earlier.

codewarrior2000 commented 4 years ago

@MNRKCM New York City government reports its data at https://www1.nyc.gov/site/doh/covid/covid-19-data-archive.page, where for those three days the confirmed cases are given as:

4/22 = 141754
4/23 = 146139
4/24 = 150576
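To put the three sources mentioned in this thread side by side, here is a minimal Python sketch that derives day-over-day changes from each cumulative series. The figures are copied from the comments above (the NYT values are the ones econ-erik quoted, and the NYT 4/24 value was not yet posted), and the helper name `daily_changes` is hypothetical.

```python
# Cumulative confirmed cases for NYC as quoted in this thread.
sources = {
    "JHU": {"4/22": 147297, "4/23": 145855, "4/24": 150473},
    "NYT": {"4/22": 142442, "4/23": 145855},  # 4/24 not posted at the time
    "NYC DOH": {"4/22": 141754, "4/23": 146139, "4/24": 150576},
}
dates = ["4/22", "4/23", "4/24"]


def daily_changes(counts, dates):
    """Day-over-day change for each date where both that date and the
    previous date have a reported value."""
    changes = {}
    for prev, cur in zip(dates, dates[1:]):
        if prev in counts and cur in counts:
            changes[cur] = counts[cur] - counts[prev]
    return changes


for name, counts in sources.items():
    print(name, daily_changes(counts, dates))
# JHU {'4/23': -1442, '4/24': 4618}
# NYT {'4/23': 3413}
# NYC DOH {'4/23': 4385, '4/24': 4437}
```

On these numbers, only the JHU series shows a negative change, while the NYT and NYC DOH series rise on each day they cover.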