CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

Don't trust JHU data any more - there are better sources. #2417

Open traut21 opened 4 years ago

traut21 commented 4 years ago

Sorry to say that - but JHU people do not seem to mind about the many complaints and issues here.

Maybe JHU is still valuable for US states.

But otherwise you can see the trends e.g. at https://who.maps.arcgis.com/apps/opsdashboard/index.html#/ead3c6475654481ca51c248d52ab9c61 and get more pessimistic, but more reliable data for Europe and many other countries from https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide

I feel it's unacceptable to ignore the complaints about the absurd French numbers. A direct comparison does show exaggerations for many other countries, too (e.g. Spain, Germany).

JHU relies heavily on Worldometer's estimates. Those were good estimates early on, and Worldometer did a good job of adjusting them against real data. But their numbers are still guesses, while most countries now publish real numbers, even if a few hours later than Worldometer.

iborko commented 4 years ago

As I'm mostly interested in European countries, the ECDC link seems like a great data source. Thanks!

lojic commented 4 years ago

I've reported errors, and JHU has fixed the errors, so I think your assertion that they don't care is unfounded.

traut21 commented 4 years ago

Lucky you - but a quick glance at today's values shows 164589 for France, while the official count is 128339. There have been more than 30 issues concerning France in the last few weeks, and never an explanation or a fix for this problem. And that's just France.

I checked other countries and found major discrepancies there too. But I no longer bother to report those problems as long as there is no feedback culture at JHU: no acknowledgement that there is a problem, and no hint of how they plan to fix it.

lojic commented 4 years ago

Worldometer has 165,842 cases for France for Apr 27. Do you have a link to the official count?

Worldometer also has a paragraph about France here: https://www.worldometers.info/coronavirus/about/

traut21 commented 4 years ago

Worldometer even has two sections concerning France, with overlapping information. In fact, France has included EHPAD (nursing home) cases since 1 April.

Official numbers are on https://www.santepubliquefrance.fr/maladies-et-traumatismes/maladies-et-infections-respiratoires/infection-a-coronavirus/articles/infection-au-nouveau-coronavirus-sars-cov-2-covid-19-france-et-monde - stating 128 339 cases for 2020-04-27_14:00.

The French dashboard is on https://dashboard.covid19.data.gouv.fr/ - stating 128 339 cas confirmés (confirmed cases), including 30 227 from EHPAD and EMS.

Worldometer claims 165 842, and JHU shows 166 036 on https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6 - I'm surprised, since JHU usually showed the numbers from Worldometer. The official overseas cases sum to 98, so they do not explain the difference.

For deaths: official 23 293 (14 497 + 8 796), JHU 23 293 (! 23 327 including overseas), Worldometer 23 293.

From https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_daily_reports/04-26-2020.csv you get France 2020-04-27 02:30:33: confirmed 160 847, deaths 22 856.

The official numbers for 2020-04-26, from https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/, are: cases 124 575, deaths 22 856.
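The size of the gap can be made explicit with a few lines of arithmetic. The following is a minimal Python sketch that uses only the France figures for 2020-04-27 quoted in this thread. The dictionary layout and the helper name `relative_excess` are illustrative assumptions, not part of any JHU or Worldometer tooling.

```python
# Confirmed-case counts for France on 2020-04-27, as quoted in this thread.
france_confirmed = {
    "official": 128_339,       # Sante publique France / French dashboard
    "worldometer": 165_842,
    "jhu_dashboard": 166_036,
}


def relative_excess(value: int, baseline: int) -> float:
    """Return how far `value` lies above `baseline`, as a fraction of `baseline`."""
    return (value - baseline) / baseline


baseline = france_confirmed["official"]
for source, count in france_confirmed.items():
    diff = count - baseline
    pct = relative_excess(count, baseline)
    print(f"{source:14s} {count:>8,d}  diff {diff:>+7,d}  ({pct:+.1%})")
```

With these figures the JHU dashboard count is 37 697 above the official one, roughly 29% higher, which is far more than the 98 official overseas cases could account for.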

ladris commented 4 years ago

Why the continuous assertion that JHU has ever used Worldometer data? Worldometer started out using JHU's repo and then collected its own guesswork sources, whereas JHU uses reports and sources of its own. And yes, it will be more streamlined for US data because they have better access to their home country's data.

traut21 commented 4 years ago

Why do you believe that JHU does not use Worldometer? JHU themselves name it: Data sources: WHO, CDC, ECDC, NHC, DXY, 1point3acres, Worldometers.info, BNO, the COVID Tracking Project...

traut21 commented 4 years ago

Or do read https://github.com/CSSEGISandData/COVID-19/blob/master/README.md which names Data Sources:... WorldoMeters: https://www.worldometers.info/coronavirus/ and many others

lojic commented 4 years ago

@traut21 I may have spoken too soon. I opened two issues with lots of detail re: the specific line in a specific file that had the error, links to the health department websites showing the correct values, etc. Both issues were totally ignored today, and the next daily update still has the erroneous values. I just happened to catch these two errors, so I must assume there are many more undetected. I expect JHU is simply overwhelmed, and can no longer vet the data properly. I may need to create my own scraper for US county data.

One disturbing aspect is that both of the errors should have been automatically detected, so I can't imagine what their web scraping code looks like :-o
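The thread does not say which checks would have caught these two errors. As one example of the kind of automatic check meant here, a cumulative count should never go down from one day to the next. Below is a minimal Python sketch under that assumption; the function name `find_decreases` and the sample series are hypothetical and are not taken from JHU's code or data.

```python
from typing import List, Tuple


def find_decreases(cumulative: List[int]) -> List[Tuple[int, int, int]]:
    """Return (index, previous, current) for every position where a
    cumulative series drops below the value before it."""
    return [
        (i, prev, cur)
        for i, (prev, cur) in enumerate(zip(cumulative, cumulative[1:]), start=1)
        if cur < prev
    ]


# Made-up cumulative counts for one county; the 50 stands in for a bad scrape.
series = [3, 5, 5, 50, 6, 7]
print(find_decreases(series))  # -> [(4, 50, 6)]
```

A check like this only fires once the value is corrected on a later day, so it flags that an earlier entry was wrong rather than catching the bad value when it first appears.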

Themroc commented 4 years ago

Sorry to say that - but JHU people do not seem to mind about the many complaints and issues here.

Just look at the closed issues:

#2439 by asundaratx was closed 12 hours ago

prichterich commented 4 years ago

It seems that the JHU people have answered asundaratx about his issue, and explained to him where the cases for Austin are reported. The organization of the data in general is by county, so the data are listed in the county Austin is in. The person who opened the issue wrote "Thanks for clearing this up." and closed it.

Regarding the French numbers, it is obvious that France is one of the worst countries in the world with respect to testing. Even with the higher JHU numbers, the raw CFR for France is substantially higher than for Italy and Spain. It seems the Worldometer numbers and the JHU numbers include numbers for "probable" cases that the French health department has published from time to time. Such "probable" cases often include actual deaths in nursing homes that were clearly from COVID-19, but where no test was done. Similarly, Worldometer includes "probable" COVID-19 deaths reported by NYC (and possibly other places). Again, probable death means no positive COVID-19 test, but a high likelihood that the death was caused by COVID-19, in the professional opinion of the medical examiner. Worldometer then also adds the corresponding number of cases, which makes perfect sense.
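For reference, the raw CFR mentioned here is simply deaths divided by confirmed cases, so it moves with whichever case count is used. The Python sketch below applies that definition to the France figures for 2020-04-27 quoted earlier in this thread. The helper name `raw_cfr` is an illustrative assumption.

```python
def raw_cfr(deaths: int, confirmed: int) -> float:
    """Raw case fatality ratio: deaths divided by confirmed cases."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    return deaths / confirmed


# France, 2020-04-27, figures as quoted earlier in this thread.
deaths = 23_293
for label, confirmed in [("official", 128_339), ("JHU dashboard", 166_036)]:
    print(f"{label:14s} raw CFR = {raw_cfr(deaths, confirmed):.1%}")
```

With the same death count, the official case count gives a raw CFR of roughly 18% and the higher JHU count gives roughly 14%.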

Worldometers numbers are based on publicly available information from multiple sources. The "official" numbers reported by governments often reflect the desire of the government to keep reported numbers as low as possible. That even applies to the CDC numbers in the US. The CDC has openly admitted that the numbers it reports are low. It's getting harder to find this disclaimer now, but the CDC still states that "data reported by states should be considered the most up to date".

traut21 commented 4 years ago

Yes, they added a little bit of explanation for France. I don't know about Brazil, I don't know about the USA. But I doubt that Germany is considered poor at testing, and the JHU numbers are several thousand higher than the official numbers here.

The data series here is named "confirmed" cases. I don't know where those "probable" cases come from - but that's at best a "guessed" count. They do not explain where those numbers come from. They are sometimes very good guesses. But I don't know why I should trust them any more than the official numbers from France. France is not North Korea, nor China, nor Brazil.

lojic commented 4 years ago

@prichterich I was the one who cleared up the confusion about Austin, TX on #2428. The JHU people have been silent.

mcroebuck commented 4 years ago

@lojic I'm working with the county-level data. My Stata code pulls in the new files whenever I run it. So how concerned should we be about fidelity? I'm also pulling in testing data at the state level. There really isn't a better source for US county-level data, right? So far my only complaint has been the lumping together of all New York City counties/boroughs into one. Thanks!

lojic commented 4 years ago

@mcroebuck re: how concerned you should be about fidelity, it depends a little on how you're using the data. If you're mainly interested in the latest results, JHU does seem to correct itself (eventually). In my case, I'm tracking changes in new daily deaths/cases, so when JHU provided an erroneous value for Lincoln County MO of 41 deaths (vs. the actual 1 death) on Apr 27, then that county shot up to the top of my "worst counties" list for one day at http://lojic.com/covid.html which is how I noticed the error.
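The effect described here can be reproduced in a few lines. The Python sketch below turns cumulative deaths into day-over-day changes and ranks counties by the latest increase. It is not the code behind the linked page. The function names are hypothetical, and every number is made up for illustration except the erroneous 41 for Lincoln County, MO.

```python
from typing import Dict, List, Tuple


def daily_new(cumulative: List[int]) -> List[int]:
    """Convert a cumulative series into day-over-day changes."""
    return [cur - prev for prev, cur in zip(cumulative, cumulative[1:])]


def rank_by_latest_increase(counties: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Sort counties by their most recent daily increase, largest first.
    Each series needs at least two entries."""
    latest = {name: daily_new(series)[-1] for name, series in counties.items()}
    return sorted(latest.items(), key=lambda item: item[1], reverse=True)


# Cumulative deaths, last entry = Apr 27. Only the 41 comes from the thread;
# the earlier Lincoln values and both example counties are assumptions.
counties = {
    "Example County A": [10, 12, 15],
    "Lincoln, MO": [1, 1, 41],
    "Example County B": [30, 30, 31],
}
for name, increase in rank_by_latest_increase(counties):
    print(f"{name:18s} +{increase}")
```

In this toy data a single bad cumulative value puts the county at the top of the ranking for that day, which is why an analysis of daily changes is more sensitive to such errors than one that only reads the latest totals.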

Regarding better sources, I'm considering scraping county health department websites directly. Here is North Carolina as an example. I will likely only do that for my home state of NC and use JHU for the total county list.