Open alfkoehn opened 4 years ago
Note: I just realized that a few dates have indeed been changed (so you guys can all change your scripts fixing the wrong numbers), those are:
And why did we get 12509 new cases in the dataset since yesterday? There weren't that many. Poor France, can't get fixed once and for all :(
Same problem with these data. How can such a divergence from the French and European data be explained? This is a real problem because I can no longer rely on this source, which is usually very reliable. [1] https://github.com/opencovid19-fr/data/blob/master/ministere-sante/2020-04-16.yaml [2] https://dashboard.covid19.data.gouv.fr/
What can we do to fix it? Best regards
Finally, I can post!
The France problem stems from the fact that on April 2 they began reporting cases in nursing homes separately from cases in hospitals. Wiki has the hospital-only data. Worldometers has the sum of the two. Starting this week, France is testing all residents of any nursing home with at least one suspected COVID death. Nursing home cases are now 34% of the total.
Date        Hospital  Nursing homes    Total
2 Apr 20       59105          14638    73743
3 Apr 20       64338          17827    82165
4 Apr 20       68605          21348    89953
5 Apr 20       70478          22361    92839
6 Apr 20       74390          23620    98010
7 Apr 20       78167          30902   109069
8 Apr 20       82048          30902   112950
9 Apr 20       86334          31415   117749
10 Apr 20      90676          34193   124869
11 Apr 20      93790          35864   129654
12 Apr 20      95403          37188   132591
13 Apr 20      98076          38703   136779
14 Apr 20     103573          39730   143303
15 Apr 20     106206          41657   147863
16 Apr 20     108847          56180   165027
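For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the total and the nursing home share from a few rows of the table. The names `rows` and `combine` are made up for illustration, and the column meanings (hospital, nursing homes) are taken from the description above, not from an official schema.

```python
# Rows copied from the table above: (date, hospital cases, nursing home cases).
rows = [
    ("2 Apr 20", 59105, 14638),
    ("9 Apr 20", 86334, 31415),
    ("16 Apr 20", 108847, 56180),
]


def combine(hospital, nursing_home):
    """Return the summed total and the nursing home share of that total."""
    total = hospital + nursing_home
    share = nursing_home / total
    return total, share


for date, hospital, nursing_home in rows:
    total, share = combine(hospital, nursing_home)
    print(f"{date}: total={total}, nursing home share={share:.0%}")
```

For the 16 Apr row this gives a total of 165027 and a share of about 34%, matching the figure quoted above.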
Dear John,
Your work is great and I would like to continue to use your data. However, I need stable data because I perform incidence rate calculations to highlight the epidemic spread.
I do not fully understand what you are going to do. Will you replace the aggregated data (nursing home + hospital) from the table in your email with the data currently available in the wiki?
If so, or something equivalent, when will this take effect on your wiki?
Will the new data apply from April 2, 2020 onward? If so, and if this decision is stable, I will develop a local model that lets me fit the previous data to avoid a break in the series (I did the same thing with the Chinese data when they modified their case definition).
Thank you in advance for your response and for your very important work.
Best regards,
Dr Laurent TOUBIANA, PhD in Physics, Epidemiologist
Director of IRSAN, "Institut de recherche pour la valorisation des données de santé"
Head of SCEPID: Complex Systems and Epidemiology
Laboratoire d'Informatique Médicale et d'Ingénierie des connaissances
INSERM UMRS 1142 LIMICS, Paris, F-75006;
UPMC: Université Pierre et Marie Curie - Paris 6
Email: laurent.toubiana@inserm.fr - URL: L. Toubiana on Aviesan
Tel.: (33) 01 44 27 91 97
Postal address:
Laurent Toubiana
Campus des Cordeliers
Esc. D - 2ème étage
15, rue de l'école de médecine
75006 Paris
@JohnRideau wrote: "The France problem stems from the fact that on April 2 they began reporting cases in nursing homes separately from cases in hospitals. Wiki has the hospital-only data."
That is not accurate. Wikipedia includes confirmed cases in nursing homes (20,272 as of today). It does not include probable cases though (38,717 as of today). Both Wikipedia and JHU include deaths in nursing homes. Source.
So the difference is that JHU CSSE includes all reported probable (but not confirmed) cases for France, while it includes confirmed only for other countries.
I'm not affiliated with Johns Hopkins, just another user who was puzzled by the data from France. For my analysis, I download the JH data every night, then run a FIX file that puts in corrections that I deem appropriate for my purposes. I just thought I would share what I learned about the data from France.
This from Reuters was the first article that helped me understand that nursing home numbers, previously unreported, were now being reported separately. I've decided to add the hospital cases to the nursing home cases for the France total in my work. I've noted that wiki has the hospital data while worldometers has the sum of the two. The JH data has fluctuated between the two measures, but I get French data directly from the government source now, overriding what JH has.
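The nightly "download, then apply my own corrections" workflow described above could look roughly like the following. This is only a sketch under assumptions: the function `apply_fixes`, the `(country, date)` keys, and the choice of 16 Apr as the date for the JH value are illustrative and do not come from any actual FIX file.

```python
def apply_fixes(series, fixes):
    """Return a copy of `series` with every entry in `fixes` overriding it."""
    fixed = dict(series)  # copy, so the downloaded data stays untouched
    fixed.update(fixes)
    return fixed


# Value reported in this thread for the JH dataset (date assumed to be 16 Apr).
downloaded = {("France", "2020-04-16"): 145960}

# Override: hospital + nursing home total for 16 Apr from the table
# posted earlier in this thread.
overrides = {("France", "2020-04-16"): 165027}

fixed = apply_fixes(downloaded, overrides)
print(fixed[("France", "2020-04-16")])
```

Keeping the overrides in a separate mapping means the raw download is never edited, so a correction can be dropped again if the upstream data is fixed.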
Mario, that is a very helpful clarification and a great link. Thank you.
Let me try to explain the problem with the French data. We have the following figures from the source [https://dashboard.covid19.data.gouv.fr/]:
"Confirmed Cases": 108847 (16 Apr), 109252 (17 Apr), daily variation +405
"Confirmed Cases in nursing homes": 18967 (16 Apr), 20272 (17 Apr), daily variation +1305
"Probable Cases in nursing homes": 37213 (16 Apr), 38717 (17 Apr), daily variation +1504
"Confirmed Cases in nursing homes" are supposed to be included in "Confirmed Cases" but this is not stated clearly. When you see that the daily variation (16-17 Apr) is +1305 for "Confirmed Cases in nursing homes" vs +405 for the total "Confirmed Cases" (same situation for the 15-16 Apr variation), you conclude that the "Confirmed Cases in nursing homes" are not included in the "Confirmed Cases" and have to be added. Whether or not to use the "Probable Cases" (symptoms but no test) is a matter of choice.
@FrancisWasserman wrote: "Confirmed Cases in nursing homes" are supposed to be included in "Confirmed Cases" but this is not stated clearly.
They are included in the total of confirmed cases and it is stated explicitly and very clearly. See the figure for cas confirmés en EHPAD et EMS, the tooltip says:
Nombre de cas confirmés par test PCR en EHPAD et EMS. Ce chiffre est inclus dans le nombre total de cas cumulés.
Translation:
Number of cases confirmed by PCR test in EHPAD and EMS. This figure is included in the total number of cumulative cases.
Thank you MarioGomWiki. The sentence you quote is precisely what I think is not clear. "This figure is included in the total number of confirmed cases" would have been clear. Now, suppose "confirmed cases in nursing homes" is included in "confirmed cases". If you subtract "nursing homes" from "confirmed cases", you will see that the resulting series may decrease on some days, which is impossible for a cumulative series.
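The subtraction argument can be checked with the two days of figures quoted earlier in this thread. This is a minimal sketch; the helper names `daily_changes` and `is_non_decreasing` are illustrative, and the `remainder` series only makes sense under the assumption that the nursing home figure is a subset of the total.

```python
def daily_changes(series):
    """Day-over-day differences of a cumulative series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def is_non_decreasing(series):
    """A cumulative count should never go down from one day to the next."""
    return all(change >= 0 for change in daily_changes(series))


# Figures for 16 and 17 Apr quoted earlier in this thread.
confirmed_total = [108847, 109252]
confirmed_nursing_homes = [18967, 20272]

# Hypothetical "everything except nursing homes" series, assuming the
# nursing home figure is included in the total.
remainder = [t - n for t, n in zip(confirmed_total, confirmed_nursing_homes)]

print(daily_changes(confirmed_total))          # [405]
print(daily_changes(confirmed_nursing_homes))  # [1305]
print(remainder)                               # [89880, 88980]
print(is_non_decreasing(remainder))            # False
```

The check only shows that the remainder drops by 900 between the two days; it does not say why.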
@FrancisWasserman You are assuming that cases do not move between categories in the breakdown, but they do. Just as in previous weeks cases within the total moved between categories when the reporting criteria were updated, the same may be happening with confirmed cases in EHPAD. You assume that the daily increment for confirmed cases in EHPAD is always due to cases not previously reported in the total, but they might be cases that were counted in the EHPAD total only following a hospital discharge. I have no evidence this is the case; it is just an example of how your assumption is not the only explanation for the breakdown of values.
What we do know is that confirmed cases in EHPAD are clearly marked as being included in the total number of cases, as opposed to probable cases in EHPAD.
Thanks MarioGomWiki. You are right. There may have been some overlap between the "confirmed cases" and the "nursing homes" series since Apr 4. That was the case with the number of deaths in nursing homes, which started on Apr 1. For some days, some nursing home residents who died in a hospital were counted twice. The problem was identified and disclosed as soon as the data were published, and it was then quickly resolved. Nothing of this kind happened with the case numbers.
The recent evolution of the numbers of confirmed and probable cases in nursing homes is fueled by a major effort to include all nursing homes in the stats (I do not know where we are today) and to test all concerned residents. That may explain some inconsistencies in the data.
Number in database here: 145960
Number in other data sources [1,2]: 108847
That is 37113 cases too many, probably because potential cases are also included (and not only confirmed cases); see e.g. the discussion [3] and an official statement [4], where a fix was promised (but nothing has happened since then).
[1] https://github.com/opencovid19-fr/data/blob/master/ministere-sante/2020-04-16.yaml
[2] https://dashboard.covid19.data.gouv.fr/
[3] https://github.com/CSSEGISandData/COVID-19/issues/2005
[4] https://github.com/CSSEGISandData/COVID-19/issues/2094