CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

France: still wrong numbers of confirmed cases (April 16) + some of the old wrong numbers were fixed last night #2259

Open alfkoehn opened 4 years ago

alfkoehn commented 4 years ago

Number in database here: 145960

Number in other data sources [1,2]: 108847

37113 cases too many, probably because potential cases are included as well (not only confirmed cases). See e.g. the discussion [3] and an official statement [4], where a fix was promised (but nothing has happened since then).

[1] https://github.com/opencovid19-fr/data/blob/master/ministere-sante/2020-04-16.yaml
[2] https://dashboard.covid19.data.gouv.fr/
[3] https://github.com/CSSEGISandData/COVID-19/issues/2005
[4] https://github.com/CSSEGISandData/COVID-19/issues/2094

alfkoehn commented 4 years ago

Note: I just realized that a few dates have indeed been changed (so you guys can all update the scripts you use to fix the wrong numbers). Those are:

WildH0g commented 4 years ago

And why did we get 12509 new cases in the dataset since yesterday? There weren't that many. Poor France, can't get fixed once and for all :(

tube75 commented 4 years ago

Same problem with these data. How can such a divergence from the French and European data be explained? This is a real problem because I can no longer refer to this source, which is usually very reliable.

[1] https://github.com/opencovid19-fr/data/blob/master/ministere-sante/2020-04-16.yaml
[2] https://dashboard.covid19.data.gouv.fr/

What can we do to fix it?

Best regards

JohnRideau commented 4 years ago

Finally, I can post!

The France problem is related to the fact that, starting April 2, they began reporting cases in nursing homes separately from cases in hospitals. Wiki has the hospital-only data; Worldometers has the sum of the two. Starting this week, France is testing all residents of any nursing home with at least one suspected COVID death. Nursing-home cases are now 34% of the total.

Date       Hospital  Nursing homes    Total
2 Apr 20      59105          14638    73743
3 Apr 20      64338          17827    82165
4 Apr 20      68605          21348    89953
5 Apr 20      70478          22361    92839
6 Apr 20      74390          23620    98010
7 Apr 20      78167          30902   109069
8 Apr 20      82048          30902   112950
9 Apr 20      86334          31415   117749
10 Apr 20     90676          34193   124869
11 Apr 20     93790          35864   129654
12 Apr 20     95403          37188   132591
13 Apr 20     98076          38703   136779
14 Apr 20    103573          39730   143303
15 Apr 20    106206          41657   147863
16 Apr 20    108847          56180   165027
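The three columns above can be cross-checked with a few lines of Python. This is a minimal sketch that assumes the column meaning described in the comment (hospital cases, nursing-home cases, and their sum); the names `ROWS`, `inconsistent_rows`, `nursing_home_share` and `daily_increments` are hypothetical, chosen for illustration. It checks that every total equals hospital + nursing homes, reproduces the 34% nursing-home share for 16 April, and lists the day-over-day changes in one column.

```python
"""Cross-check the France table posted in the issue (hypothetical helper names)."""
from typing import List, Tuple

# (date, hospital cases, nursing-home cases, published total), transcribed from the table.
Row = Tuple[str, int, int, int]
ROWS: List[Row] = [
    ("2 Apr", 59105, 14638, 73743),
    ("3 Apr", 64338, 17827, 82165),
    ("4 Apr", 68605, 21348, 89953),
    ("5 Apr", 70478, 22361, 92839),
    ("6 Apr", 74390, 23620, 98010),
    ("7 Apr", 78167, 30902, 109069),
    ("8 Apr", 82048, 30902, 112950),
    ("9 Apr", 86334, 31415, 117749),
    ("10 Apr", 90676, 34193, 124869),
    ("11 Apr", 93790, 35864, 129654),
    ("12 Apr", 95403, 37188, 132591),
    ("13 Apr", 98076, 38703, 136779),
    ("14 Apr", 103573, 39730, 143303),
    ("15 Apr", 106206, 41657, 147863),
    ("16 Apr", 108847, 56180, 165027),
]


def inconsistent_rows(rows: List[Row]) -> List[Row]:
    """Rows whose published total differs from hospital + nursing homes."""
    return [r for r in rows if r[1] + r[2] != r[3]]


def nursing_home_share(row: Row) -> float:
    """Fraction of the published total that comes from nursing homes."""
    return row[2] / row[3]


def daily_increments(rows: List[Row], col: int) -> List[Tuple[str, int]]:
    """Day-over-day change of one column (1 = hospital, 2 = nursing homes, 3 = total)."""
    return [(cur[0], cur[col] - prev[col]) for prev, cur in zip(rows, rows[1:])]


if __name__ == "__main__":
    print("inconsistent rows:", inconsistent_rows(ROWS))
    print(f"nursing-home share on {ROWS[-1][0]}: {nursing_home_share(ROWS[-1]):.0%}")
    for date, change in daily_increments(ROWS, 2):
        print(f"{date:>6}  nursing homes {change:+d}")
```

Running it prints an empty list of inconsistent rows and a 34% share for 16 April; in the nursing-home column the change is 0 on 8 April and +14523 on 16 April.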

tube75 commented 4 years ago
Dear John,

  Your work is great and I would like to continue to use your data.

  However, I need stable data because I perform incidence-rate
  calculations to highlight the spread of the epidemic.

  I do not fully understand what you are going to do.

  Will you replace the aggregated data (nursing homes + hospitals)
  from the table in your email with the data currently available in
  the wiki?
  If so, or if you do something equivalent, when will this take
  effect on your wiki?

  Will the new data be applied from April 2, 2020?

  If so, and if this decision is stable, I will develop a local
  model that lets me fit the previous data so the series does not
  break (I did the same with the Chinese data when they changed
  their case definition).

  Thank you in advance for your response and for your very important
  work.

  Best regards,

      Dr Laurent TOUBIANA, PhD in Physics, Epidemiologist

      Director of IRSAN, "Institut de recherche pour la
        valorisation des données de santé"
      Head of SCEPID: Systèmes Complexes et Epidémiologie
      Laboratoire d'Informatique Médicale et d'Ingénierie des
        connaissances
      INSERM UMRS 1142 LIMICS, Paris, F-75006
      UPMC: Université Pierre et Marie Curie - Paris 6


MarioGomWiki commented 4 years ago

@JohnRideau The France problem is related to the fact that they began (April 2) reporting (separately) cases in nursing homes from cases in hospitals. Wiki has the hospital only data.

That is not accurate. Wikipedia includes confirmed cases in nursing homes (20,272 as of today). It does not include probable cases though (38,717 as of today). Both Wikipedia and JHU include deaths in nursing homes. Source.

So the difference is that JHU CSSE includes all reported probable (but not confirmed) cases for France, while it includes confirmed only for other countries.
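A small sketch of the accounting described here, using the 17 April figures quoted in this thread (109252 confirmed in total, 20272 confirmed in nursing homes, 38717 probable in nursing homes). It assumes, as the dashboard tooltip states, that confirmed nursing-home cases are already part of the confirmed total while probable cases are not; the function names are hypothetical.

```python
"""Three ways of combining the French figures for 17 April 2020 (hypothetical names)."""

CONFIRMED_TOTAL = 109252    # all PCR-confirmed cases; includes nursing homes (EHPAD/EMS)
CONFIRMED_NURSING = 20272   # already contained in CONFIRMED_TOTAL
PROBABLE_NURSING = 38717    # symptoms but no test; NOT contained in CONFIRMED_TOTAL


def confirmed_only() -> int:
    """Confirmed cases only, the definition used for other countries."""
    return CONFIRMED_TOTAL


def confirmed_plus_probable() -> int:
    """Confirmed cases plus probable nursing-home cases (no double counting)."""
    return CONFIRMED_TOTAL + PROBABLE_NURSING


def double_counted() -> int:
    """The mistake to avoid: adding the confirmed nursing-home subset a second time."""
    return CONFIRMED_TOTAL + CONFIRMED_NURSING + PROBABLE_NURSING


if __name__ == "__main__":
    for fn in (confirmed_only, confirmed_plus_probable, double_counted):
        print(f"{fn.__name__:>24}: {fn()}")
```

The gap between the first two results is exactly the probable-case count, which is the kind of difference that appears when one source includes probable cases and another does not.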

JohnRideau commented 4 years ago

I'm not affiliated with Johns Hopkins, just another user who was puzzled by the data from France. For my analysis, I download the JH data every night, then run a FIX file that puts in corrections that I deem appropriate for my purposes. I just thought I would share what I learned about the data from France.

This Reuters article was the first that helped me understand that nursing-home numbers, previously unreported, were now being reported separately. I've decided to add the hospital cases to the nursing-home cases for the France total in my work. I've noted that wiki has the hospital data while Worldometers has the sum of the two. The JH data has fluctuated between the two measures, but I now get the French data directly from the government source, overriding what JH has.

https://www.reuters.com/article/us-health-coronavirus-france-toll/french-coronavirus-death-toll-hits-new-high-as-nursing-home-tally-swells-idUSKBN21M0S9
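The nightly "FIX" step mentioned above is not shown in the thread. The following is a hypothetical sketch of one way such an override could work: per-country cumulative series stored as plain dictionaries keyed by date, with the government figures taking precedence on every date they cover. The data layout and the name `apply_override` are assumptions; the 16 April values come from this thread (145960 in the JHU data, 108847 + 56180 = 165027 as hospital plus nursing-home cases).

```python
"""Hypothetical nightly override: replace one country's series with another source."""
from typing import Dict

Series = Dict[str, int]  # ISO date -> cumulative confirmed cases (assumed layout)


def apply_override(data: Dict[str, Series], country: str, override: Series) -> Dict[str, Series]:
    """Return a copy of `data` in which `country` takes the `override` value on every
    date the override covers; other dates and other countries are left unchanged."""
    fixed = {name: dict(series) for name, series in data.items()}  # do not mutate input
    fixed.setdefault(country, {}).update(override)
    return fixed


if __name__ == "__main__":
    jhu = {"France": {"2020-04-16": 145960}}
    gov = {"2020-04-16": 108847 + 56180}  # hospital + nursing homes, as in the table
    print(apply_override(jhu, "France", gov))
```

Copying before updating keeps the downloaded data intact, so the override can be re-run or changed without downloading again.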

JohnRideau commented 4 years ago

Mario, that is a very helpful clarification and a great link. Thank you.

FrancisWasserman commented 4 years ago

Let me try to explain the problem with the French data. We have the following figures from the source [https://dashboard.covid19.data.gouv.fr/]:

"Confirmed Cases": 108847 (16 Apr), 109252 (17 Apr), so the daily variation is +405
"Confirmed Cases in nursing homes": 18967 (16 Apr), 20272 (17 Apr), daily variation +1305
"Probable Cases in nursing homes": 37213 (16 Apr), 38717 (17 Apr), daily variation +1504

"Confirmed Cases in nursing homes" are supposed to be included in "Confirmed Cases", but this is not stated clearly. When you see that the daily variation (16-17 Apr) is +1305 for "Confirmed Cases in nursing homes" vs +405 for the total "Confirmed Cases" (the same holds for the 15-16 Apr variation), you conclude that the "Confirmed Cases in nursing homes" are not included in the "Confirmed Cases" and have to be added. Whether or not to use the "Probable Cases" (symptoms but no test) is a matter of choice.

MarioGomWiki commented 4 years ago

@FrancisWasserman "Confirmed Cases in nursing homes" are supposed to be included in "Confirmed Cases" but this is not stated clearly.

They are included in the total of confirmed cases, and this is stated explicitly and very clearly. See the figure for "cas confirmés en EHPAD et EMS" (confirmed cases in EHPAD and EMS, i.e. nursing homes and other medico-social facilities); its tooltip says:

Nombre de cas confirmés par test PCR en EHPAD et EMS. Ce chiffre est inclus dans le nombre total de cas cumulés.

Translation:

Number of cases confirmed by PCR test in EHPAD and EMS. This figure is included in the total number of cumulative cases.

FrancisWasserman commented 4 years ago

Thank you MarioGomWiki. The sentence you quote is precisely what I think is not clear. "This figure is included in the total number of confirmed cases" would have been clear. Now, suppose "confirmed cases in nursing homes" is included in "confirmed cases". If you subtract "nursing homes" from "confirmed cases", you will see that the resulting series may decrease on some days, which is impossible for a cumulative series.
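This test can be written down directly: subtract the nursing-home series from the total and flag any day on which the result falls, since a cumulative count should never decrease. The sketch uses the 16 and 17 April figures quoted earlier in the thread (total 108847 to 109252, nursing homes 18967 to 20272); the helper names are hypothetical, and the check is only conclusive under the assumption that cases never move between categories.

```python
"""Check whether 'total minus subset' stays non-decreasing (hypothetical helper names)."""
from typing import List


def residual(total: List[int], subset: List[int]) -> List[int]:
    """Element-wise total - subset, e.g. confirmed cases outside nursing homes."""
    return [t - s for t, s in zip(total, subset)]


def decreasing_steps(series: List[int]) -> List[int]:
    """Indices i where series[i] < series[i - 1]; empty for a valid cumulative series."""
    return [i for i in range(1, len(series)) if series[i] < series[i - 1]]


if __name__ == "__main__":
    confirmed = [108847, 109252]        # 16 Apr, 17 Apr
    confirmed_nursing = [18967, 20272]  # 16 Apr, 17 Apr
    outside = residual(confirmed, confirmed_nursing)
    print(outside, decreasing_steps(outside))  # -> [89880, 88980] [1]
```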

MarioGomWiki commented 4 years ago

@FrancisWasserman You are assuming that cases do not move between categories in the breakdown, but they do. Just as in previous weeks, when cases within the total moved between categories as the reporting criteria were updated, the same might be true for confirmed cases in EHPAD. You assume that the daily increment for confirmed cases in EHPAD is always due to cases not previously reported in the total, but they might be cases that were counted in the EHPAD total only after a hospital discharge. I have no evidence that this is the case; it is just an example of how your assumption is not the only explanation for the breakdown of values.

What we do know is that confirmed cases in EHPAD are clearly marked as being included in the total number of cases, as opposed to probable cases in EHPAD.
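This objection can be illustrated with a toy model in which every case is counted exactly once in the total but its category can change, for example when a hospital patient is discharged to a nursing home. The case IDs and categories below are invented purely for illustration; they are not French data. Even though the EHPAD count is always a subset of the total, it grows faster than the total, and "total minus EHPAD" falls.

```python
"""Toy model: reclassification can make a subset grow faster than its total."""
from typing import Dict, Tuple

# Hypothetical cases: case id -> current category ("ehpad" or "other").
Snapshot = Dict[int, str]


def counts(snapshot: Snapshot) -> Tuple[int, int]:
    """Return (total cases, cases currently attributed to EHPAD)."""
    total = len(snapshot)
    ehpad = sum(1 for category in snapshot.values() if category == "ehpad")
    return total, ehpad


if __name__ == "__main__":
    day1 = {1: "other", 2: "other", 3: "ehpad"}
    # Day 2: cases 1 and 2 are reattributed to EHPAD (e.g. after discharge),
    # and one genuinely new case (4) appears outside EHPAD.
    day2 = {1: "ehpad", 2: "ehpad", 3: "ehpad", 4: "other"}
    (t1, e1), (t2, e2) = counts(day1), counts(day2)
    print(f"total +{t2 - t1}, ehpad +{e2 - e1}, outside ehpad {t1 - e1} -> {t2 - e2}")
```

It prints `total +1, ehpad +2, outside ehpad 2 -> 1`: a falling residual is compatible with the subset being included, so it does not by itself prove double counting.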

FrancisWasserman commented 4 years ago

Thanks MarioGomWiki. You are right. There may have been some overlap between the "confirmed cases" and the "nursing homes" series since Apr 4. That was the case with the number of deaths in nursing homes, which started on Apr 1: for some days, some nursing-home residents who died in a hospital were counted twice. The problem was identified and exposed as soon as the data were published, and it was then quickly resolved. Nothing of this kind happened with the case data.

The recent evolution of the numbers of confirmed and probable cases in nursing homes is driven by a major effort to include all nursing homes in the statistics (I do not know where we stand today) and to test all concerned residents. That may explain some inconsistencies in the data.