CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

The "confirmed" column in the daily data files is actually giving confirmed + probable for Puerto Rico #2900

Closed sacundim closed 4 years ago

sacundim commented 4 years ago

The "confirmed" column in the daily data files is actually giving confirmed + probable for Puerto Rico, as can be determined by comparing it to the daily reports from Puerto Rico's Department of Health. Here is, for example, your July 19th file:

It says 12,063 for Puerto Rico under "confirmed":

[Image: Screen Shot 2020-07-20 at 12 25 30 PM]

But the Puerto Rico Department of Health daily PDF reports are available here:

And the July 19th report (which says "data until July 18" on the front page, a common source of confusion) says the number of confirmed cases was 3,791:

[Image: Screen Shot 2020-07-20 at 12 34 48 PM]

As I've highlighted, 12,063 is the sum of confirmed + probable cases (which in Puerto Rico really means molecular test vs. antibody test).
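The arithmetic behind that comparison can be sketched as follows. This is only an illustration using the two figures quoted above; the variable names are made up for the example, and the probable count is derived by subtraction rather than copied from the report.

```python
# Figures quoted in this issue for Puerto Rico (July 19th files).
jhu_confirmed = 12_063    # value under "confirmed" in the JHU daily file
prdoh_confirmed = 3_791   # confirmed cases in the Department of Health report

# If the JHU value is really confirmed + probable, the difference is the
# probable count. This number is derived here, not read from the report.
implied_probable = jhu_confirmed - prdoh_confirmed

# How much larger the JHU figure is than the confirmed-only figure.
ratio = jhu_confirmed / prdoh_confirmed

print(f"implied probable cases: {implied_probable:,}")
print(f"JHU 'confirmed' is {ratio:.2f}x the confirmed-only count")
```

A difference of this size is far too large to be explained by the one-day reporting lag alone, which is why the labeling matters.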

I've also looked at the time series files for the municipalities of Puerto Rico, and they systematically make the same error.

CSSEGISandData commented 4 years ago

@sacundim This is described in the field descriptions in the README in the data directory:

Confirmed: Confirmed cases include presumptive positive cases and probable cases, in accordance with CDC guidelines as of April 14.

sacundim commented 4 years ago

@CSSEGISandData I don't believe you're reading that April 14 guideline correctly. If it's the one midway through that link, it reads:

Confirmed & Probable Counts

As of April 14, 2020, CDC case counts and death counts include both confirmed and probable cases and deaths. This change was made to reflect an interim COVID-19 position statement issued by the Council for State and Territorial Epidemiologists on April 5, 2020. The position statement included a case definition and made COVID-19 a nationally notifiable disease. Nationally notifiable disease cases are voluntarily reported to CDC by jurisdictions.

A confirmed case or death is defined by meeting confirmatory laboratory evidence for COVID-19.

A probable case or death is defined by one of the following:

  • Meeting clinical criteria AND epidemiologic evidence with no confirmatory laboratory testing performed for COVID-19
  • Meeting presumptive laboratory evidence AND either clinical criteria OR epidemiologic evidence
  • Meeting vital records criteria with no confirmatory laboratory testing performed for COVID19

Not all jurisdictions report probable cases and deaths to CDC. When not available to CDC, it is noted as N/A.

Nowhere in there does it say that the sum of the confirmed and probable case counts should be labeled "Confirmed" as you do. And if we look at the CDC's own "Cases in the U.S." page, which your README links to when it refers to that same policy in regard to deaths, the way the CDC actually reports Puerto Rico's numbers is not what you claim their policy to be. They do "include both confirmed and probable cases," but by that they mean that they list both as separate figures, labeled "Confirmed Cases" and "Probable Cases." In addition to those they include the sum, but that one they label "Total Cases":

[Image: Screen Shot 2020-07-20 at 11 43 17 PM]

So I don't believe the policy that you refer to in fact supports your practice.
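To make the labeling distinction concrete, here is a minimal sketch of a record that keeps the three figures under separate names, the way the CDC page described above does. The `CaseCounts` class and its field names are hypothetical and are not part of the JHU data files or any CDC format. The probable value is derived from the two figures quoted earlier in this issue (12,063 minus 3,791), and returning no total when the probable count is missing is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaseCounts:
    """Hypothetical record keeping confirmed, probable and total apart."""

    confirmed: int                  # cases with confirmatory laboratory evidence
    probable: Optional[int] = None  # None stands for "N/A" (not reported)

    @property
    def total(self) -> Optional[int]:
        # Assumption for this sketch: no probable count means no total.
        if self.probable is None:
            return None
        return self.confirmed + self.probable


# Puerto Rico, using the figures discussed in this issue.
# probable=8_272 is derived (12,063 - 3,791), not copied from a report.
puerto_rico = CaseCounts(confirmed=3_791, probable=8_272)

print("Confirmed Cases:", puerto_rico.confirmed)
print("Probable Cases: ", puerto_rico.probable)
print("Total Cases:    ", puerto_rico.total)
```

With this layout, the 12,063 figure appears only under a "total" label, and a column named "confirmed" holds 3,791.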