covidatlas / li

Next-generation serverless crawler for COVID-19 data
Apache License 2.0

Death and recovered data issue #422

Closed jzohrab closed 4 years ago

jzohrab commented 4 years ago

Original issue https://github.com/covidatlas/coronadatascraper/issues/823, transferred here on Wednesday Apr 15, 2020 at 17:47 GMT


US - New York: death and recovered cases have null values

E.g., the number of deaths in New York is high, but the value reported here is zero.

Issue details

Data correction for death and recovered cases

jzohrab commented 4 years ago

(Transferred comment)

What file?

jzohrab commented 4 years ago

(Transferred comment)

timeseries.csv has no death data for New York state. The last relevant line is:

"New York, United States",state,,,New York,United States,19453561,42.762,-75.809,https://health.data.ny.gov/api/views/xdss-u53e/rows.csv?accessType=DOWNLOAD,county,America/New_York,213779,,,,526012,,,,2020-04-15

jzohrab commented 4 years ago

(Transferred comment)

In timeseries.csv, mainly for New York (including all counties), the total numbers of deaths and recovered cases are reported as 0.
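
For anyone who wants to check which fields are blank, here is a minimal Python sketch that parses the New York state row quoted earlier in this thread. The header line of timeseries.csv is not shown in this thread, so the column names in `ASSUMED_COLUMNS` are an assumption inferred from the field order and count, and `empty_fields` is a hypothetical helper, not part of the project.

```python
import csv
import io

# Row copied verbatim from the timeseries.csv excerpt quoted in this thread.
ROW = (
    '"New York, United States",state,,,New York,United States,19453561,'
    '42.762,-75.809,'
    'https://health.data.ny.gov/api/views/xdss-u53e/rows.csv?accessType=DOWNLOAD,'
    'county,America/New_York,213779,,,,526012,,,,2020-04-15'
)

# ASSUMPTION: the header is not quoted in this thread. These names are a
# guess based on the order and number of fields in the row above.
ASSUMED_COLUMNS = [
    "name", "level", "city", "county", "state", "country", "population",
    "lat", "long", "url", "aggregate", "tz", "cases", "deaths", "recovered",
    "active", "tested", "hospitalized", "discharged", "growthFactor", "date",
]


def empty_fields(row_text, columns):
    """Return the names of the columns whose value is empty in one CSV row."""
    values = next(csv.reader(io.StringIO(row_text)))
    if len(values) != len(columns):
        raise ValueError(
            f"expected {len(columns)} fields, found {len(values)}"
        )
    return [name for name, value in zip(columns, values) if value == ""]
```

Calling `empty_fields(ROW, ASSUMED_COLUMNS)` lists the blank columns. If the assumed names are right, deaths and recovered are empty rather than 0 in this row, while cases and tested are populated.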

jzohrab commented 4 years ago

(Transferred comment)

Deaths by borough aren't included in the data source currently used to retrieve cases and tested counts. It sounds like they're handled by a different department, hence published separately. :woman_shrugging:

They are, however, included in daily PDFs available here: https://www1.nyc.gov/site/doh/covid/covid-19-data.page#download

https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary-deaths-04182020-1.pdf

The result from pdftotext looks pretty reasonable!

Confirmed

Probable

- Bronx

1917 (22.7%)

457 (19.1%)

- Brooklyn

2490 (29.5%)

801 (33.4%)

- Manhattan

1088 (12.9%)

343 (14.3%)

- Queens

2543 (30.1%)

712 (29.7%)

406 (4.8%)

75 (3.1%)

4 (0%)

10 (0.4%)

The Staten Island and Unknown labels don't make the cut, but their figures are the last 4 values.
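
A rough sketch of how that text could be turned into per-borough counts, in Python. It assumes the output always has the shape shown above: labeled rows first, each followed by a confirmed and then a probable value, with the unlabeled trailing values belonging to Staten Island and Unknown in that order. `parse_deaths` and `MISSING_LABELS` are hypothetical names for illustration, and the sample is the quoted output with blank lines dropped.

```python
import re

# Sample copied from the pdftotext output quoted above (blank lines dropped).
SAMPLE = """Confirmed
Probable
- Bronx
1917 (22.7%)
457 (19.1%)
- Brooklyn
2490 (29.5%)
801 (33.4%)
- Manhattan
1088 (12.9%)
343 (14.3%)
- Queens
2543 (30.1%)
712 (29.7%)
406 (4.8%)
75 (3.1%)
4 (0%)
10 (0.4%)
"""

LABEL = re.compile(r"^-\s*(.+)$")   # e.g. "- Bronx"
VALUE = re.compile(r"^(\d+)\s*\(")  # e.g. "1917 (22.7%)"

# ASSUMPTION: rows whose labels are missing from the text appear last,
# in this order.
MISSING_LABELS = ["Staten Island", "Unknown"]


def parse_deaths(text):
    """Map each row label to its confirmed and probable death counts."""
    labels, counts = [], []
    for line in text.splitlines():
        line = line.strip()
        if m := LABEL.match(line):
            labels.append(m.group(1))
        elif m := VALUE.match(line):
            counts.append(int(m.group(1)))
    # Add the labels that were lost in extraction, if not already present.
    labels += [name for name in MISSING_LABELS if name not in labels]
    if len(counts) != 2 * len(labels):
        raise ValueError(
            f"{len(counts)} values do not pair up with {len(labels)} rows"
        )
    return {
        label: {"confirmed": counts[2 * i], "probable": counts[2 * i + 1]}
        for i, label in enumerate(labels)
    }
```

`parse_deaths(SAMPLE)` returns one entry per borough plus Unknown. The length check is there so a layout change in the PDF raises an error instead of silently assigning values to the wrong row.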

jzohrab commented 4 years ago

(Transferred comment)

I stumbled upon https://covid19tracker.health.ny.gov/views/NYS-COVID19-Tracker/NYSDOHCOVID-19Tracker-Fatalities?%3Aembed=yes&%3Atoolbar=no&%3Atabs=n, showing fatalities per NY county.

NY fatality data is perhaps the most interesting data out there right now, as it offers a lower bound of what everyone else can expect. I would love to have it sourced via coronadatascraper. [Edit:] Alas, it's a Tableau viz, with no obvious scraping / download capability.