covidatlas / coronadatascraper

COVID-19 Coronavirus data scraped from government and curated data sources.
https://coronadatascraper.com
BSD 2-Clause "Simplified" License

Deaths in Pennsylvania seem to have been updated with tested (?) counts. #1014

Closed roboton closed 4 years ago

roboton commented 4 years ago

image

This seems to have happened for several (maybe all?) counties in Pennsylvania. Also pasted below in case image doesn't work:

2020-05-15: cases: 50, deaths: 1, tested: 3036, growthFactor: 1
2020-05-16: cases: 50, deaths: 1, tested: 3040, growthFactor: 1
2020-05-17: cases: 50, deaths: 1, tested: 3060, growthFactor: 1
2020-05-18: cases: 50, deaths: 3015, tested: 3065, growthFactor: 1
state: "Pennsylvania"
country: "United States"
aggregate: "county"
sources: Array[1]
url: "https://www.health.pa.gov/topics/disease/coronavirus/Pages/Cases.aspx"
county: "Montour County"
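For illustration, here is a minimal sketch of a check that would flag this kind of record automatically. It assumes the timeseries is an object keyed by date, and it relies on the fact that a cumulative death count cannot exceed the cumulative case count. The function name `findImpossibleDays` is made up for this example and is not something in the repo.

```javascript
// Hypothetical sanity check; not part of the actual scraper code.
// Returns the dates on which reported deaths exceed reported cases,
// which cannot happen for cumulative counts.
function findImpossibleDays(timeseries) {
  return Object.entries(timeseries)
    .filter(([, day]) => day.deaths > day.cases)
    .map(([date]) => date);
}

// Montour County figures copied from the listing above.
const montour = {
  '2020-05-15': { cases: 50, deaths: 1, tested: 3036 },
  '2020-05-16': { cases: 50, deaths: 1, tested: 3040 },
  '2020-05-17': { cases: 50, deaths: 1, tested: 3060 },
  '2020-05-18': { cases: 50, deaths: 3015, tested: 3065 },
};

const impossibleDays = findImpossibleDays(montour);
console.log(impossibleDays); // [ '2020-05-18' ]
```

Only 2020-05-18 is flagged, which matches the day the deaths value starts to look like a tested count.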

roboton commented 4 years ago

Another example:

image

roboton commented 4 years ago

The deaths data is on a different page (https://www.health.pa.gov/topics/disease/coronavirus/Pages/Death-Data.aspx) than the cases data (https://www.health.pa.gov/topics/disease/coronavirus/Pages/Cases.aspx).

delimmy commented 4 years ago

+1 seeing this issue too

jzohrab commented 4 years ago

Yep, the PA scraper code isn't nearly careful enough:

https://github.com/covidatlas/coronadatascraper/blob/master/src/shared/scrapers/US/PA/index.js#L279

It assumes that the page structure never changes, which is fragile: if a column is added or moved, the scraper silently reads the wrong values. Thanks for the issue, @roboton and @edwlook.
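One way to guard against this, sketched under assumptions: look columns up by their heading text instead of by fixed position, and fail loudly when an expected heading is missing. The helper name `columnIndexes` and the example headings below are invented for illustration. They are not taken from the PA page or from the scraper linked above.

```javascript
// Hypothetical helper; not the code in src/shared/scrapers/US/PA/index.js.
// Maps each expected column label to its position in the table headings,
// and throws if a label is missing instead of guessing a position.
function columnIndexes(headings, expected) {
  const normalized = headings.map(h => h.trim().toLowerCase());
  const indexes = {};
  for (const [key, label] of Object.entries(expected)) {
    const i = normalized.indexOf(label.toLowerCase());
    if (i === -1) {
      throw new Error(
        `Expected column "${label}" not found in: ${headings.join(', ')}`
      );
    }
    indexes[key] = i;
  }
  return indexes;
}

// Example headings are assumptions made up for this sketch.
const headings = ['County', 'Total Cases', 'Negatives'];

const idx = columnIndexes(headings, { county: 'County', cases: 'Total Cases' });
console.log(idx); // { county: 0, cases: 1 }

// A missing column becomes a scrape failure rather than bad data.
let failure = null;
try {
  columnIndexes(headings, { deaths: 'Deaths' });
} catch (err) {
  failure = err.message;
}
console.log(failure);
```

A failed scrape is easy to notice and fix, while a tested count published as a death count can go unnoticed until someone reports it.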

jzohrab commented 4 years ago

It appears that this just started happening today:

date          deaths
'2020-05-17'  4403
'2020-05-18'  4418
'2020-05-19'  277553

So that's nice, less lag in getting the fix in. Should be quick, I'll add that now.
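A jump like the one in the table above can also be caught with a simple day-over-day check. This is only a sketch: the function name `findSuspiciousJumps` and the threshold of 2x are arbitrary assumptions, and the dates assume that '-18' and '-19' abbreviate 2020-05-18 and 2020-05-19.

```javascript
// Hypothetical check; the 2x threshold is an arbitrary assumption.
// Flags dates where the cumulative count grows by more than `maxRatio`
// relative to the previous day.
function findSuspiciousJumps(series, maxRatio = 2) {
  const flagged = [];
  for (let i = 1; i < series.length; i++) {
    const prev = series[i - 1].deaths;
    const curr = series[i].deaths;
    if (prev > 0 && curr > prev * maxRatio) {
      flagged.push(series[i].date);
    }
  }
  return flagged;
}

// Statewide death counts from the table above.
const paDeaths = [
  { date: '2020-05-17', deaths: 4403 },
  { date: '2020-05-18', deaths: 4418 },
  { date: '2020-05-19', deaths: 277553 },
];

const jumps = findSuspiciousJumps(paDeaths);
console.log(jumps); // [ '2020-05-19' ]
```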

jzohrab commented 4 years ago

Thanks again, @roboton. I've merged a fix for this. Please LMK if this resolves the issue. Cheers! jz

jzohrab commented 4 years ago

(whoops closed by accident)

roboton commented 4 years ago

Hm, should it be updated on the main site yet? I still see really large death counts for Pennsylvania at this URL: https://coronadatascraper.com/#timeseries-byLocation.json


Just wanted to say how awesome CDS is and grateful for what you all are doing.

On Tue, May 19, 2020 at 4:09 PM JZ notifications@github.com wrote:

(whoops closed by accident)


jzohrab commented 4 years ago

Right, I see a correction for the last run for the last date, but not the prior date. Here's from that URL, for Northampton again:

image

I just have to adjust the fix to be back one more day, and then we should be good. I hope! Fixing now, it should be resolved tomorrow.

jzohrab commented 4 years ago

(undo github auto-close) Cheers and thanks for following up!

roboton commented 4 years ago

Looks good! Thanks for fixing this, @jzohrab.