cburkins closed this issue 4 years ago.
✅ Data scraped!
- 0 cities
- 1 states
- 67 counties
- 0 countries
ℹ️ Total counts (tracked cases, may contain duplicates):
- 1703 cases
- 16441 tested
- 0 recovered
- 32 deaths
- 0 active
Can't reproduce this locally?
Thanks for the quick response!
This is the PA, USA data object at https://coronadatascraper.com/#timeseries-byLocation.json, which shows the discrepancy I'm describing. Is that helpful?
Yeah, I saw the same; that looks totally wrong...
I just tested this locally (`yarn start --location "PA, USA"`) and logged the values the scraper was getting. Output:
{ county: 'Adams County', cases: 7, deaths: 0 }
{ county: 'Allegheny County', cases: 133, deaths: 2 }
{ county: 'Armstrong County', cases: 1, deaths: 0 }
... [snip] ...
{ county: 'Wayne County', cases: 6, deaths: 0 }
{ county: 'Westmoreland County', cases: 24, deaths: 0 }
{ county: 'York County', cases: 21, deaths: 0 }
Those numbers match the data currently shown on https://www.health.pa.gov/topics/disease/coronavirus/Pages/Cases.aspx.
The numbers also match the summary table at the top of that page:
Negative | Positive | Deaths |
---|---|---|
16,441 | 1,687 | 16 |
The numbers also match the news headlines found by searching for "pennsylvania covid deaths".
In short, it looks OK to me, based on what's out there.
@cburkins: What were you expecting, what feels off about these numbers to you?
Interesting: they also have an Archives page that lists several dates and matches the 6/7/11 numbers @cburkins pointed out in https://github.com/lazd/coronadatascraper/issues/409#issuecomment-604758785:
https://www.health.pa.gov/topics/disease/coronavirus/Pages/Archives.aspx.
Thanks, all, for looking at this. I think most of the PA county-level data is correct; it's the roll-up into the PA state totals that seems off.
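Double counting is one way a state roll-up can drift from correct county data. The bot output above warns that its totals "may contain duplicates." If a state-level record is summed together with its own county records, the state is counted twice.

The Python sketch below is illustrative only and is not the project's code. The `Location` record, the function names and the state-level row in the usage example are all hypothetical. The sketch does three things:

- It rolls up county records only.
- It keeps one record per county.
- It compares the result with the page's own summary table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Location:
    # Hypothetical record shape, loosely modeled on the scraper output in this thread.
    state: str
    county: Optional[str]  # None marks a state-level record
    cases: int
    deaths: int


def naive_deaths(locations: List[Location]) -> int:
    # Sums every record, so a state-level record and its own county
    # records are both counted.
    return sum(loc.deaths for loc in locations)


def state_rollup(locations: List[Location], state: str) -> Dict[str, int]:
    # Sum county-level records only, keeping the first record per county
    # so a county scraped twice is not counted twice.
    by_county: Dict[str, Location] = {}
    for loc in locations:
        if loc.state == state and loc.county and loc.county not in by_county:
            by_county[loc.county] = loc
    return {
        "cases": sum(loc.cases for loc in by_county.values()),
        "deaths": sum(loc.deaths for loc in by_county.values()),
    }


def mismatches(rollup: Dict[str, int], summary: Dict[str, int]) -> Dict[str, Tuple[Optional[int], int]]:
    # Fields where the county roll-up disagrees with the page's summary table.
    return {k: (rollup.get(k), v) for k, v in summary.items() if rollup.get(k) != v}


if __name__ == "__main__":
    rows = [
        Location("PA", "Adams County", 7, 0),
        Location("PA", "Allegheny County", 133, 2),
        Location("PA", "Armstrong County", 1, 0),
        Location("PA", None, 141, 2),  # hypothetical state-level record
    ]
    print(naive_deaths(rows))        # 4: the state's deaths are counted twice
    print(state_rollup(rows, "PA"))  # {'cases': 141, 'deaths': 2}
```

Checking the roll-up against the page's Positive and Deaths totals on every run would flag this kind of drift early.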
PA data is out of whack again: the page https://www.health.pa.gov/topics/disease/coronavirus/Pages/Cases.aspx now includes a table of age ranges with percentages, and those rows are being reported as data:
Age Range | Percent of Cases |
---|---|
... | ... |
50-64 | 28% |
65+ | 18% |
data.json:
{
"state": "PA",
"country": "USA",
"aggregate": "county",
"url": "https://www.health.pa.gov/topics/disease/coronavirus/Pages/Cases.aspx",
"county": "65+ County",
"cases": 18,
"deaths": 18,
"rating": 0.47058823529411764
},
The page layout changed today; I'll work on this now.
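The "65+ County" record suggests that age-range rows were parsed as county rows. " County" was appended to the first cell, and 18% showed up as both cases and deaths.

One minimal guard, sketched in Python, rejects any row whose name is not alphabetic or whose counts are not plain integers. The sketch rests on two assumptions, because the page's actual markup is not shown here:

- County rows arrive as lists of cell strings in County / Cases / Deaths order.
- The cells hold bare county names.

`parse_county_row` and the regexes are hypothetical names. An allowlist of Pennsylvania's 67 county names would be a stricter alternative.

```python
import re
from typing import Dict, List, Optional, Union

# Assumption: a county row is [name, cases, deaths] with the bare county name
# (the scraper appends " County", as the "65+ County" record suggests).
COUNTY_NAME = re.compile(r"^[A-Za-z][A-Za-z .'-]*$")  # letters, spaces, . ' -
COUNT = re.compile(r"^\d[\d,]*$")                      # e.g. "7" or "1,687"


def parse_count(text: str) -> int:
    # Strip thousands separators before converting.
    return int(text.replace(",", ""))


def parse_county_row(cells: List[str]) -> Optional[Dict[str, Union[str, int]]]:
    # Return None for anything that does not look like a county row,
    # such as an age-range row ["65+", "18%"].
    if len(cells) < 3:
        return None
    name, cases, deaths = (cell.strip() for cell in cells[:3])
    if not (COUNTY_NAME.match(name) and COUNT.match(cases) and COUNT.match(deaths)):
        return None
    return {"county": f"{name} County", "cases": parse_count(cases), "deaths": parse_count(deaths)}


if __name__ == "__main__":
    print(parse_county_row(["Adams", "7", "0"]))  # {'county': 'Adams County', 'cases': 7, 'deaths': 0}
    print(parse_county_row(["65+", "18%"]))       # None
```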
As a PA resident, many thanks to you!
(Sent by email in reply to JZ's comment of Mar 27, 2020, 7:25 PM.)
@jzohrab @cburkins, was this fixed by #452? Check the data we reported last night and close this issue if it looks right.
I'm not sure if this is the same issue, but the "like JHU" data also appears to have figures that are too low for the past week or so.
,,PA,USA,41.12951166463159,-77.60961308037935,12801989,https://www.health.pa.gov/topics/disease/coronavirus/Pages/Cases.aspx,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,16,,41,47,63,71,96,133,185,268,2,2,6,7,11,16,22
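The drop near the end of this row (268 followed by 2) is easy to catch automatically, because cumulative case counts should never decrease. The Python sketch below makes two assumptions:

- The first eight fields are metadata, as in the row above: two blank fields, state, country, latitude, longitude, population and URL.
- Every later field is one day's cumulative count.

`parse_series` and `find_drops` are hypothetical helpers. Blank values, like the one between 16 and 41, are skipped rather than read as zero.

```python
import csv
import io
from typing import List, Optional, Tuple

# Assumption: the first 8 fields are metadata
# (blank, blank, state, country, latitude, longitude, population, URL).
META_FIELDS = 8


def parse_series(line: str) -> List[Optional[int]]:
    # Parse one CSV row; blank cells become None instead of 0.
    row = next(csv.reader(io.StringIO(line)))
    return [int(v) if v else None for v in row[META_FIELDS:]]


def find_drops(series: List[Optional[int]]) -> List[Tuple[int, int, int]]:
    # A cumulative count should never decrease; report (index, previous, current)
    # wherever it does, skipping blank days.
    drops = []
    prev: Optional[int] = None
    for i, value in enumerate(series):
        if value is None:
            continue
        if prev is not None and value < prev:
            drops.append((i, prev, value))
        prev = value
    return drops


if __name__ == "__main__":
    # Shortened copy of the PA row above (leading zeros trimmed).
    row = (",,PA,USA,41.12951166463159,-77.60961308037935,12801989,"
           "https://www.health.pa.gov/topics/disease/coronavirus/Pages/Cases.aspx,"
           "0,0,12,16,,41,47,63,71,96,133,185,268,2,2,6,7,11,16,22")
    print(find_drops(parse_series(row)))  # [(13, 268, 2)]
```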
Hmm, I didn't pull the repo and deploy it myself; I'm using the data available at https://coronadatascraper.com/#timeseries-byLocation.json.
That data still shows incorrect values. Perhaps it will be correct tomorrow, once today's data is pulled?
I fixed this last night in https://github.com/lazd/coronadatascraper/commit/392290719e7cb414f890b6a637fb29fa8327ba67; it looks good now.
Agreed, the PA data looks good now!
The PA (Pennsylvania) data appears to be incorrect now: it shows a very low number of cases (about 12). After clicking through to the scraper's data source, my guess is that the scraper (or the website) swapped the columns for the number of cases and the number of deaths.
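If the source table's columns are reordered, a scraper that reads cells by position will silently swap the two counts. One more defensive approach, sketched in Python, does two things:

- It locates each column by its header text rather than its position.
- It flags any row where deaths exceed cases, which cannot happen in correct data.

The sketch assumes a header row labelled County / Cases / Deaths. Those labels are hypothetical, and the real page's headers may differ. `rows_by_header` is a hypothetical helper, not the project's API.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[str, int]]


def rows_by_header(header: List[str], rows: List[List[str]]) -> List[Record]:
    # Find each column by its header text rather than its position, so a
    # reordered table cannot silently swap cases and deaths.
    # Assumption: headers read "County", "Cases" and "Deaths" (case-insensitive).
    names = [h.strip().lower() for h in header]
    county_i, cases_i, deaths_i = (names.index(n) for n in ("county", "cases", "deaths"))
    records: List[Record] = []
    for cells in rows:
        cases = int(cells[cases_i].replace(",", ""))
        deaths = int(cells[deaths_i].replace(",", ""))
        # Heuristic sanity check: deaths can never exceed cases.
        if deaths > cases:
            raise ValueError(f"{cells[county_i]}: deaths > cases, columns may be swapped")
        records.append({"county": cells[county_i].strip(), "cases": cases, "deaths": deaths})
    return records


if __name__ == "__main__":
    print(rows_by_header(["County", "Deaths", "Cases"], [["Allegheny", "2", "133"]]))
    # [{'county': 'Allegheny', 'cases': 133, 'deaths': 2}]
```

The deaths-versus-cases check is only a heuristic. A swap goes unnoticed in any county where the two numbers are equal.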