Open kengo-sony opened 4 years ago
Well that's total garbage. Thanks @kengo-sony for the issue!
A request: if you're using timeseries-byLocation.json, in the future when reporting data issues, please also include the dateSources section of the file for the location in question ... it helps me determine where to look. :-) In this case, that section says "2020-04-15..2020-07-29": "us-covidtracking", so the us-covidtracking source is the one causing the trouble.
Cheers, looking into it! jz
Correction, it was actually "2020-03-21..2020-07-31": "us-ca-mercury-news".
Should be fixed in #366. I'll launch it to prod and the data should be regenerated in at most a few days.
Thanks again @kengo-sony ! jz
Thanks for your quick fix @jzohrab ! 6/26 data looks good now.
Yes, I will make sure to include the dateSources section when reporting a data issue.
Thanks again!
Sorry again, @jzohrab.
I see another issue in Alameda County. It could be a side effect of the fix. "cases" suddenly becomes small starting 6/24; it looks like a daily case count instead of a cumulative number, and it returns to a cumulative number on 8/1.
"2020-06-22": {
"cases": 5007,
"deaths": 120,
"growthFactor": 1.04
},
"2020-06-23": {
"cases": 5140,
"deaths": 120,
"growthFactor": 1.03
},
"2020-06-24": {
"cases": 5,
"deaths": 122,
"growthFactor": 0
},
"2020-06-25": {
"cases": 5,
"deaths": 128,
"growthFactor": 1
},
...
"2020-07-31": {
"cases": 11,
"deaths": 182,
"growthFactor": 1
},
"2020-08-01": {
"cases": 11131,
"deaths": 182,
"growthFactor": 1011.91
}
},
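A drop like the one on 6/24 can be caught mechanically, since a cumulative count should never decrease. Here is a minimal Python sketch, assuming the per-location timeseries has the date-keyed shape shown above; `find_cumulative_drops` is a hypothetical helper, not part of the project.

```python
from typing import Dict, List, Tuple


def find_cumulative_drops(
    series: Dict[str, dict], field: str = "cases"
) -> List[Tuple[str, int, int]]:
    """Return (date, previous_value, value) wherever a cumulative field decreases."""
    drops = []
    previous = None
    for date in sorted(series):  # ISO dates sort chronologically as strings
        value = series[date].get(field)
        if value is None:
            continue  # skip days with no value for this field
        if previous is not None and value < previous:
            drops.append((date, previous, value))
        previous = value
    return drops


# Values copied from the Alameda County excerpt above.
alameda = {
    "2020-06-22": {"cases": 5007, "deaths": 120},
    "2020-06-23": {"cases": 5140, "deaths": 120},
    "2020-06-24": {"cases": 5, "deaths": 122},
    "2020-06-25": {"cases": 5, "deaths": 128},
}

print(find_cumulative_drops(alameda))
```

This only flags the day the number falls. Sources sometimes revise counts downward legitimately, so a hit is a prompt for review rather than proof of a bug.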
Here is the data source.
"dateSources": {
"2020-01-24..2020-02-29": "jhu-usa",
"2020-03-01..2020-03-20": {
"jhu-usa": [
"deaths"
],
"nyt": [
"cases"
]
},
"2020-03-21..2020-07-31": "us-ca-mercury-news",
"2020-08-01": "jhu-usa"
},
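To find which source supplied a suspicious value, the dateSources section can be searched by date. The sketch below is in Python and assumes the key format is either a single date or "start..end", and that a value is either a source name or a mapping from source name to the fields it supplied, as in the excerpt above. `source_for_date` is a hypothetical helper, not something the project provides.

```python
from typing import Optional


def source_for_date(date_sources: dict, date: str, field: str = "cases") -> Optional[str]:
    """Return the source name that supplied `field` on `date`, or None if not covered."""
    for key, source in date_sources.items():
        start, _, end = key.partition("..")
        end = end or start  # a single-date key covers only that day
        if start <= date <= end:  # ISO dates compare correctly as strings
            if isinstance(source, str):
                return source
            for name, fields in source.items():
                if field in fields:
                    return name
            return None
    return None


# Copied from the dateSources excerpt above.
date_sources = {
    "2020-01-24..2020-02-29": "jhu-usa",
    "2020-03-01..2020-03-20": {"jhu-usa": ["deaths"], "nyt": ["cases"]},
    "2020-03-21..2020-07-31": "us-ca-mercury-news",
    "2020-08-01": "jhu-usa",
}

print(source_for_date(date_sources, "2020-06-24"))
```

For the Alameda County data, every date from 6/24 through 7/31 resolves to us-ca-mercury-news, and 8/1 resolves to jhu-usa, which lines up with where the numbers change.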
I’m seeing something similar with case counts, except it doesn’t go back to being a cumulative number: #370. I’m also seeing more deaths than cases for the following California counties:
and more recoveries than cases for the following counties:
and no cases for the following counties that have had cases:
The "tested" field looks good in the "California County Coronavirus Reporting" Google Spreadsheet maintained by Harriet Rowan, but the data I'm fetching from https://coronadatascraper.com/timeseries.csv.zip is still broken for Contra Costa County. Do you think this is due to caching or remaining issues with parsing?
Here’s what timeseries-byLocation.json says for Contra Costa County in August:
"2020-08-01": {
"cases": 7806,
"deaths": 121,
"hospitalized_current": 106,
"tested": 135408,
"growthFactor": 1.02
},
"2020-08-02": {
"cases": 7966,
"deaths": 125,
"hospitalized_current": 107,
"tested": 136325,
"growthFactor": 1.02
},
"2020-08-03": {
"cases": 8033,
"deaths": 127,
"hospitalized_current": 100,
"tested": 136801,
"growthFactor": 1.01
},
"2020-08-04": {
"cases": 8176,
"deaths": 131,
"hospitalized_current": 101,
"tested": 137460,
"growthFactor": 1.02
},
"2020-08-05": {}
137,460 matches what the spreadsheet shows for August 4 in Contra Costa County. The empty object for August 5 might be because the spreadsheet already shows data for some counties on August 5. The scraper only avoids returning a result if no county has reported data on a certain date:
Apologies for what may have been a false alarm. I agree that cases for Contra Costa County now look good.
The COVID Atlas site still shows 15,500 deaths in Santa Clara County and similarly catastrophic spikes across the Bay Area on June 26, as originally reported above:
One solution is to stand up alternative scrapers that will be preferred over the Mercury News source, such as #375 for Santa Clara County, #378 for Alameda County, and #379 for Marin County.
As a followup to https://github.com/covidatlas/li/issues/363#issuecomment-669635453, San Mateo County and possibly others are showing an explicit 0 cases on recent days for which there’s no data, instead of undefined:
"2020-08-05": {
"cases": 5758,
"deaths": 120,
"hospitalized_current": 60,
"tested": 107268,
"icu_current": 15,
"growthFactor": 1
},
"2020-08-06": {
"cases": 0,
"deaths": 0
},
"2020-08-07": {
"cases": 0,
"deaths": 0
},
"2020-08-08": {
"cases": 0,
"deaths": 0
},
"2020-08-09": {
"cases": 0,
"deaths": 0
}
Yep I don't know why some are coded that way, it's incorrect. Thanks for catching it.
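Until that is fixed, consumers of the file could treat these entries as missing data. A rough Python sketch, assuming the date-keyed shape shown above and assuming that a cumulative count of exactly 0 after a non-zero day is a placeholder rather than a real value; `zero_placeholders` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def zero_placeholders(
    series: Dict[str, dict], fields: Sequence[str] = ("cases", "deaths")
) -> List[str]:
    """Return dates whose cumulative fields are all 0 after earlier non-zero data."""
    flagged = []
    seen_nonzero = False
    for date in sorted(series):  # ISO dates sort chronologically as strings
        values = [series[date].get(f) for f in fields]
        if any(values):  # at least one field is non-zero
            seen_nonzero = True
        elif seen_nonzero and all(v == 0 for v in values):
            flagged.append(date)  # explicit zeros, not merely absent fields
    return flagged


# Values copied from the San Mateo County excerpt above.
san_mateo = {
    "2020-08-05": {"cases": 5758, "deaths": 120},
    "2020-08-06": {"cases": 0, "deaths": 0},
    "2020-08-07": {"cases": 0, "deaths": 0},
    "2020-08-08": {"cases": 0, "deaths": 0},
    "2020-08-09": {"cases": 0, "deaths": 0},
}

print(zero_placeholders(san_mateo))
```

Empty objects, like the one Contra Costa County shows for August 5, are not flagged because their fields are absent rather than 0.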
San Francisco County
"2020-06-25": {
"cases": 3297,
"deaths": 48,
"tested": 129617,
"hospitalized_current": 47,
"icu_current": 17,
"growthFactor": 1.01
},
"2020-06-26": {
"cases": 3400,
"deaths": 4800,
"tested": 132575,
"hospitalized_current": 45,
"icu_current": 18,
"growthFactor": 1.03
},
"2020-06-27": {
"cases": 3468,
"deaths": 49,
"tested": 135170,
"hospitalized_current": 54,
"icu_current": 19,
"growthFactor": 1.02
},
Alameda County
{
"cases": 5382,
"deaths": 128,
"tested": 0,
"hospitalized": 0,
"recovered": 0,
"icu": 0,
"growthFactor": 1.02,
"date": "2020-06-25"
},
{
"cases": 5493,
"deaths": 13000,
"tested": 0,
"hospitalized": 0,
"recovered": 0,
"icu": 0,
"growthFactor": 1.02,
"date": "2020-06-26"
},
{
"cases": 5493,
"deaths": 130,
"tested": 0,
"hospitalized": 0,
"recovered": 0,
"icu": 0,
"growthFactor": 1,
"date": "2020-06-27"
},
Other counties also show an implausibly large death count on 6/26 only.
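A one-day spike like this stands out because the value is far larger than on both neighboring days. The Python sketch below flags such days, assuming a date-keyed series; the factor of 10 is an arbitrary threshold chosen for illustration, and `one_day_spikes` is a hypothetical helper.

```python
from typing import Dict, List


def one_day_spikes(series: Dict[str, dict], field: str = "deaths", factor: int = 10) -> List[str]:
    """Return dates where `field` exceeds both neighboring days by more than `factor`."""
    dates = sorted(series)  # ISO dates sort chronologically as strings
    spikes = []
    for prev, cur, nxt in zip(dates, dates[1:], dates[2:]):
        before = series[prev].get(field)
        value = series[cur].get(field)
        after = series[nxt].get(field)
        if None in (before, value, after):
            continue  # need all three days to compare
        # max(..., 1) keeps a neighbor of 0 from making every value look like a spike
        if value > factor * max(before, 1) and value > factor * max(after, 1):
            spikes.append(cur)
    return spikes


# Values copied from the San Francisco County data reported in this thread.
san_francisco = {
    "2020-06-25": {"cases": 3297, "deaths": 48},
    "2020-06-26": {"cases": 3400, "deaths": 4800},
    "2020-06-27": {"cases": 3468, "deaths": 49},
}

print(one_day_spikes(san_francisco))
```

A simpler companion check is whether deaths exceed cases on any day, which is also true for the 6/26 San Francisco entry.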