Open ryantibs opened 3 years ago
how do we know which zero values are real and which ones are due to some data bug?
@sgratzl Finally getting around to this now. When we synced about v3 on Slack you mentioned this as an outstanding issue, but it shouldn't hold us up in releasing v3 at all. It is really a completely separate issue from the dashboard redesign. So we should leave this open and revisit it with @krivard when she gets back from her week off (this week), but move on for now.
Recording thoughts below for later.
Checking the dashboard just now, we seem to be missing data for most counties in Utah (first screenshot). However, Utah's own state dashboard (second screenshot) shows that there is still case activity being reported throughout the state. I looked, but I didn't see Utah mentioned in the most recent weekly email from JHU (they send one to the forecasting community each week). So again, I don't know whether this is a JHU issue, an issue on our side, or something else.
JHU's README was updated on October 26 to say that Nebraska is only reporting state cases, not county cases. However, the emails from Jeremy suggest that they've been able to pull county cases from one of the CDC trackers instead. I'll look into this more and get back to you.
Re: Utah:
The JHU file has 37 entries for Utah. Of those, 11 are counties that have never reported anything other than 0:
FIPS | Lat | Long | Name |
---|---|---|---|
49003 | 41.52106798 | -113.08328159999999 | "Box Elder, Utah, US" |
49009 | 40.88798265 | -109.51210929999999 | "Daggett, Utah, US" |
49023 | 39.70208397 | -112.78092450000001 | "Juab, Utah, US" |
49027 | 39.0729209 | -113.1020328 | "Millard, Utah, US" |
49029 | 41.08830262 | -111.5727723 | "Morgan, Utah, US" |
49031 | 38.33815254 | -112.1249591 | "Piute, Utah, US" |
49033 | 41.63137678 | -111.2445105 | "Rich, Utah, US" |
49039 | 39.37231946 | -111.5758676 | "Sanpete, Utah, US" |
49041 | 38.74837146 | -111.8050275 | "Sevier, Utah, US" |
49055 | 38.32335822 | -110.9096801 | "Wayne, Utah, US" |
49057 | 41.27116049 | -111.9145117 | "Weber, Utah, US" |
Of the remaining 26 entries, 6 are about named regions that have no FIPS code:
Lat | Long | Name |
---|---|---|
41.52106798 | -113.08328159999999 | "Bear River, Utah, US" |
39.37231946 | -111.5758676 | "Central Utah, Utah, US" |
38.99617072 | -110.70139579999999 | "Southeast Utah, Utah, US" |
37.85447192 | -111.4418764 | "Southwest Utah, Utah, US" |
40.12491499 | -109.5174415 | "TriCounty, Utah, US" |
41.27116049 | -111.9145117 | "Weber-Morgan, Utah, US" |
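For anyone who wants to redo this split on a fresh copy of the file, here is a minimal sketch of the grouping used above: entries that have only ever reported 0, entries with no FIPS code, and everything else. The column names (`FIPS`, `Lat`, `Long`, `Name`) and the sample rows are assumptions made up for illustration; the real JHU file has more metadata columns, and `classify_entries` is a hypothetical helper, not part of any existing package.

```python
import csv
import io

# Assumed metadata column names; adjust to match the actual file layout.
META_COLS = {"FIPS", "Lat", "Long", "Name"}

def classify_entries(rows, meta_cols=META_COLS):
    """Split rows into all-zero entries, entries without a FIPS code, and the rest."""
    all_zero, no_fips, other = [], [], []
    for row in rows:
        # Every column that is not metadata is treated as a per-date count.
        counts = [float(v) for k, v in row.items() if k not in meta_cols and v != ""]
        if all(c == 0 for c in counts):
            all_zero.append(row["Name"])
        elif not (row.get("FIPS") or "").strip():
            no_fips.append(row["Name"])
        else:
            other.append(row["Name"])
    return all_zero, no_fips, other

# Tiny made-up sample (illustrative values only, not real JHU data).
SAMPLE = """FIPS,Lat,Long,Name,d1,d2
49003,41.5,-113.1,"Box Elder, Utah, US",0,0
,41.5,-113.1,"Bear River, Utah, US",5,9
49035,40.7,-111.9,"Salt Lake, Utah, US",10,25
"""

all_zero, no_fips, other = classify_entries(csv.DictReader(io.StringIO(SAMPLE)))
print("never reported:", all_zero)
print("no FIPS:", no_fips)
print("remaining:", other)
```

The all-zero check runs first so that the grouping mirrors the order of the breakdown above (never-reported counties first, then non-FIPS regions among the remainder).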
If there's something we should do about this, best open a new issue for it so it doesn't get lost.
Details on Nebraska:
County case data drops out in covidcast starting 2021-06-01 and ending 2021-09-24.
County case data never drops out in the JHU file.
The most recent update to county-level case incidence for reference date 2021-06-01 was November 16 at 10pm. It contains all zeroes for Nebraska counties.
The next step is to run the jhu indicator on staging to see what's going on. However, that package hasn't been updated since Nov 1. It should have been updated throughout the month of November, with the most recent change occurring on the 29th.
Next steps:
Correction: county case incidence absolutely does drop out in the JHU file. The only JHU entry with nonzero incidence between 2021-05-26 and 2021-09-24 in Nebraska is for Unassigned.
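A rough sketch of the check behind that correction, in case it needs to be rerun: take day-over-day differences of each entry's counts and list the entries with any nonzero change inside a date window. It assumes the file stores cumulative counts per entry; the input shape, the function name, and the sample numbers are all made up for illustration.

```python
from datetime import date

def nonzero_incidence_entries(cumulative, start, end):
    """Return names with any nonzero day-over-day change on a date in [start, end].

    cumulative: {name: {date: cumulative_count}} (assumed shape).
    """
    hits = []
    for name, series in cumulative.items():
        days = sorted(series)
        for prev, cur in zip(days, days[1:]):
            if start <= cur <= end and series[cur] - series[prev] != 0:
                hits.append(name)
                break
    return hits

# Made-up cumulative counts, not real JHU data.
sample = {
    "Blaine, Nebraska, US": {
        date(2021, 5, 31): 40, date(2021, 6, 1): 40, date(2021, 6, 2): 40,
    },
    "Unassigned, Nebraska, US": {
        date(2021, 5, 31): 100, date(2021, 6, 1): 180, date(2021, 6, 2): 260,
    },
}

active = nonzero_incidence_entries(sample, date(2021, 6, 1), date(2021, 9, 24))
print(active)
```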
This is consistent with the most recent JHU-CSSE announcement about Nebraska. It is confusing, however: the announcement says they will only be updating Unassigned, but there are clearly incident cases reported for many Nebraska counties between 2021-09-25 and now.
I've asked on the CSSE thread for clarification. They may not be able to fill in the gap; if so I can add a broad data anomaly flag to all the NE counties for that period.
Thanks a lot for following up on all of these. Is the 1 line summary for what has been happening---both in Nebraska and in Utah---that "we are showing what JHU is showing"?
For county counts, yes. JHU tracks unassigned and out-of-state counts as well, plus those weird non-FIPS regions in Utah. We don't currently have a way to show any of those.
Thanks. Maybe a good general thing to do would be to put in a warning when the county totals don't add up to their parent state (allowing for some small error tolerance). That would be a way to catch unassigned counts, and we can flag this on the dashboard whenever it happens. What do you think?
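To make the proposal concrete, a minimal sketch of what such a check might look like. The function name, the input shape, and the 5% default tolerance are all assumptions for illustration; nothing like this exists in the codebase yet.

```python
def county_sum_gap(state_total, county_counts, rel_tol=0.05):
    """Return the state-minus-counties gap if it exceeds the tolerance, else None.

    rel_tol is relative to the larger of the two totals (an arbitrary choice).
    """
    county_total = sum(county_counts.values())
    gap = state_total - county_total
    allowed = rel_tol * max(abs(state_total), abs(county_total))
    if abs(gap) <= allowed:
        return None
    return gap

# Made-up numbers: counties nearly add up to the state, so no warning.
ok = county_sum_gap(100, {"county_a": 60, "county_b": 38})

# Made-up numbers: the state reports cases but every county is zero.
flagged = county_sum_gap(100, {"county_a": 0, "county_b": 0})

print(ok, flagged)
```

A non-`None` result would be the signal to flag that state and day on the dashboard.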
Maybe? To be feasible, the error tolerance might need to be larger than you'd think.
I took a recent copy of the JHU cases file, then summed the non-county counts (including non-county regions, unassigned, and out-of-state, none of which are shown in the choropleth) vs the county counts (which we do show in the choropleth) for each day for each state. Here's a heat map of the ratio of hidden:shown counts for each day for each state, rounded to the nearest tenth, with ratios larger than 1 truncated to 1. I dropped everything with a ratio of less than 10% since that seemed like a possibly-reasonable threshold for a small error tolerance.
Even if we only consider times when unassigned/out-of-state/not-a-county counts exceed the county counts (so ratio >= 1), that still makes for 759 individual warnings.
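For reference, a sketch of roughly how the hidden:shown ratio described above could be computed per state and day. It is an illustration under assumptions, not the script that produced the heat map: the input shape and names are hypothetical, the 10% cutoff is applied before rounding, and a day with hidden counts but zero shown counts is treated as ratio 1.

```python
def summarize_hidden_ratio(counts, threshold=0.1):
    """Compute display ratios and count cells where hidden counts reach shown counts.

    counts: {(state, day): (hidden, shown)} (assumed shape).
    Returns (kept, n_exceed): kept maps each cell at or above the threshold to its
    ratio, capped at 1 and rounded to the nearest tenth; n_exceed counts cells
    whose unrounded ratio is >= 1.
    """
    kept = {}
    n_exceed = 0
    for key, (hidden, shown) in counts.items():
        if shown:
            raw = hidden / shown
        else:
            # Assumption: hidden counts with nothing shown count as a full ratio.
            raw = float("inf") if hidden else 0.0
        if raw < threshold:
            continue  # below the small-error tolerance, so dropped
        if raw >= 1.0:
            n_exceed += 1
        kept[key] = round(min(raw, 1.0), 1)
    return kept, n_exceed

# Made-up counts, not real data.
sample_counts = {
    ("NE", "2021-06-01"): (500, 0),
    ("NE", "2021-06-02"): (30, 100),
    ("UT", "2021-06-01"): (5, 100),
    ("UT", "2021-06-02"): (97, 100),
}

kept, n_exceed = summarize_hidden_ratio(sample_counts)
print(kept, n_exceed)
```

Note that the count of ratio >= 1 cells uses the unrounded ratio, so a cell at 0.97 displays as 1.0 on the map but is not counted as a warning.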
We do store the non-county counts in the API under the "megacounty" for JHU (since we don't need the megacounty for censoring), so it's plausible to make the client sum up the county counts and compare at render time, but I'm not sure what that would do to our load time. It would be faster to load if we tracked the warnings in the existing anomalies sheet, but maintaining 759 entries that are possibly continually changing is not something we can handle without more data entry staff. I've barely been keeping up with the existing list of ~100 or so anomalies currently in the sheet.
How do you want to proceed?
When I take a look at the choropleth map for COVID cases, it shows zeros for all Nebraska counties:
It's clear that Nebraska as a state has COVID cases being reported:
But something weird is happening with Nebraska's counties:
I checked the API directly (via the R package) to ensure that this isn't a viz problem; the API also has no county case data for Blaine NE (it's just all zeros recently).
Even if this data bug is "real" all the way back to JHU CSSE (meaning, it's not in JHU CSSE because for some reason Nebraska actually stopped reporting county cases), we should find a way to handle this gracefully in the viz tool so that we:
Similarly for most of Utah, assuming it's the same problem there.
Rating Scale (1 is minor, 7 is severe): 6