cmu-delphi / covidcast-indicators

Back end for producing indicators and loading them into the COVIDcast API.
https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
MIT License

Zero COVID cases in Nebraska counties #1384

Open ryantibs opened 2 years ago

ryantibs commented 2 years ago

When I take a look at the choropleth map for COVID cases, it shows zeros for all Nebraska counties:

[Screenshot: 2021-09-26, 4:47:04 PM]

It's clear that Nebraska as a state has COVID cases being reported:

But something weird is happening with Nebraska's counties:

[Screenshot: 2021-09-26, 8:46:00 PM]

I checked the API directly (via the R package) to ensure that this isn't a viz problem; the API also has no county case data for Blaine NE (it's just all zeros recently).

Even if this data bug is "real" all the way back to JHU CSSE (meaning, it's not in JHU CSSE because for some reason Nebraska actually stopped reporting county cases), we should find a way to handle this gracefully in the viz tool so that we:

  1. color Nebraska as a whole according to the state case count in the choropleth map, and make it clear in the hover tip that this is what we're doing; or
  2. stop reporting zeros for these counties and make it clear in the hover tip that this is a missing data problem.

Similarly for most of Utah, assuming it's the same problem there.

Rating Scale (1 is minor, 7 is severe): 6

sgratzl commented 2 years ago

how do we know which zero values are real and which ones are due to some data bug?

ryantibs commented 2 years ago

@sgratzl Finally getting around to this now ... when we synced about v3 on slack you mentioned this as an outstanding issue, but it shouldn't hold us up in releasing v3 at all. This is really a completely separate issue from the dashboard redesign. So we should leave this open and revisit it with @krivard when she gets back from her week off (this week), but move on for now.

Recording thoughts below for later.

Checking the dashboard just now, we seem to be missing data for most counties in Utah (first screenshot). However, Utah's own state dashboard (second screenshot) shows that there is still case activity being reported throughout the state. I looked but didn't see Utah mentioned in the most recent weekly email from JHU (they send one to the forecasting community each week). So again, I don't know whether this is a JHU issue, an issue on our side, or something else.

[Screenshot: 2021-11-22, 9:47:00 PM] [Screenshot: 2021-11-22, 9:46:41 PM]

krivard commented 2 years ago

JHU's README was updated on October 26 to say that Nebraska is only reporting state cases, not county cases. However, the emails from Jeremy suggest that they've been able to pull county cases from one of the CDC trackers instead. I'll look into this more and get back to you.

krivard commented 2 years ago

Re: Utah:

The JHU file has 37 entries for Utah. Of those, 11 are counties that have never reported anything other than 0:

| FIPS | Lat | Long | Name |
|---|---|---|---|
| 49003 | 41.52106798 | -113.08328159999999 | "Box Elder, Utah, US" |
| 49009 | 40.88798265 | -109.51210929999999 | "Daggett, Utah, US" |
| 49023 | 39.70208397 | -112.78092450000001 | "Juab, Utah, US" |
| 49027 | 39.0729209 | -113.1020328 | "Millard, Utah, US" |
| 49029 | 41.08830262 | -111.5727723 | "Morgan, Utah, US" |
| 49031 | 38.33815254 | -112.1249591 | "Piute, Utah, US" |
| 49033 | 41.63137678 | -111.2445105 | "Rich, Utah, US" |
| 49039 | 39.37231946 | -111.5758676 | "Sanpete, Utah, US" |
| 49041 | 38.74837146 | -111.8050275 | "Sevier, Utah, US" |
| 49055 | 38.32335822 | -110.9096801 | "Wayne, Utah, US" |
| 49057 | 41.27116049 | -111.9145117 | "Weber,Utah,US" |

Of the remaining 26 entries, 6 are about named regions that have no FIPS code:

| Lat | Long | Name |
|---|---|---|
| 41.52106798 | -113.08328159999999 | "Bear River, Utah, US" |
| 39.37231946 | -111.5758676 | "Central Utah, Utah, US" |
| 38.99617072 | -110.70139579999999 | "Southeast Utah, Utah, US" |
| 37.85447192 | -111.4418764 | "Southwest Utah, Utah, US" |
| 40.12491499 | -109.5174415 | "TriCounty, Utah, US" |
| 41.27116049 | -111.9145117 | "Weber-Morgan, Utah, US" |

If there's something we should do about this, best open a new issue for it so it doesn't get lost.
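
For anyone who wants to reproduce this kind of breakdown, here is a minimal sketch of the grouping described above. It assumes each JHU row has already been parsed into a small record; the `Entry` type, its field names, and the `classify`/`summarize` helpers are hypothetical and not part of any covidcast or JHU tooling, and the counts in the sample are made up.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence


class Entry(NamedTuple):
    """One parsed row of the JHU cases file (hypothetical structure)."""
    fips: Optional[str]        # None when the row has no FIPS code
    name: str                  # e.g. "Rich, Utah, US"
    cumulative: Sequence[int]  # cumulative case counts, one per day


def classify(entry: Entry) -> str:
    """Return 'no_fips', 'always_zero', or 'reporting' for one row."""
    if not entry.fips:
        return "no_fips"
    if all(count == 0 for count in entry.cumulative):
        return "always_zero"
    return "reporting"


def summarize(entries: Sequence[Entry]) -> Dict[str, List[str]]:
    """Group row names by the category assigned in classify()."""
    groups: Dict[str, List[str]] = {"no_fips": [], "always_zero": [], "reporting": []}
    for entry in entries:
        groups[classify(entry)].append(entry.name)
    return groups


# Toy rows: the first two names appear in the lists above, the Salt Lake
# row is added for contrast, and all counts are made up for illustration.
sample = [
    Entry("49033", "Rich, Utah, US", [0, 0, 0]),
    Entry(None, "Weber-Morgan, Utah, US", [10, 12, 15]),
    Entry("49035", "Salt Lake, Utah, US", [100, 130, 170]),
]
groups = summarize(sample)
```

The `always_zero` group corresponds to the counties that never reported anything other than 0, and `no_fips` to the named regions that cannot be placed on a county choropleth.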

krivard commented 2 years ago

Details on Nebraska:

County case data drops out in covidcast starting 2021-06-01 and ending 2021-09-24.

County case data never drops out in the JHU file.

The most recent update to county-level case incidence for reference date 2021-06-01 was November 16 at 10pm. It contains all zeroes for Nebraska counties.

The next step is to run the jhu indicator on staging to see what's going on; however, that package hasn't been updated since Nov 1. It should have been updated throughout the month of November, with the most recent change occurring on the 29th.

Next steps:

krivard commented 2 years ago

Correction: county case incidence absolutely does drop out in the JHU file. The only JHU entry with nonzero incidence between 2021-05-26 and 2021-09-24 in Nebraska is for Unassigned.

This is consistent with the most recent JHU-CSSE announcement about Nebraska. However, it is confusing: they say there that they will only be updating Unassigned, but there are clearly incident cases reported for many Nebraska counties between 2021-09-25 and now.

I've asked on the CSSE thread for clarification. They may not be able to fill in the gap; if so I can add a broad data anomaly flag to all the NE counties for that period.
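
One way to find the period such a flag would need to cover is to look for days on which every county reports zero incidence while Unassigned is nonzero. The sketch below assumes the incidence has already been loaded into plain dictionaries; `suspicious_days` and `to_ranges` are hypothetical helper names, and the sample counts are made up.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

Series = Dict[date, int]  # daily incidence keyed by reference date


def suspicious_days(county_incidence: Dict[str, Series], unassigned: Series) -> List[date]:
    """Days where all counties report 0 but Unassigned reports cases."""
    days = []
    for day in sorted(unassigned):
        counties_all_zero = all(
            series.get(day, 0) == 0 for series in county_incidence.values()
        )
        if counties_all_zero and unassigned[day] > 0:
            days.append(day)
    return days


def to_ranges(days: List[date]) -> List[Tuple[date, date]]:
    """Collapse sorted dates into (start, end) runs of consecutive days."""
    ranges: List[Tuple[date, date]] = []
    for day in days:
        if ranges and day - ranges[-1][1] == timedelta(days=1):
            ranges[-1] = (ranges[-1][0], day)
        else:
            ranges.append((day, day))
    return ranges


# Toy data for two Nebraska counties; all counts are made up.
d1, d2, d3, d4 = (date(2021, 5, 25) + timedelta(days=i) for i in range(4))
counties = {
    "31001": {d1: 4, d2: 0, d3: 0, d4: 3},
    "31009": {d1: 1, d2: 0, d3: 0, d4: 0},
}
unassigned = {d1: 0, d2: 57, d3: 61, d4: 2}
flag_ranges = to_ranges(suspicious_days(counties, unassigned))
```

Each resulting `(start, end)` pair is a candidate span for a single anomaly entry, which keeps the number of flags small compared with one entry per county per day.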

ryantibs commented 2 years ago

Thanks a lot for following up on all of these. Is the one-line summary of what has been happening, both in Nebraska and in Utah, that "we are showing what JHU is showing"?

krivard commented 2 years ago

For county counts, yes. JHU tracks unassigned and out-of-state counts as well, plus those weird non-FIPS regions in Utah. We don't currently have a way to show any of those.

ryantibs commented 2 years ago

Thanks. Maybe a good general thing to do would be to put in a warning when the county totals don't add up to their parent state's total (allowing for some small error tolerance). That would be a way to catch unassigned counts, and we could flag this on the dashboard whenever it happens. What do you think?
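
A minimal sketch of the check being proposed, assuming the state total and the county totals for one day are already in hand. The function name `state_mismatch` and the 5% default tolerance are placeholders for illustration, not values anyone has agreed on.

```python
from typing import Iterable


def state_mismatch(state_total: float, county_totals: Iterable[float],
                   rel_tol: float = 0.05) -> bool:
    """Return True when the counties do not add up to the state total.

    The gap is measured relative to the state total; rel_tol is an
    assumed tolerance, not an agreed-upon threshold.
    """
    county_sum = sum(county_totals)
    if state_total == 0:
        # No state cases: any county cases at all count as a mismatch.
        return county_sum != 0
    return abs(state_total - county_sum) / state_total > rel_tol


# Counties cover 98 of 100 state cases: within tolerance, no warning.
ok = state_mismatch(100, [50, 48])
# State reports 100 cases but every county reports 0: warn.
missing = state_mismatch(100, [0, 0, 0])
```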

krivard commented 2 years ago

Maybe? To be feasible, the error tolerance might need to be larger than you'd think.

I took a recent copy of the JHU cases file, then summed the non-county counts (including non-county regions, unassigned, and out-of-state, none of which are shown in the choropleth) vs the county counts (which we do show in the choropleth) for each day for each state. Here's a heat map of the ratio of hidden:shown counts for each day for each state, rounded to the nearest tenth, with ratios larger than 1 truncated to 1. I dropped everything with a ratio of less than 10% since that seemed like a possibly-reasonable threshold for a small error tolerance.

[Image: heat map of the hidden:shown count ratio for each day and each state]

Even if we only consider times when unassigned/out-of-state/not-a-county counts exceed the county counts (so ratio >= 1), that still makes for 759 individual warnings.
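
The ratio calculation described above can be sketched roughly as follows. This is not the script that produced the heat map: the data layout and the names `hidden_shown_ratio` and `collect_warnings` are hypothetical, the 10% cutoff is assumed to apply to the raw ratio before rounding, and a day with hidden counts but zero county counts is assumed to map to a ratio of 1.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (state, day)


def hidden_shown_ratio(hidden: int, shown: int, threshold: float = 0.1) -> Optional[float]:
    """Ratio of non-county to county counts, capped at 1, rounded to 0.1.

    Returns None when the raw ratio is below the threshold (assumption:
    the cutoff is applied before rounding).
    """
    if shown == 0:
        # Assumption: nothing shown but something hidden counts as ratio 1.
        return 1.0 if hidden > 0 else None
    raw = hidden / shown
    if raw < threshold:
        return None
    return round(min(raw, 1.0), 1)


def collect_warnings(cells: Dict[Key, Tuple[int, int]]) -> Dict[Key, float]:
    """Map each (state, day) with a ratio at or above the threshold to that ratio."""
    out: Dict[Key, float] = {}
    for key, (hidden, shown) in cells.items():
        ratio = hidden_shown_ratio(hidden, shown)
        if ratio is not None:
            out[key] = ratio
    return out


# Toy (hidden, shown) counts per state and day; all values are made up.
cells = {
    ("ne", "2021-07-01"): (120, 0),
    ("ut", "2021-07-01"): (30, 100),
    ("pa", "2021-07-01"): (5, 100),
}
flagged = collect_warnings(cells)
severe = [key for key, ratio in flagged.items() if ratio >= 1.0]
```

Counting the entries in `severe` over a full file gives the number of state-days where the hidden counts match or exceed the county counts, which is the kind of tally quoted above.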

We do store the non-county counts in the API under the "megacounty" for JHU (since we don't need the megacounty for censoring), so it's plausible to make the client sum up the county counts and compare at render time, but I'm not sure what that would do to our load time. It would be faster to load if we tracked the warnings in the existing anomalies sheet, but maintaining 759 entries that are possibly continually changing is not something we can handle without more data entry staff. I've barely been keeping up with the existing list of ~100 or so anomalies currently in the sheet.

How do you want to proceed?