Consider supporting a national geo level for all indicators

krivard commented 4 years ago

This would be useful for generating nationwide time series plots, and also for completing the nesting doll of scales for viz.

The easiest way to do this is probably to add it to the geo utility for all the Python indicators, and roll out the national level as we convert indicators to use the package. The fb-package branch already has an implementation of it for R.

krivard commented 4 years ago

Depends on #215

RoniRos commented 4 years ago

National would be very useful, and while implementing it we should also implement an HHS Regions level.

dshemetov commented 4 years ago

@RoniRos State to HHS region will be available in the new geocode refactor.

For National, what are the expected input geocode levels? Do you expect there to be non-US input codes?

RoniRos commented 4 years ago

HHS Regions: awesome!

National: So far we have been focusing on US locations only. I expect this will continue to be the case for the next ~6 months or so. At some point, we may well want to expand internationally. So we should not put any effort into creating codes for other countries, but at the same time we shouldn't do anything that will make it harder to expand internationally later.

krivard commented 4 years ago

National should use the standard two-character country codes (https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2)

dshemetov commented 4 years ago

I am not familiar enough with other countries' county-level geocodes to say how the mapping from county to country code should work in general.

In the meantime though, in #217, I have implemented a very simple aggregation to the "us" national level only as a stopgap, which works by summing records with a FIPS or a ZIP code.

Extending to international locations may require larger changes, such as keeping the national ISO code in a separate column from the finer geocode (like JHU does, for example).
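For readers following along, here is a minimal sketch of what a stopgap "sum everything to `us`" aggregation could look like. The record layout `(fips, date, count)` and the function name `aggregate_to_nation` are assumptions made for illustration; they are not the geo utility's actual API.

```python
from collections import defaultdict


def aggregate_to_nation(records, nation_code="us"):
    """Sum county-level counts into one national total per date.

    `records` is an iterable of (fips, date, count) tuples. This layout and
    the function name are hypothetical, not taken from the geo utility.
    """
    totals = defaultdict(float)
    for _fips, date, count in records:
        # Every record carrying a FIPS code contributes to the single nation.
        totals[(nation_code, date)] += count
    return dict(totals)


records = [
    ("42003", "2020-10-05", 12.0),  # Allegheny County, PA
    ("06037", "2020-10-05", 30.0),  # Los Angeles County, CA
    ("42003", "2020-10-06", 8.0),
]
national = aggregate_to_nation(records)
```

Keeping the nation code as a parameter leaves room for the separate-column approach mentioned above if international codes are ever needed.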

RoniRos commented 4 years ago

Many countries have other types of divisions, e.g. provinces, cantons, etc. I wouldn't worry about it now.

@dshemetov When you discuss the simple aggregation by "summing records" above, do you mean aggregation of sample elements? Or weighted averaging of signal values? The important thing is to properly account for all FIPS codes in the country, including those for which an estimate wasn't produced, a sample was too small (or even empty), etc. I assume this was already done in aggregating counties to states. Why not create the national signal directly from the states' signal? Or better yet, from the HHS Regions signals? These are clean hierarchies: every county belongs to exactly one state or territory, every state and territory belongs to exactly one HHS Region, and the 10 HHS Regions comprise the national territory.

dshemetov commented 4 years ago

@RoniRos Sounds good, simple for now.

By aggregation, I mean just the aggregation of sample elements. There is no detailed accounting of edge-case FIPS codes in the part of the code I touched (excepting the transformation to megacounties, when sample sizes are below threshold); the main edge case I have thought about is nan-handling, which at the moment are zero-filled. Are there other cases I should be aware of?

I defaulted to working with the finest level geocode in the transformations to simplify the crosswalks transformation graph. I think it's mostly a personal conceptual preference for providing a star graph instead of requiring the user to do a chain of transformations (e.g. FIPS to state to HRR to nation). Since we don't support arbitrary crosswalks between geocodes, my thought was that it would be easier for the utility user to know that FIPS -> * is always available instead of hunting for the correct chain of transformations.
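To make the star-versus-chain distinction concrete, here is a small sketch. The crosswalk tables and function names are hypothetical, and the chain uses HHS regions as its middle hop purely for illustration; the real utility's tables may look different.

```python
# Hypothetical crosswalk tables, for illustration only.
FIPS_TO_STATE = {"42003": "pa", "06037": "ca"}
STATE_TO_HHS = {"pa": "3", "ca": "9"}
HHS_TO_NATION = {"3": "us", "9": "us"}


def chain_fips_to_nation(fips):
    """Chain: the caller has to know and compose every intermediate hop."""
    return HHS_TO_NATION[STATE_TO_HHS[FIPS_TO_STATE[fips]]]


# Star: every target level is reachable from FIPS in a single lookup.
STAR = {
    "state": FIPS_TO_STATE,
    "hhs": {fips: STATE_TO_HHS[state] for fips, state in FIPS_TO_STATE.items()},
    "nation": {fips: "us" for fips in FIPS_TO_STATE},
}


def star_convert(fips, target):
    """Star: one lookup from the finest geocode to any supported level."""
    return STAR[target][fips]
```

With the star layout, `star_convert("42003", "hhs")` needs no knowledge of the state level in between, which is the convenience argued for above.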

RoniRos commented 4 years ago

> the main edge case I have thought about is nan-handling, which at the moment are zero-filled

I am not sure what you mean by zero-filled. Presumably, the NaNs are 0/0, so they add zero to both numerator and denominator, right?

dshemetov commented 4 years ago

The geocoding aggregation in the utility treats all data fields as counts and does weighted summing. I have not considered the effects of a denominator. Where do these come up?

RoniRos commented 4 years ago

If these are counts, that's probably the numerator. The denominator is the sample size. All is well.
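A short sketch of the numerator/denominator point, assuming each row carries a count and a sample size and that a NaN row stands for an empty 0/0 estimate. The function name `pooled_proportion` and the row layout are hypothetical.

```python
import math


def pooled_proportion(rows):
    """Pool (count, sample_size) rows into a single proportion.

    A row with a NaN in either field is treated as 0/0: it adds nothing to
    the numerator or the denominator. Hypothetical helper for illustration.
    """
    numerator = 0.0
    denominator = 0.0
    for count, sample_size in rows:
        if math.isnan(count) or math.isnan(sample_size):
            continue  # zero-filling both fields is equivalent to skipping
        numerator += count
        denominator += sample_size
    # With no usable rows there is no estimate to report.
    return numerator / denominator if denominator > 0 else float("nan")


rows = [(5.0, 100.0), (float("nan"), float("nan")), (15.0, 300.0)]
result = pooled_proportion(rows)
```

Summing counts and sample sizes separately and dividing once at the end is what makes zero-filling harmless here; averaging already-divided rates would not have that property.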

krivard commented 4 years ago

Wouldn’t summing at the FIPS level exclude the Unassigned records in cases/deaths signals though? That seems bad.

dshemetov commented 4 years ago

Unassigned and Out of State records are given a FIPS code of XX000, so they should be summed with the rest of the FIPS records.
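As an illustration of why the XX000 convention keeps those records in the totals, here is a sketch that groups by the two-digit state prefix of the FIPS code. The record layout and the function name are assumptions, not the utility's code.

```python
from collections import defaultdict


def sum_by_state_prefix(records):
    """Sum counts by the first two digits of the FIPS code.

    `records` is an iterable of (fips, count) pairs (hypothetical layout).
    An unassigned row coded XX000 shares the XX prefix with its state's
    counties, so it lands in the same bucket.
    """
    totals = defaultdict(float)
    for fips, count in records:
        totals[fips[:2]] += count
    return dict(totals)


records = [
    ("42003", 10.0),  # Allegheny County, PA
    ("42101", 20.0),  # Philadelphia County, PA
    ("42000", 3.0),   # unassigned / out of state, coded XX000
    ("06037", 7.0),   # Los Angeles County, CA
]
by_state = sum_by_state_prefix(records)
national_total = sum(by_state.values())
```

The same holds for a direct national sum: as long as the XX000 rows are present in the FIPS-level input, they are counted.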

nickreich commented 4 years ago

I wanted to mention that this would be a super helpful feature for us at the COVID-19 Forecast Hub! We are currently not using covidcast data for jhu-csse because not all the backfill data is in place and there is no national signal. It would be great to have this resolved!

RoniRos commented 4 years ago

Thanks for the input @nickreich, it is helpful to know. Hopefully this will happen soon.

dshemetov commented 4 years ago

I think this could be started by scoping out the work needed for a particular signal, such as JHU. @nickreich do you need a national signal for all countries or just the US?

nickreich commented 4 years ago

Thanks @dshemetov. Just the US.

capnrefsmmat commented 3 years ago

FYI, nation support for doctor-visits is more important until CHNG comes back, because COVIDcast 2.0 shows the nation view of all indicators by default. It currently shows a big N/A for doctor-visits unless you click to see state-by-state numbers.

This is also true for Quidel.

krivard commented 3 years ago

Reviving #616

cmu-delphi / covidcast-indicators

Consider supporting a national geo level for all indicators #199