Closed krivard closed 3 years ago
Depends on #215
National would be very useful, and while implementing it we should also implement an HHS Regions level.
@RoniRos State to HHS region will be available in the new geocode refactor.
For National, what are the expected input geocode levels? Do you expect there to be non-US input codes?
HHS Regions: awesome!
National: So far we have been focusing on US locations only. I expect this will continue to be the case for the next ~6 months or so. At some point, we may well want to expand internationally. So we should not put any effort into creating codes for other countries, but at the same time we shouldn't do anything that will make it harder to expand internationally later.
National should use the standard two-character country codes, ISO 3166-1 alpha-2 (https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2).
see also https://github.com/cmu-delphi/delphi-epidata/pull/207
I am not familiar enough with other countries' county-level geocodes to say how the mapping from county to country code should work in general.
In the meantime though, in #217, I have implemented a very simple aggregation to the "us" national level only as a stopgap, which works by summing records with a FIPS or a ZIP code.
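For readers following along, here is a minimal sketch of what such a stopgap could look like. It is not the code from #217: the `(date, fips, count)` record layout and the function name `aggregate_to_nation` are assumptions made for illustration only.

```python
from collections import defaultdict


def aggregate_to_nation(records, nation_code="us"):
    """Sum county-level counts into one national total per date.

    `records` is an iterable of (date, fips, count) tuples. This layout is
    hypothetical and only serves the illustration.
    """
    totals = defaultdict(float)
    for date, _fips, count in records:
        # Every record contributes to the national total for its date.
        totals[date] += count
    return [(date, nation_code, total) for date, total in sorted(totals.items())]


county_records = [
    ("2020-10-05", "42003", 10.0),
    ("2020-10-05", "06037", 25.0),
    ("2020-10-06", "42003", 12.0),
]
national = aggregate_to_nation(county_records)
```

Records keyed by ZIP code could be handled the same way, since the geocode itself is ignored once everything maps to "us".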
Extending to international locations may require larger changes, such as keeping the national ISO code in a separate column from the finer geocode (like JHU does, for example).
Many countries have other types of divisions, e.g. provinces, cantons, etc. I wouldn't worry about it now.
@dshemetov When you describe the simple aggregation as "summing records" above, do you mean aggregation of sample elements, or weighted averaging of signal values? The important thing is to properly account for all FIPS codes in the country, including those for which an estimate wasn't produced, the sample was too small (or even empty), etc. I assume this was already done in aggregating counties to states. Why not create the national signal directly from the states' signal? Or better yet, from the HHS Regions signals? These are clean hierarchies: every county belongs to exactly one state or territory, every state and territory belongs to exactly one HHS Region, and the 10 HHS Regions comprise the national territory.
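To make the hierarchy argument concrete, here is a small sketch showing that, for plain counts, rolling up county to state to HHS Region to nation gives the same total as summing counties directly. The helper `roll_up`, the sample values, and the three-state `STATE_TO_HHS` table are illustrative assumptions, not a complete crosswalk or the utility's API.

```python
from collections import defaultdict

# Partial, illustrative crosswalk: state FIPS prefix -> HHS Region number.
# (Pennsylvania is in Region 3, California in Region 9, New York in Region 2.)
STATE_TO_HHS = {"42": 3, "06": 9, "36": 2}


def roll_up(values, key_fn):
    """Sum values whose codes map to the same parent code."""
    out = defaultdict(float)
    for code, value in values.items():
        out[key_fn(code)] += value
    return dict(out)


# Hypothetical county-level counts keyed by 5-digit FIPS code.
county = {"42003": 10.0, "42101": 5.0, "06037": 25.0, "36061": 7.0}

state = roll_up(county, lambda fips: fips[:2])      # county -> state
hhs = roll_up(state, lambda st: STATE_TO_HHS[st])   # state -> HHS Region
nation_chained = roll_up(hhs, lambda _region: "us")  # HHS Region -> nation
nation_direct = roll_up(county, lambda _fips: "us")  # county -> nation
```

Because each level partitions the one below it, both routes reach the same national count here; the choice between them is about bookkeeping for missing or suppressed counties, not about the arithmetic.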
@RoniRos Sounds good, simple for now.
By aggregation, I mean just the aggregation of sample elements. There is no detailed accounting of edge-case FIPS codes in the part of the code I touched (excepting the transformation to megacounties, when sample sizes are below threshold); the main edge case I have thought about is nan-handling, which at the moment are zero-filled. Are there other cases I should be aware of?
I defaulted to working with the finest level geocode in the transformations to simplify the crosswalks transformation graph. I think it's mostly a personal conceptual preference for providing a star graph instead of requiring the user to do a chain of transformations (e.g. FIPS to state to HRR to nation). Since we don't support arbitrary crosswalks between geocodes, my thought was that it would be easier for the utility user to know that FIPS -> * is always available instead of hunting for the correct chain of transformations.
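As a rough illustration of the star-graph idea (not the geo utility's actual interface), a single table keyed by target level can hold one mapping from FIPS to each target, so a caller never has to compose a chain. The names `FIPS_TO` and `convert_fips`, and the two-state HHS lookup, are hypothetical.

```python
# Hypothetical star-shaped crosswalk: every target level is reachable from
# FIPS in a single step. The HHS lookup is partial and for illustration only.
_STATE_TO_HHS = {"42": 3, "06": 9}

FIPS_TO = {
    "state": lambda fips: fips[:2],
    "hhs": lambda fips: _STATE_TO_HHS[fips[:2]],
    "nation": lambda fips: "us",
}


def convert_fips(fips, target):
    """Map a 5-digit FIPS code to the requested target geo level."""
    if target not in FIPS_TO:
        raise ValueError(f"no FIPS -> {target} crosswalk available")
    return FIPS_TO[target](fips)
```

With this shape, `convert_fips("42003", "hhs")` and `convert_fips("42003", "nation")` are both one lookup, and an unsupported target fails loudly instead of silently requiring a chain.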
> the main edge case I have thought about is nan-handling, which at the moment are zero-filled
I am not sure what you mean by zero-filled. Presumably the NaNs are 0/0, so they add zero to both the numerator and the denominator, right?
The geocoding aggregation in the utility treats all data fields like a counts value and does weighted summing. I have not considered the effects of a denominator. Where do these come up?
If these are counts, that's probably the numerator. The denominator is the sample size. All is well.
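A short sketch of the point being made, under the assumption that each row carries a count (numerator) and a sample size (denominator): zero-filling NaNs before summing leaves the pooled ratio unaffected by empty rows. The function names and sample rows are hypothetical.

```python
import math


def zero_fill(x):
    """Treat a missing value (NaN) as contributing nothing to a sum."""
    return 0.0 if math.isnan(x) else x


def pooled_proportion(rows):
    """Pool (numerator, sample_size) pairs by summing each side separately.

    `rows` must be a list so it can be traversed twice.
    """
    num = sum(zero_fill(n) for n, _ in rows)
    den = sum(zero_fill(d) for _, d in rows)
    # With no usable sample at all, the pooled value is undefined.
    return num / den if den > 0 else float("nan")


# The middle row stands in for a location with an empty sample (0/0).
rows = [(5.0, 100.0), (float("nan"), float("nan")), (15.0, 300.0)]
pooled = pooled_proportion(rows)
```

Here the pooled value is 20/400, the same as if the empty row were not present.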
Wouldn’t summing at the FIPS level exclude the Unassigned records in cases/deaths signals though? That seems bad.
Unassigned and Out of State records are given a FIPS code of XX000, so they should be summed with the rest of the FIPS records.
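A small sketch of why this matters, assuming (per the comment above) that such records carry a code ending in "000": summing every FIPS-keyed record keeps them in the national total, whereas filtering to real counties first would drop them. The sample values and the helper `is_unassigned` are hypothetical.

```python
def is_unassigned(fips):
    # Assumption taken from the thread: Unassigned / Out of State records
    # are coded as XX000, i.e. state prefix followed by "000".
    return fips.endswith("000")


# Hypothetical case counts keyed by FIPS, including two XX000 records.
records = {"42003": 10.0, "42000": 3.0, "06037": 25.0, "06000": 2.0}

national_total = sum(records.values())
counties_only = sum(v for f, v in records.items() if not is_unassigned(f))
```

In this toy data the national total is 40.0, while the counties-only sum is 35.0, so the difference is exactly the Unassigned records.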
I wanted to mention that this would be a super helpful feature for us at the COVID-19 Forecast Hub! We are currently not using covidcast data for jhu-csse because not all the backfill data is in place and there is no national signal. It would be great to have this resolved!
Thanks for the input @nickreich , it is helpful to know. Hopefully this will happen soon.
I think this could be started by scoping out the work needed for a particular signal, such as JHU. @nickreich do you need a national signal for all countries or just the US?
Thanks @dshemetov . Just the US.
FYI, nation support for doctor-visits is more important until CHNG comes back, because COVIDcast 2.0 shows the nation view of all indicators by default. It currently shows a big N/A for doctor-visits unless you click to see state-by-state numbers.
This is also true for Quidel.
Reviving #616
This would be useful for generating nationwide time series plots, and also for completing the nesting doll of scales for viz.
The easiest way to do this is probably just to add it to the geo utility for all the Python indicators, and roll out the national level as we convert indicators to use the package. The fb-package branch already has an implementation of it for R.