hwchen closed this issue 6 years ago
And just for my information, `04000US80` is "Offshore areas not associated with a state", which should also be removed.
For Vamsi: Anything that does not follow `[3-digit summary code]00USxxx...` should be removed or cleaned.
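A minimal sketch of that check, assuming the rule is literally "three digits, then `00US`, then the rest of the id". The regex and the name `is_conforming` are illustrative and are not taken from the ETL code:

```python
import re

# Assumed rule: 3-digit summary code, literal "00US", then the geography code
# (empty for the nation-level id).
CONFORMING = re.compile(r"\d{3}00US\w*")


def is_conforming(geoid: str) -> bool:
    """Return True if the id follows [3-digit summary code]00USxxx..."""
    return CONFORMING.fullmatch(geoid) is not None


if __name__ == "__main__":
    for geoid in ["04000US01", "0400000US01", "04000G0US01", "310M200US37540"]:
        print(geoid, is_conforming(geoid))
```

Note that this only checks the shape of the id. An id like `04000US80` passes the shape check and would still need to be dropped by a separate list.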
I'll probably just give them a full list of both non-conforming ids, as well as missing ids. I think that for 2012, many of the non-conforming ids can be fixed by removing an extra `00` before `US`.
Some more notes on non-conforming ids in 2012:
01000R0US -> `R0` is extra
0100000US -> `00` is extra
0400000US01 -> for all states, there's an extra `00`
04000G0US01 -> for states, `G0` is extra
0500000US48301 -> for counties, only extra `00`
310M200US37540 -> for msa, all have extra `M2`
All other years appear to be consistent, except that the id may have been removed from tiger (as in the first comment in this issue)
There's also the question of whether the `G0` or `M2` should be removed. For MSAs, there's no non-`M2` id. For states, there are ids both with and without `G0`.
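For reference, here is a rough sketch of one way to clean the 2012 ids listed above: keep the 3-digit summary code, drop whatever sits between it and `US`, and put a plain `00` back. It assumes the extra segment (`00`, `R0`, `G0`, `M2`) can simply be discarded, which is exactly the open question for `G0` and `M2`, so treat it as one option rather than the agreed fix. The function name `normalize_geoid` is made up for illustration.

```python
def normalize_geoid(geoid: str) -> str:
    """Rewrite a 2012-style id into [3-digit summary code]00USxxx... form.

    Assumption: everything between the summary code and "US" is noise
    (an extra "00", "R0", "G0" or "M2") and can be dropped.
    """
    level = geoid[:3]
    head, sep, tail = geoid.partition("US")
    if not sep or len(head) < 3 or not level.isdigit():
        raise ValueError(f"unrecognized geoid: {geoid!r}")
    return f"{level}00US{tail}"


if __name__ == "__main__":
    samples = [
        "01000R0US",
        "0100000US",
        "0400000US01",
        "04000G0US01",
        "0500000US48301",
        "310M200US37540",
    ]
    for geoid in samples:
        print(geoid, "->", normalize_geoid(geoid))
```

Ids that already conform pass through unchanged. For states, where ids exist both with and without `G0`, this would map both variants onto the same id, so duplicates would need to be handled after cleaning.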
A lot of these actually don't even have pages on the old Data USA... but I was able to find a couple that have had decent page views over the past year. In particular 05000US46113 had 181 views and 31000US26180 had 546.
My gut tells me we should try to get these in. What's the level of effort to add them?
31000US26180 is Honolulu. In the current tiger, 31000US46520 is Urban Honolulu. (Nothing else has '%Hono%' in the name).
05000US46113 is Shannon County, SD. https://en.wikipedia.org/wiki/Oglala_Lakota_County,_South_Dakota It is now known as 05000US46102 Oglala Lakota County, SD (since May 2015).
How should we handle these transitions? And were there any others with decent page views?
(should we add a column for deprecated names? and/or deprecated geoid?)
I also want to add a comment here, that this issue gets at something broader: whether we want to do geographic migrations when we update datausa.
Just spoke with Walther. We're going to redirect the old pages to the new pages.
@hwchen use this endpoint to test if the IDs exist in our current site at all. If they do, post a JSON mapping here of old IDs to new IDs and I can handle the redirects on the front-end.
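As a sketch of what such a mapping could look like, using only the two ids discussed above. The flat `{old: new}` shape and the names `REDIRECTS` and `redirects_json` are assumptions, and the Honolulu pairing is inferred from the name match in tiger rather than confirmed:

```python
import json

# Old id -> new id. Shape is an assumption; the front-end may want something else.
REDIRECTS = {
    # Shannon County, SD -> Oglala Lakota County, SD (renamed May 2015)
    "05000US46113": "05000US46102",
    # Honolulu -> Urban Honolulu (inferred from the name match, not confirmed)
    "31000US26180": "31000US46520",
}


def redirects_json() -> str:
    """Serialize the mapping as JSON for posting in the issue."""
    return json.dumps(REDIRECTS, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(redirects_json())
```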
Completed. I sent a csv to @davelandry .
My methodology is in https://github.com/Datawheel/datausa-acs-etl/commit/f8760386c8acb81477c643388be4084c97322081
(I originally found the geos in the economic census data, then processed them in datausa-acs-etl.)
@davelandry you mentioned that you would check county profiles for usage, to see how much of an issue removing those profiles would be.
I also have a list of MSAs that were removed.
Counties
Msas