DataUSA / datausa-tracker


Counties and MSAs that have been removed between current Data USA and 2017 TIGER #149

Closed (hwchen closed this 6 years ago)

hwchen commented 6 years ago

@davelandry you mentioned that you would check county profiles for usage, to see how much of an issue removing those profiles would be.

I also have a list of MSAs that were removed.

Counties

05000US51515
05000US02270
05000US46113
05000US02280
05000US02201
05000US02232

MSAs

31000US22460
31000US31100
31000US42060
31000US23020
31000US42260
31000US46940
31000US26180
31000US14060
31000US11300
31000US29140
31000US26100
31000US11340
31000US14980
31000US17180
31000US37820
31000US16900
31000US29580
31000US46420
31000US19900
31000US47500
31000US29220
31000US17940
31000US24180
31000US14940
31000US36060
31000US33820
31000US28460
31000US30100
31000US28980
31000US48340
31000US13860
31000US38020
31000US43540
31000US48740
31000US42580
31000US37380
31000US25380
31000US32060
31000US42500
31000US49060
31000US18340
31000US40020
31000US38200
31000US10020
31000US18940
31000US35340
31000US33380
31000US45260
31000US40500
31000US36180
31000US30500
31000US10880
31000US49540
31000US25660
31000US30740
31000US41580
31000US44380
31000US30540
31000US43860
31000US47860
31000US20620
31000US37700
31000US39100
31000US25340
31000US22980
31000US46260
31000US26480
31000US45640
31000US19020
31000US32270
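
Lists like the two above amount to a set difference: IDs present in Data USA but absent from the 2017 TIGER release, filtered by summary-level prefix. A minimal sketch, assuming both ID lists are already loaded (the function name and inputs are hypothetical, and the sample IDs are toy inputs, not the real data):

```python
def removed_ids(datausa_ids, tiger_ids, sumlevel_prefix):
    """Return IDs in Data USA but not in TIGER for one summary level.

    sumlevel_prefix is e.g. "05000US" for counties or "31000US" for MSAs.
    Inputs are hypothetical iterables of geo ID strings.
    """
    tiger = set(tiger_ids)
    return sorted(
        geoid
        for geoid in set(datausa_ids)
        if geoid.startswith(sumlevel_prefix) and geoid not in tiger
    )


# Toy example only: two Data USA IDs per level, one of each missing from TIGER.
datausa = ["05000US46113", "05000US46102", "31000US26180", "31000US46520"]
tiger_2017 = ["05000US46102", "31000US46520"]
print(removed_ids(datausa, tiger_2017, "05000US"))  # ['05000US46113']
```

Sorting the result keeps the output stable between runs, which makes diffs between releases easier to review.
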

hwchen commented 6 years ago

And for reference, 04000US80 is "Offshore areas not associated with a state", which should also be removed.

For Vamsi: Anything that does not follow [3-digit summary code]00USxxx... should be removed or cleaned.

I'll probably just give them a full list of both non-conforming IDs and missing IDs. I think that for 2012, many of the non-conforming IDs can be fixed by removing an extra `00` before `US`.
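
The conformance rule for Vamsi could be checked with a regular expression. A minimal sketch, assuming the part after `US` is all digits (true for the nation, state, county, and MSA IDs in this thread; other geographies may need a wider pattern). The function name is hypothetical, and the exclusion set holds only the offshore ID mentioned above:

```python
import re

# Conforming shape: 3-digit summary level, "00", "US", then a numeric geo code.
# Assumption: the geo code is digits only; widen the pattern for other geographies.
CONFORMING = re.compile(r"\d{3}00US\d*")

# IDs that conform in shape but should still be dropped.
# 04000US80 is "Offshore areas not associated with a state".
EXCLUDED = {"04000US80"}


def split_ids(geoids):
    """Partition IDs into (keep, non_conforming, excluded) lists, preserving order."""
    keep, non_conforming, excluded = [], [], []
    for geoid in geoids:
        if geoid in EXCLUDED:
            excluded.append(geoid)
        elif CONFORMING.fullmatch(geoid):
            keep.append(geoid)
        else:
            non_conforming.append(geoid)
    return keep, non_conforming, excluded
```

`fullmatch` is used rather than `match` with a `$` anchor so that a trailing newline in the input is not silently accepted.
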

hwchen commented 6 years ago

Some more notes on non-conforming IDs in 2012:

01000R0US -> `R0` is extra
0100000US -> extra `00`
0400000US01 -> for all states, there's an extra `00`
04000G0US01 -> for states, `G0` is extra
0500000US48301 -> for counties, only an extra `00`
310M200US37540 -> for MSAs, all have an extra `M2`
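
All six patterns above share one shape: a 3-digit summary level, a 4-character block (`0000`, `00R0`, `00G0`, `M200`), then `US` and the geo code. A minimal sketch of a normalizer under that assumption (the function name is hypothetical); it also returns the original block so the caller can decide what to do with `G0`/`M2` variants instead of silently merging them:

```python
import re

# Assumed 2012 shape: 3-digit summary level, 4-char block, "US", numeric geo code.
RAW_2012 = re.compile(r"(\d{3})([0-9A-Z]{4})US(\d*)")
CONFORMING = re.compile(r"\d{3}00US\d*")


def normalize_2012(geoid):
    """Return (normalized_id, block), where block is the original infix
    between the summary level and "US" ("00" if the ID already conforms)."""
    if CONFORMING.fullmatch(geoid):
        return geoid, "00"
    m = RAW_2012.fullmatch(geoid)
    if m is None:
        raise ValueError(f"unrecognized 2012 geo ID: {geoid!r}")
    sumlevel, block, code = m.groups()
    return f"{sumlevel}00US{code}", block


print(normalize_2012("310M200US37540"))  # ('31000US37540', 'M200')
```

Note that `04000G0US01` and `0400000US01` both normalize to `04000US01`; the returned block is what distinguishes them.
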

All other years appear to be consistent, except that some IDs may have been removed from TIGER (as in the first comment in this issue).

There's also the question of whether the `G0` or `M2` IDs should be removed. For MSAs, there is no non-`M2` ID. For states, IDs exist both with and without `G0`.
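
One way to see where the `G0` question actually bites is to group raw IDs by their normalized form and list the groups with more than one member. A minimal sketch, assuming every raw ID's geo code follows the first `US` in the string (the function name and sample IDs are illustrative only):

```python
from collections import defaultdict


def find_collisions(raw_ids):
    """Group raw 2012 IDs by normalized ID; return only groups with >1 member.

    Normalization keeps the 3-digit summary level and everything after the
    first "US", inserting the conforming "00" between them.
    """
    groups = defaultdict(list)
    for raw in raw_ids:
        code = raw.split("US", 1)[1]
        groups[f"{raw[:3]}00US{code}"].append(raw)
    return {norm: raws for norm, raws in groups.items() if len(raws) > 1}


print(find_collisions(["0400000US01", "04000G0US01", "0400000US02"]))
# {'04000US01': ['0400000US01', '04000G0US01']}
```

MSAs would never show up here, since there is no non-`M2` variant to collide with.
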

davelandry commented 6 years ago

A lot of these actually don't even have pages on the old Data USA... but I was able to find a couple that have had decent page views over the past year. In particular 05000US46113 had 181 views and 31000US26180 had 546.

My gut tells me we should try to get these in. What's the level of effort to add them?

hwchen commented 6 years ago

31000US26180 is Honolulu. In the current TIGER, 31000US46520 is Urban Honolulu. (Nothing else has `%Hono%` in the name.)

05000US46113 is Shannon County, SD, which was renamed (https://en.wikipedia.org/wiki/Oglala_Lakota_County,_South_Dakota). Since May 2015 it has been 05000US46102, Oglala Lakota County, SD.

How should we handle these transitions? And were there any others with decent page views?

hwchen commented 6 years ago

(Should we add a column for deprecated names? And/or deprecated geoids?)
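
If deprecated columns were added, a lookup could resolve both current and retired IDs to the same record. A minimal sketch, assuming hypothetical `deprecated_geoids` and `deprecated_names` columns (none of these names exist in the actual schema); the sample row uses the Shannon/Oglala Lakota rename from the comment above:

```python
from dataclasses import dataclass, field


@dataclass
class Geo:
    geoid: str
    name: str
    # Hypothetical columns: IDs and names this geography used to have.
    deprecated_geoids: list[str] = field(default_factory=list)
    deprecated_names: list[str] = field(default_factory=list)


def build_lookup(geos):
    """Map each current ID and each deprecated ID to its current Geo record."""
    lookup = {}
    for geo in geos:
        lookup[geo.geoid] = geo
        for old in geo.deprecated_geoids:
            lookup[old] = geo
    return lookup


oglala = Geo("05000US46102", "Oglala Lakota County, SD",
             ["05000US46113"], ["Shannon County, SD"])
lookup = build_lookup([oglala])
print(lookup["05000US46113"].name)  # Oglala Lakota County, SD
```

Keeping the history on the current row (rather than keeping old rows around) means the old ID can never be mistaken for a live geography.
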

hwchen commented 6 years ago

I also want to add a comment here, that this issue gets at something broader: whether we want to do geographic migrations when we update datausa.

davelandry commented 6 years ago

Just spoke with Walther. We're going to redirect the old pages to the new pages.

@hwchen use this endpoint to test if the IDs exist in our current site at all. If they do, post a JSON mapping here of old IDs to new IDs and I can handle the redirects on the front-end.
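
The requested mapping could be produced as below. A minimal sketch, assuming the endpoint check has already been run and its results collected into a set of old IDs that exist on the current site (how that set is fetched is not shown, and the function name is hypothetical). The candidate pairs are the two transitions identified earlier in this thread:

```python
import json


def build_redirects(candidates, existing_ids):
    """Return a JSON object mapping old IDs to new IDs.

    candidates: dict of old ID -> new ID.
    existing_ids: set of old IDs confirmed to exist on the current site
    (a stand-in for the endpoint check).
    """
    redirects = {old: new for old, new in candidates.items() if old in existing_ids}
    return json.dumps(redirects, indent=2, sort_keys=True)


candidates = {
    "05000US46113": "05000US46102",  # Shannon County -> Oglala Lakota County, SD
    "31000US26180": "31000US46520",  # Honolulu -> Urban Honolulu
}
print(build_redirects(candidates, {"05000US46113", "31000US26180"}))
```

Old IDs without a page on the current site are dropped, since there is nothing to redirect from.
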

hwchen commented 6 years ago

Completed. I sent a csv to @davelandry .

My methodology is in https://github.com/Datawheel/datausa-acs-etl/commit/f8760386c8acb81477c643388be4084c97322081

(I originally found the geos in economic census data, then processed them in datausa-acs-etl.)