covidatlas / coronadatascraper

COVID-19 Coronavirus data scraped from government and curated data sources.
https://coronadatascraper.com
BSD 2-Clause "Simplified" License

Any thoughts to adding Census FIPS codes to US Counties? #199

Closed reyemtm closed 4 years ago

reyemtm commented 4 years ago

It's not clear how people are joining the US county data to the county geography. I can only guess that people are either doing point-in-polygon joins or matching on the county and state name. FIPS codes could be used instead, which is standard practice in GIS. The county data available at https://github.com/topojson/us-atlas already has FIPS codes baked in, stored as the feature id. These codes have leading zeros, so they must be stored as text.
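As a rough sketch of the join described above: the snippet below restores leading zeros on a county FIPS code and uses the result to look up a feature by id. The row and feature dictionaries are hypothetical stand-ins, not the scraper's actual output or the real us-atlas file layout.

```python
def normalize_fips(value):
    """Return a 5-character county FIPS string, restoring leading zeros."""
    return str(value).strip().zfill(5)


def join_rows_to_features(rows, features_by_id):
    """Attach the matching feature id to each row that has a known FIPS code."""
    joined = []
    for row in rows:
        feature = features_by_id.get(normalize_fips(row["fips"]))
        if feature is not None:
            joined.append({**row, "feature_id": feature["id"]})
    return joined


# Hypothetical feature index keyed by feature id (assumed shape, for illustration).
features_by_id = {"06085": {"id": "06085", "properties": {"name": "Santa Clara"}}}

# Hypothetical row where the code was stored as a number and lost its leading zero.
rows = [{"county": "Santa Clara County", "state": "CA", "fips": 6085}]

joined = join_rows_to_features(rows, features_by_id)
```

Storing the code as text from the start avoids the `zfill` repair step entirely.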

veyEskelson commented 4 years ago

Yes, I've been using a spatial join to get the data into polygon form. They are programmers more than GIS mappers. FIPS is a US Census standard, not a world standard. Other nations also have unique identifier codes for their states, counties, provinces, districts, etc. Adding the GEOID10 (state FIPS code plus county code) seems a bit much to do for just the USA. A bit of research would be needed to populate the field for other nations, and a cross-check might be needed to identify any duplicates.

lazd commented 4 years ago

Yes, point-in-polygon and name matches. I'd love to standardize on this. It will be a significant effort to add it to every scraper, since each one will need to include the code in its returned output. Or, we can post-process after the fact (match on county/state, then add the FIPS code).
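A minimal sketch of the post-processing option, assuming each scraper record carries `county` and `state` fields (hypothetical field names). The lookup table here holds only two entries for illustration; a real one would need to be built from Census data.

```python
# Hypothetical lookup table keyed by (normalized county name, state abbreviation).
FIPS_BY_COUNTY_STATE = {
    ("santa clara", "CA"): "06085",
    ("los angeles", "CA"): "06037",
}


def normalize_county(name):
    """Lowercase the name and drop a trailing 'County'-style suffix."""
    name = name.strip().lower()
    for suffix in (" county", " parish", " borough"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return name


def add_fips(record):
    """Return a copy of the record with a 'fips' key (None when no match)."""
    key = (normalize_county(record["county"]), record["state"].strip().upper())
    return {**record, "fips": FIPS_BY_COUNTY_STATE.get(key)}


matched = add_fips({"county": "Santa Clara County", "state": "ca"})
unmatched = add_fips({"county": "Nowhere County", "state": "CA"})
```

Leaving unmatched records with `fips` set to `None` makes it easy to list the names that still need manual attention.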

@hyperknot, thoughts?

hyperknot commented 4 years ago

A FIPS code is a good idea and can be the standard for the US. The best option would be to combine the state's ISO 3166-2 code with the FIPS 6-4 county code.

So for example, Santa Clara would be fips:US-CA-06085.

Other countries each have a very different system, so this cannot be standardised. We can settle on an idsystem:country-state-county code style, though.
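To make the proposed format concrete, here is a small sketch that builds and parses ids shaped like `fips:US-CA-06085`. The function names and the regular expression are assumptions for illustration, not an agreed specification.

```python
import re

# Assumed shape: idsystem:COUNTRY-STATE-code, e.g. fips:US-CA-06085.
ID_PATTERN = re.compile(
    r"^(?P<system>[a-z0-9]+):(?P<country>[A-Z]{2})-(?P<state>[A-Z0-9]{1,3})-(?P<code>.+)$"
)


def make_us_county_id(state_code, county_fips):
    """Build an id from a state abbreviation and a 5-digit county FIPS code."""
    fips = str(county_fips).strip().zfill(5)
    return f"fips:US-{state_code.strip().upper()}-{fips}"


def parse_id(identifier):
    """Split an id into its system, country, state, and code parts."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a valid id: {identifier!r}")
    return match.groupdict()


santa_clara = make_us_county_id("CA", "06085")
parts = parse_id(santa_clara)
```

Keeping the id system as a prefix means another country's codes can use the same country-state-code layout under a different system name.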


Now, about point-in-polygon: I don't really understand what it solves. A developer picks a lat-lon point manually, and then we try to figure out which polygon it is in? Why not just ask the developer to include the FIPS code in the results? JHU is a very special case, but generally I don't see why we would need to do point-in-polygon in the codebase.
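For reference, the point-in-polygon test under discussion can be sketched with the standard ray-casting approach. This is a generic illustration, not the project's implementation; the square below is a made-up ring, not a real county boundary, and holes and multipolygons are ignored.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting: count how many ring edges a ray going east from the point crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the point's latitude can be crossed.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


# Hypothetical square ring of (lon, lat) pairs.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]

center_inside = point_in_polygon(2.0, 2.0, square)
far_outside = point_in_polygon(5.0, 2.0, square)
```

Compared with a direct code lookup, this requires loading every candidate polygon, which is part of the cost being questioned above.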

hyperknot commented 4 years ago

I'm not in the US so I have no background on this, but wiki says that FIPS is now deprecated: https://en.wikipedia.org/wiki/FIPS_county_code

hyperknot commented 4 years ago

OK, from what I've read on Wikipedia, the FIPS county code is still the standard way of referring to counties. It is also the only identifier that the https://eric.clst.org/tech/usgeojson/ dataset includes in its GeoJSON files.

veyEskelson commented 4 years ago

Adapting the system to work for other countries? Example: United Kingdom Local Authority Districts/Region (nine-character codes):

Geography Entity Code
E England
S Scotland
W Wales
N Northern Ireland
K Cross-border Instance
J Experimental

ex: E41000008 | Blackburn with Darwen

http://geoportal.statistics.gov.uk/
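As a small illustration of how the leading letter could be read, the sketch below maps the first character of a nine-character code to the geography named in the list above. The function name is hypothetical, and only the length and prefix are checked.

```python
# Prefix letters as listed in the comment above.
ENTITY_PREFIXES = {
    "E": "England",
    "S": "Scotland",
    "W": "Wales",
    "N": "Northern Ireland",
    "K": "Cross-border Instance",
    "J": "Experimental",
}


def describe_uk_code(code):
    """Return the geography implied by the first letter of a nine-character code."""
    code = code.strip().upper()
    if len(code) != 9:
        raise ValueError(f"expected a nine-character code, got {code!r}")
    return ENTITY_PREFIXES.get(code[0], "unknown")


example = describe_uk_code("E41000008")
```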

US Census breakdown: https://www.census.gov/programs-surveys/geography/guidance/geo-identifiers.html

reyemtm commented 4 years ago

I really like the idea of a country-state-county code. It makes me wonder how this is not already established. Is this really the first worldwide dataset with sub-state/province level detail?

hyperknot commented 4 years ago

@veyEskelson if you have local knowledge in the UK, feel free to open an issue in https://github.com/hyperknot/country-level-id so we can add it as id4. I'll add US counties as id4.

The current ones are:

id0:GB United Kingdom (id3)
id1:ENG England
id1:NIR Northern Ireland
id1:SCT Scotland
id1:WLS Wales

praging commented 4 years ago

Duplicate of #286