catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
471 stars 108 forks source link

Add FIPS codes to power plant database #338

Open joshdr83 opened 5 years ago

joshdr83 commented 5 years ago

Is your feature request related to a problem? Please describe. The plants_entity_eia table has state and county names in character format, but later matching these names to other county names, say in a shape file, can be problematic in that strings are not great to match on, for instance St. Louis won't match with Saint Louis, etc. Having a FIPS mapping would standardize locations of the power plants in a more granular way.

Describe the solution you'd like A mapping from plant_id_eia to county FIPS codes.

Describe alternatives you've considered I have tried using a GIS based matching using the lat/longs of the power plants in the database, but these values are notoriously bad, and if a power plant is close to a river that divides two states, the lat/long might even put it in the wrong state...

Additional context Most power plants will likely directly match to a FIPS value with a state/county merge with a county shape file database, but some might require some more direct interaction.

zaneselvans commented 5 years ago

@joshdr83 are there any examples of that kind of fuzzy merge on string-based state/county information to get FIPS IDs that you could point us at? What's the authoritative source for the city/state to FIPS ID mapping? Is it this Excel spreadsheet? Is there a machine readable/API version from the Census?

joshdr83 commented 5 years ago

My fuzzy merge attempts on strings have not been that successful as I find that I have to do so much checking that it becomes manual in the end. Some preprocessing like Saint -> St., etc. helps.

I usually use the data that underlies the shape files found here, but that spreadsheet might be very similar?

cmgosnell commented 3 years ago

we've added a function pudl.helpers.add_fips_ids(). this would now be relatively easy to use on any of the output tables or even the database tables.

joshdr83 commented 3 years ago

Great! Thank you.

Joshua Rhodes https://sites.google.com/site/joshdr83/ @joshdr83

On Sep 2, 2021, at 9:08 AM, Christina Gosnell @.***> wrote:

we've added a function pudl.helpers.add_fips_ids(). this would now be relatively easy to use on any of the output tables or even the database tables.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/catalyst-cooperative/pudl/issues/338#issuecomment-911725618, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZQFTBHBWANQVETZIWN2VDT76AONANCNFSM4H4WBVQQ. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.