catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Bad lat/lon values for plant_id_eia 60265 and 60266 #309

Closed zaneselvans closed 4 years ago

zaneselvans commented 5 years ago

Describe the bug: The latitude & longitude values in the plants_entity_eia table are incorrect for plant_id_eia values 60265 and 60266, as reported by Josh Rhodes.

joshdr83 commented 5 years ago

Add plant_id_eia 60237 to that list as per #311

gschivley commented 5 years ago

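@joshdr83 how do you know they're wrong? The 60265 lat/lon values match across EIA860 (2016 & 2017) and plants_entities_eia. Those also line up with the county (Hunterdon, NJ) listed in NEEDS, and Google Maps shows a bunch of solar panels at that location (https://www.google.com/maps/place/40%C2%B031'25.3%22N+74%C2%B050'36.2%22W/@40.523139,-74.845422,824m/data=!3m1!1e3!4m5!3m4!1s0x0:0x0!8m2!3d40.5237!4d-74.843398).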

joshdr83 commented 5 years ago

I was cross-checking them against the 2___Plant_Y2017.xlsx sheet and the plants_entities_eia table. Did I mess one up? -Josh

[Two screenshots attached: "Screen Shot 2019-06-13 at 11 26 11 AM" and "Screen Shot 2019-06-13 at 11 26 23 AM"]

cmgosnell commented 5 years ago

Hi @joshdr83! I'll look into this shortly. If you find any other lat/long inconsistencies, drop them here!

gschivley commented 5 years ago

60265 first shows up in the 2015 version of 860 with the wrong lat/lon (45.400524, -122.765822). Fun fact: that location is just down the street from a bouldering gym.

@zaneselvans and @cmgosnell is PUDL building the plants_entities_eia table using the first occurrence in EIA860? If so, these could all be issues with location in at least the first year. Could be solved by an error check to see if it's in the right state (did @karldw do something like this for CEMS timezones?) and then checking subsequent years. Or check all available years to see if they match.
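
A minimal sketch of the two checks suggested above: a state bounding-box test and a cross-year agreement test. Everything here is an assumption for illustration. The function names are hypothetical, the New Jersey bounding box is approximate, and the 2016/2017 coordinates are placeholder values near the Hunterdon County location discussed in this thread. Only the 2015 pair is the value quoted above. This is not PUDL code.

```python
# Approximate bounding box for New Jersey (illustrative values, not from PUDL).
STATE_BOUNDS = {
    "NJ": {"lat": (38.8, 41.4), "lon": (-75.6, -73.8)},
}


def in_state_bounds(state, lat, lon, bounds=STATE_BOUNDS):
    """Return True if (lat, lon) falls inside the state's bounding box."""
    lat_min, lat_max = bounds[state]["lat"]
    lon_min, lon_max = bounds[state]["lon"]
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


def years_agree(coords_by_year, tol=0.01):
    """Return True if all reported years agree to within tol degrees."""
    lats = [lat for lat, _ in coords_by_year.values()]
    lons = [lon for _, lon in coords_by_year.values()]
    return max(lats) - min(lats) <= tol and max(lons) - min(lons) <= tol


# 2015 is the value quoted in this thread; 2016 and 2017 are placeholders.
reported = {
    2015: (45.400524, -122.765822),
    2016: (40.5237, -74.8434),
    2017: (40.5237, -74.8434),
}

# Years whose coordinates fall outside the plant's reported state.
flagged = [
    year
    for year, (lat, lon) in reported.items()
    if not in_state_bounds("NJ", lat, lon)
]
print(flagged)
print(years_agree(reported))
```

A bounding box is a coarse filter: it catches a plant placed on the wrong side of the country, but not one placed just across a state line. A real check would likely need state polygons.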

cmgosnell commented 5 years ago

Ha! Bouldering gyms are not power plants in the traditional sense!

They are actually "harvested," of sorts, from all of their occurrences across 860 and 923. We take the most consistently reported record, and if the records are not at least 70% consistent then we don't bring in anything.
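
To make the 70% rule concrete, here is a rough sketch of that kind of "most consistent value" harvest. The function name `harvest`, the handling of missing values (dropped before the share is computed), and the sample inputs are all assumptions for illustration. The actual PUDL harvesting code may differ.

```python
from collections import Counter


def harvest(values, threshold=0.7):
    """Return the most common non-null value if it accounts for at least
    `threshold` of the non-null reports; otherwise return None."""
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    return value if count / len(reported) >= threshold else None


# Made-up example reports for a single plant attribute.
mostly_consistent = ["NJ"] * 8 + ["OR"] * 2   # 80% agree
split = ["NJ", "NJ", "OR", "OR"]              # 50% agree

print(harvest(mostly_consistent))
print(harvest(split))
```

Under these assumptions the first call keeps "NJ" and the second returns nothing, because no value reaches the threshold.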

Lat/long is the messiest and most inconsistent field, so I made a tiny exception that rounds the values to a lower precision. That's definitely not the best way to do it, so if either of you has any suggestions, let me know. But I don't think the rounding is what is happening in these cases. These just seem broken, which makes me want to debug the harvesting process more generally.
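
For illustration, a sketch of how rounding before the consistency check might work. The function name `harvest_coords`, the two-decimal precision, and the sample coordinates (apart from the 2015 pair quoted earlier in this thread) are assumptions, not the PUDL implementation. Note that Python's `round` rounds to the nearest value rather than truncating.

```python
from collections import Counter


def harvest_coords(coords, ndigits=2, threshold=0.7):
    """Round each (lat, lon) pair to `ndigits` decimals, then return the most
    common rounded pair if it makes up at least `threshold` of the reports."""
    if not coords:
        return None
    rounded = [(round(lat, ndigits), round(lon, ndigits)) for lat, lon in coords]
    pair, count = Counter(rounded).most_common(1)[0]
    return pair if count / len(rounded) >= threshold else None


# Three slightly different reports of one location plus one outlier.
coords = [
    (40.5237, -74.843398),
    (40.52369, -74.84339),
    (40.5237, -74.8434),
    (45.400524, -122.765822),
]

print(harvest_coords(coords, ndigits=2))  # coarse precision: reports collapse
print(harvest_coords(coords, ndigits=6))  # fine precision: no pair reaches 70%
```

The trade-off is that coarser rounding makes near-identical reports agree, but it also discards precision in the harvested value.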

gschivley commented 5 years ago

Without the time to look closely at the existing code, I'd do something like this for lat/long:

Not sure if it's worth the trouble to do all of those checks/calculations for a few plants though.

karldw commented 5 years ago

@gschivley, I didn't do that check, but it shouldn't be terribly difficult. Take care if you use counties though, because they've changed a little over time.

cmgosnell commented 4 years ago

Subsumed within issue #446.