catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Bad or unmapped utility_id_eia value(s) in ownership_eia860 table #967

Closed: zaneselvans closed this issue 2 years ago

zaneselvans commented 3 years ago

There appears to be a bad utility_id_eia value in the ownership_eia860 table, associated with plant_id_eia==56032. See the dataframe returned by:

own_eia860 = pudl_out.own_eia860()
own_eia860[own_eia860.plant_id_eia==56032]

However, in all other years that plant has the same utility_id_eia associated with its generators, and that ID is consistently mapped to a utility_id_pudl.
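To look for other plants with the same problem, one option is to check, per plant, whether more than one utility_id_eia shows up across years, or whether a row is missing its utility_id_pudl mapping. This is a minimal sketch, assuming a pandas DataFrame like the one returned by pudl_out.own_eia860() with plant_id_eia, utility_id_eia and utility_id_pudl columns; find_suspect_utility_ids is a hypothetical helper, not part of PUDL.

```python
import pandas as pd


def find_suspect_utility_ids(own: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose plant reports conflicting or unmapped utility IDs.

    Hypothetical helper for illustration; assumes columns plant_id_eia,
    utility_id_eia and utility_id_pudl.
    """
    # Number of distinct non-null utility_id_eia values seen for each plant,
    # broadcast back onto every row of that plant.
    n_ids = own.groupby("plant_id_eia")["utility_id_eia"].transform("nunique")
    mask = (
        (n_ids > 1)                       # plant reported under more than one ID
        | own["utility_id_eia"].isna()    # ID missing outright
        | own["utility_id_pudl"].isna()   # ID present but not mapped to PUDL
    )
    return own[mask]
```

Running it on the ownership dataframe and inspecting the plant_id_eia values in the result would show whether 56032 is an isolated case.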

Seems like we should consider some kind of NA filling in the harvesting process.
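As a rough illustration of NA filling, here is a minimal sketch assuming a missing utility_id_eia can be filled with the value most often reported for the same plant in other years. fill_utility_id_by_plant is a hypothetical helper, not how PUDL's harvesting actually works; it only fills NAs and leaves a bad non-null value untouched.

```python
import pandas as pd


def fill_utility_id_by_plant(own: pd.DataFrame) -> pd.DataFrame:
    """Fill missing utility_id_eia with the plant's most common reported value.

    Hypothetical helper for illustration; returns a copy and does not
    modify the input frame.
    """
    out = own.copy()

    def _mode_or_nan(s: pd.Series):
        # Series.mode() drops NA by default; ties resolve to the smallest value.
        m = s.mode()
        return m.iloc[0] if not m.empty else float("nan")

    # One fill value per plant, broadcast back onto that plant's rows.
    plant_mode = out.groupby("plant_id_eia")["utility_id_eia"].transform(_mode_or_nan)
    out["utility_id_eia"] = out["utility_id_eia"].fillna(plant_mode)
    return out
```

Plants that never report a utility_id_eia stay NA, since there is nothing to fill from.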

zaneselvans commented 3 years ago

@cmgosnell Is this related to the weird owner vs. operator thing you encountered recently?

cmgosnell commented 3 years ago

It could be. I know we are revamping harvesting, but in the meantime I've been leaning more and more towards having utility_id_eia be one of the columns for which we always harvest some value. It is an integral merge key, and the original data has known issues, especially in the ownership table. This would be implemented by simply adjusting the required strictness for this column.
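To make the strictness idea concrete, the sketch below picks the most common non-null value for each entity and keeps it only if at least a `strictness` fraction of that entity's records agree; otherwise the harvested value is NA. The function name harvest_column, its arguments and the 0.7 default are assumptions for illustration, not PUDL's actual harvesting API.

```python
import pandas as pd


def harvest_column(df: pd.DataFrame, id_col: str, value_col: str,
                   strictness: float = 0.7) -> pd.Series:
    """Return one harvested value of value_col per id_col.

    Hypothetical sketch: strictness is the minimum share of non-null
    records that must agree on the most common value.
    """
    def _pick(s: pd.Series):
        s = s.dropna()
        if s.empty:
            return float("nan")
        counts = s.value_counts()  # most frequent value first
        if counts.iloc[0] / len(s) >= strictness:
            return counts.index[0]
        return float("nan")  # not consistent enough: harvest NA

    return df.groupby(id_col)[value_col].agg(_pick)
```

Under this sketch, calling it with strictness=0 for utility_id_eia means some value is always harvested whenever any non-null value was reported, which is the behavior described above. With a low threshold, ties between equally common values are broken arbitrarily.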

cmgosnell commented 2 years ago

Fixed via #1116.