catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Validate lat/long values for plants in EIA 860 #402

Open cmgosnell opened 4 years ago

cmgosnell commented 4 years ago

It would be good to test whether the reported latitude and longitude actually fall within the same state and/or county that is reported for the plant or utility entity. This came out of Issue #276.

zaneselvans commented 4 years ago

To make this work, we'll need to map lat/lon values onto state and/or county boundaries. State boundary files wouldn't be too large, but county boundaries could be much bigger. State boundaries are certainly available as a public dataset.
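One way to do that lookup is a point-in-polygon test against boundary polygons, for example ones loaded from the Census Bureau's cartographic boundary files. The sketch below is a minimal pure-Python version under stated assumptions: the function names are hypothetical, and the two square "states" (`AA`, `BB`) are toy data, not real state outlines. It also shows the consistency check from the original request: does the located state match the state reported for the plant?

```python
from typing import Dict, List, Optional, Tuple

# A polygon is a list of (longitude, latitude) vertices; the last vertex
# connects back to the first.
Polygon = List[Tuple[float, float]]


def point_in_polygon(lon: float, lat: float, polygon: Polygon) -> bool:
    """Ray-casting test: cast a ray east from the point and count edge crossings.

    An odd number of crossings means the point is inside. Points lying exactly
    on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude where this edge crosses the horizontal line through the point.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def locate_state(lat: float, lon: float, boundaries: Dict[str, Polygon]) -> Optional[str]:
    """Return the first state whose polygon contains the point, else None."""
    for state, polygon in boundaries.items():
        if point_in_polygon(lon, lat, polygon):
            return state
    return None


def state_matches(reported_state: str, lat: float, lon: float,
                  boundaries: Dict[str, Polygon]) -> bool:
    """True if the coordinates fall inside the state reported for the plant."""
    return locate_state(lat, lon, boundaries) == reported_state


# Hypothetical toy boundaries: axis-aligned boxes, NOT real state outlines.
TOY_BOUNDARIES: Dict[str, Polygon] = {
    "AA": [(-110.0, 40.0), (-100.0, 40.0), (-100.0, 45.0), (-110.0, 45.0)],
    "BB": [(-100.0, 40.0), (-90.0, 40.0), (-90.0, 45.0), (-100.0, 45.0)],
}

print(locate_state(42.0, -105.0, TOY_BOUNDARIES))
print(state_matches("AA", 42.0, -95.0, TOY_BOUNDARIES))
```

Real boundary files contain multipolygons (islands, holes) and many vertices, so in practice a spatial join from a GIS library would be more robust and faster than this hand-rolled loop; the same approach extends to counties by swapping in county polygons.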

It might make sense to do this in conjunction with adding FIPS IDs (see issue #338).

MichaelTiemannOSC commented 3 years ago

The Python module `stateplane` does this: it identifies the State Plane coordinate zone for a given longitude/latitude.

grgmiller commented 1 year ago

At an even more basic level, it would be great to implement a sanity check on whether the coordinates are in the U.S. at all, to catch outright bad values. A rough bounding box for the continental U.S. is:

LONGITUDE_MIN = -126
LONGITUDE_MAX = -66
LATITUDE_MIN = 25
LATITUDE_MAX = 49
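A minimal sketch of that sanity check, using the four constants above (the function name `in_conus_bbox` is hypothetical). For example, the PJM generator at plant 59641, reported at (36.468, 77.592), fails the check, while the same point with a negative longitude passes.

```python
# Rough continental-U.S. bounding box from the comment above (degrees).
LONGITUDE_MIN = -126
LONGITUDE_MAX = -66
LATITUDE_MIN = 25
LATITUDE_MAX = 49


def in_conus_bbox(lat: float, lon: float) -> bool:
    """True if (lat, lon) falls inside the rough continental-U.S. box.

    Deliberately coarse: it excludes Alaska, Hawaii, and the territories, and
    clips a few edge areas of the lower 48 (e.g. the Florida Keys, which lie
    south of 25 degrees N).
    """
    return (LATITUDE_MIN <= lat <= LATITUDE_MAX
            and LONGITUDE_MIN <= lon <= LONGITUDE_MAX)


print(in_conus_bbox(36.468, 77.592))   # reported coordinates
print(in_conus_bbox(36.468, -77.592))  # with the longitude sign flipped
```

Because the box is so coarse, a failure here is best treated as "definitely wrong", while a pass only means "not obviously wrong"; the state/county check is still needed for finer validation.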

We recently checked the 2021 EIA-860 generators table against this box for wind and solar plants and found several plants whose coordinates were in the middle of the ocean or in China:

ba_code  plant_id_eia  generator_id  capacity_mw  latitude   longitude
ERCO     62715         WCCWF         180.1        33.54915   -33.550555
ISNE     58279         1             5.0          42.219722  -42.219722
ISNE     58280         1             3.0          42.164722  -42.164722
ISNE     58282         1             4.8          42.222778  -42.219722
ISNE     58283         1             4.5          41.766389  -41.766389
PJM      59641         5MWPV         5.0          36.468     77.592
DUK      59929         NB007         5.0          35.72      81.417

Some of these are easy to fix: for the PJM and DUK generators, the longitude just needs a negative sign (this is easy to verify with Google Maps satellite view). For the others, it looks like the longitude may have been missing and was filled with the negative of the latitude. These will probably need a manual lookup of each plant's location.
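The triage described above could be automated roughly as follows. This is a sketch under assumptions, not an existing PUDL function: the name `screen_coordinates`, the status labels, and the 0.01° tolerance are all hypothetical. A tolerance is used because some rows (ERCO 62715, ISNE 58282) are close to, but not exactly, the negative of the latitude. Sign-flip fixes are only proposed, and flagged rows are left for manual review.

```python
from typing import Optional, Tuple

# Rough continental-U.S. bounding box (degrees), as given earlier in the thread.
LONGITUDE_MIN, LONGITUDE_MAX = -126, -66
LATITUDE_MIN, LATITUDE_MAX = 25, 49

# Hypothetical tolerance (degrees) for deciding that longitude looks like -latitude.
NEG_LAT_TOLERANCE = 0.01


def in_conus_bbox(lat: float, lon: float) -> bool:
    return LATITUDE_MIN <= lat <= LATITUDE_MAX and LONGITUDE_MIN <= lon <= LONGITUDE_MAX


def screen_coordinates(lat: float, lon: float) -> Tuple[str, Optional[float]]:
    """Classify a plant's coordinates and propose a longitude fix if one is obvious.

    Returns (status, longitude), where status is one of:
      "ok"             -- inside the box; longitude returned unchanged
      "sign_flipped"   -- positive longitude whose negation lands in the box;
                          the negated longitude is the proposed fix
      "lon_is_neg_lat" -- longitude looks like -latitude; needs manual lookup (None)
      "out_of_bounds"  -- any other out-of-box value; needs manual lookup (None)
    """
    if in_conus_bbox(lat, lon):
        return "ok", lon
    if lon > 0 and in_conus_bbox(lat, -lon):
        return "sign_flipped", -lon
    if abs(lon + lat) <= NEG_LAT_TOLERANCE:
        return "lon_is_neg_lat", None
    return "out_of_bounds", None


# Rows from the 2021 EIA-860 sample in this thread: (ba_code, plant_id_eia, lat, lon).
rows = [
    ("ERCO", 62715, 33.54915, -33.550555),
    ("ISNE", 58279, 42.219722, -42.219722),
    ("ISNE", 58282, 42.222778, -42.219722),
    ("PJM", 59641, 36.468, 77.592),
    ("DUK", 59929, 35.72, 81.417),
]
for ba_code, plant_id, lat, lon in rows:
    status, proposed_lon = screen_coordinates(lat, lon)
    print(ba_code, plant_id, status, proposed_lon)
```

Keeping the status column alongside the data (rather than silently overwriting coordinates) would make it easy to report which plants were auto-corrected and which still need a human to look them up.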

It would be great if this could be screened and fixed as part of the ETL process for EIA-860.