catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Missing and incorrect latitude / longitude data in `plants_entity_eia` #3141

Open grgmiller opened 6 months ago

grgmiller commented 6 months ago

Describe the bug

Several related issues:

Bug Severity

Medium: With some effort, I can work around the bug.

To Reproduce

I downloaded the data from https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/v2023.12.01/pudl.sqlite.gz and am loading the `plants_entity_eia` table using `pd.read_sql("SELECT plant_id_eia, latitude, longitude FROM plants_entity_eia", PUDL_ENGINE)`.

213 plants are missing latitudes

image

16 plants have coordinates farther east than the East Coast of the US:

image

One plant has a nonexistent coordinate:

image

Spot-checking these plants revealed that they all appear to have valid, non-missing coordinates in the 2022 EIA-860 plants file.
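
For anyone who wants to reproduce the three counts without the screenshots, here is a rough sketch of the checks. It uses the standard-library `sqlite3` module rather than the `pd.read_sql` call above, and it assumes the archive has already been decompressed to a local `pudl.sqlite` file. The function name `classify_coords` and the `EAST_LIMIT_LON` threshold are my own illustrative choices, not anything defined by PUDL. The "missing" bucket here flags a null latitude or a null longitude, which is slightly broader than the latitude-only count reported above.

```python
import sqlite3

# Assumption: roughly the longitude of easternmost Maine. This is a hand-picked
# threshold, not a PUDL constant, and it will also flag plants in eastern
# Puerto Rico, the US Virgin Islands, and Guam, which need a manual look.
EAST_LIMIT_LON = -66.9


def classify_coords(conn):
    """Group plant IDs by the kind of coordinate problem found."""
    rows = conn.execute(
        "SELECT plant_id_eia, latitude, longitude FROM plants_entity_eia"
    ).fetchall()
    problems = {"missing": [], "out_of_range": [], "too_far_east": []}
    for plant_id, lat, lon in rows:
        if lat is None or lon is None:
            # Either coordinate is null.
            problems["missing"].append(plant_id)
        elif not (-90 <= lat <= 90 and -180 <= lon <= 180):
            # Not a point that exists on the globe.
            problems["out_of_range"].append(plant_id)
        elif lon > EAST_LIMIT_LON:
            # Valid point, but east of the assumed US limit.
            problems["too_far_east"].append(plant_id)
    return problems


if __name__ == "__main__":
    # Hypothetical local path to the decompressed database.
    with sqlite3.connect("pudl.sqlite") as conn:
        for kind, ids in classify_coords(conn).items():
            print(kind, len(ids))
```

Each plant lands in at most one bucket, so the three counts can be compared directly against the numbers listed above.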

Expected behavior

I expect the lat/long data in plants_entity_eia to match the lat/long data in the most recent raw EIA-860 table for which lat/long data is available.
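
To make that expectation concrete, here is a minimal sketch of the selection rule I have in mind. It is not PUDL's actual harvesting logic. The function name `latest_known_coords` and the `(plant_id, report_year, latitude, longitude)` record layout are hypothetical stand-ins for rows from the raw EIA-860 plant files.

```python
def latest_known_coords(records):
    """Map each plant ID to its (latitude, longitude) from the most recent
    report year in which both values are present.

    `records` is an iterable of (plant_id, report_year, latitude, longitude)
    tuples; this layout is assumed for illustration only.
    """
    best = {}
    for plant_id, year, lat, lon in records:
        if lat is None or lon is None:
            # Skip years where the coordinate was not reported.
            continue
        if plant_id not in best or year > best[plant_id][0]:
            best[plant_id] = (year, lat, lon)
    return {pid: (lat, lon) for pid, (_, lat, lon) in best.items()}


# Made-up values, purely to show the behavior.
example = [
    (1, 2020, 35.0, -90.0),
    (1, 2022, 35.1, -90.1),
    (1, 2023, None, None),  # newest year has no coordinates, so 2022 wins
]
print(latest_known_coords(example))
```

Under this rule a plant only ends up with null coordinates if no report year ever supplied both values.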

grgmiller commented 1 month ago

Just wanted to bump this issue: we're having some issues with bad timezone data due to bad lat/long values (https://github.com/catalyst-cooperative/pudl/issues/1192). We're going to try and patch this on our end, but it would be helpful if this could be fixed in pudl as well!