Open grgmiller opened 6 months ago
Just wanted to bump this issue: we're having some issues with bad timezone data due to bad lat/long values (https://github.com/catalyst-cooperative/pudl/issues/1192). We're going to try and patch this on our end, but it would be helpful if this could be fixed in pudl as well!
Describe the bug
Several related issues affect plants_entity_eia and other tables that contain lat/long data. As far as I can tell from manually inspecting the raw EIA-860 plants file from 2022, there are a small number of plants that are missing both latitude and longitude, but none that are missing only latitude.
Bug Severity
Medium: With some effort, I can work around the bug.
To Reproduce
I downloaded data from https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/v2023.12.01/pudl.sqlite.gz and am loading the plants_entity_eia table using pd.read_sql("SELECT plant_id_eia, latitude, longitude FROM plants_entity_eia", PUDL_ENGINE)
213 plants are missing latitudes
16 plants have coordinates further east than the east coast of the US:
One plant has a nonexistent coordinate:
Spot checking these plants revealed that they all appear to have valid/non-missing coordinates in the 2022 EIA-860 plants file.
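For anyone who wants to rerun the three checks above, here is a rough sketch using only the standard library. It is not the exact code I ran: the function names are made up for illustration, and the -66.9 longitude cutoff is my assumption for roughly the easternmost point of the contiguous US, so plants in Puerto Rico or other territories may be flagged even if their coordinates are fine.

```python
import sqlite3

# Assumed cutoff: roughly the easternmost longitude of the contiguous US.
# Plants in Puerto Rico or other territories can legitimately exceed it.
EAST_LIMIT_LON = -66.9


def load_coordinates(sqlite_path):
    """Read (plant_id_eia, latitude, longitude) rows from a local pudl.sqlite."""
    query = "SELECT plant_id_eia, latitude, longitude FROM plants_entity_eia"
    conn = sqlite3.connect(sqlite_path)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def flag_suspect_coordinates(rows, east_limit_lon=EAST_LIMIT_LON):
    """Group plant IDs by the kind of coordinate problem they show."""
    flags = {"missing_latitude": [], "east_of_limit": [], "out_of_range": []}
    for plant_id, lat, lon in rows:
        if lat is None:
            flags["missing_latitude"].append(plant_id)
        # A coordinate cannot exist if |latitude| > 90 or |longitude| > 180.
        if (lat is not None and abs(lat) > 90) or (lon is not None and abs(lon) > 180):
            flags["out_of_range"].append(plant_id)
        elif lon is not None and lon > east_limit_lon:
            flags["east_of_limit"].append(plant_id)
    return flags
```

Calling flag_suspect_coordinates(load_coordinates("pudl.sqlite")) on the unzipped download returns a dict of plant ID lists, one per check. The cutoff is a parameter so it can be tightened or loosened.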
Expected behavior
I expect the lat/long data in plants_entity_eia to match the lat/long data in the most recent raw EIA-860 table for which lat/long data is available.
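To make that expectation concrete, here is a minimal sketch of the comparison I have in mind. It assumes both sources have already been loaded into dicts mapping plant_id_eia to a (latitude, longitude) pair. The function name and the 1e-4 degree tolerance are my own choices, not anything from pudl.

```python
import math


def find_coordinate_mismatches(entity_coords, raw_coords, tol=1e-4):
    """Return sorted plant IDs whose entity-table coordinates disagree with the raw file.

    Both arguments map plant_id_eia -> (latitude, longitude); None marks a missing value.
    """
    mismatched = []
    for plant_id, raw_pair in raw_coords.items():
        entity_pair = entity_coords.get(plant_id, (None, None))
        for entity_val, raw_val in zip(entity_pair, raw_pair):
            if raw_val is None:
                continue  # nothing to compare against in the raw data
            if entity_val is None or not math.isclose(
                entity_val, raw_val, rel_tol=0.0, abs_tol=tol
            ):
                mismatched.append(plant_id)
                break  # one bad value is enough to flag the plant
    return sorted(mismatched)
```

An empty list would mean every coordinate present in the raw EIA-860 file is reproduced in plants_entity_eia to within the tolerance.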