catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Bad lat values for plant_id_eia 3317 #971

Open MichaelTiemannOSC opened 3 years ago

MichaelTiemannOSC commented 3 years ago

Describe the bug

When I run this query in a Jupyter Notebook, I get a valid longitude but NaN for the latitude:

plants_eia860.loc[:, ["plant_id_eia", "latitude", "longitude"]][plants_eia860.plant_id_eia==3317]

To Reproduce

Steps to reproduce the behavior: run the query given above.

I am using a fresh 01-pudl-parquet example Notebook and all its settings.

The latitude seems to be wrong in all years, right from the start (as soon as the pudl_engine is operative).

Expected behavior

In 2013 and 2014, the latitude was listed as 33.826667. In 2015 and later, it was listed as 33.826655.

Both values round to 33.83, so they are well within the "round to 2 digits and listen to 70% of the votes" rule. BTW, this plant has the correct longitude value. I don't know what's messing up the latitude value.
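
To make the "round to 2 digits and listen to 70% of the votes" reasoning concrete, here is a minimal sketch in plain Python. It is not PUDL's actual harvesting code: the function name `harvest_value`, its parameters, and the choice to ignore null values when counting votes are all assumptions made for illustration. The number of years listed is also assumed; only the two latitude values come from the report above.

```python
from collections import Counter


def harvest_value(values, ndigits=2, threshold=0.7):
    """Hypothetical helper: return the rounded value that wins at least
    `threshold` of the non-null votes, or None if no value does."""
    # Assumption: nulls are dropped before voting.
    votes = [round(v, ndigits) for v in values if v is not None]
    if not votes:
        return None
    winner, count = Counter(votes).most_common(1)[0]
    return winner if count / len(votes) >= threshold else None


# Latitudes quoted in this issue for plant_id_eia 3317:
# 33.826667 for 2013 and 2014, 33.826655 for later years
# (three later years are assumed here for illustration).
reported = [33.826667, 33.826667, 33.826655, 33.826655, 33.826655]
print(harvest_value(reported))
```

Under these assumptions every year votes for the same rounded value, so a NaN result would have to come from somewhere other than disagreement between the reported latitudes.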

Software Environment?

Jupyter Notebook on 2i2c

zaneselvans commented 3 years ago

Hmm, that does seem like unexpected behavior. Are you finding those expected Latitude values by looking at the spreadsheets directly? I wonder if it might be a datatype problem where some of the values are being stored as strings in the spreadsheet or something.

MichaelTiemannOSC commented 3 years ago

Yes, those are values copied and pasted directly from the EPA spreadsheets using MS-Excel.

zaneselvans commented 3 years ago

Looks like there are 3789 cases in which there's a non-null longitude, but a null latitude. But only 8 cases where there's a non-null latitude and a null longitude. Seems like a weird skew. Really we want to treat this as a single geopoint, and keep the pair so long as it's within a certain distance of the average location or something like that.
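
A rough sketch of both ideas, counting the half-null pairs and treating each latitude/longitude pair as a single geopoint, might look like the following. It uses plain Python rather than the PUDL tables; the helper names, the 1 km tolerance, and the sample coordinates are all made up for illustration and are not taken from the EIA data.

```python
import math


def is_null(x):
    """True for None or a float NaN."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def count_half_null(points):
    """Return (null latitude with longitude, latitude with null longitude)."""
    lat_missing = sum(1 for lat, lon in points if is_null(lat) and not is_null(lon))
    lon_missing = sum(1 for lat, lon in points if not is_null(lat) and is_null(lon))
    return lat_missing, lon_missing


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius of 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def consistent_points(points, max_km=1.0):
    """Keep complete (lat, lon) pairs within max_km of the mean location."""
    complete = [(lat, lon) for lat, lon in points
                if not is_null(lat) and not is_null(lon)]
    if not complete:
        return []
    mean_lat = sum(lat for lat, _ in complete) / len(complete)
    mean_lon = sum(lon for _, lon in complete) / len(complete)
    return [p for p in complete
            if haversine_km(p[0], p[1], mean_lat, mean_lon) <= max_km]


# Made-up yearly records for one hypothetical plant.
sample = [
    (40.0, -100.0),
    (40.00001, -100.0),
    (40.0, -100.00001),
    (None, -100.0),   # longitude reported, latitude missing
    (40.0, None),     # latitude reported, longitude missing
]
print(count_half_null(sample))
print(len(consistent_points(sample)))
```

One caveat with this sketch: a plain mean is pulled toward outliers, so a median or a clustering step may be a better reference point than the average location.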

MichaelTiemannOSC commented 3 years ago

Related: the query below returns seven plants. Three of them have a null latitude and the other four have a non-null latitude, but all seven have longitudes missing their minus signs:

https://data.catalyst.coop/pudl/plants_entity_eia?_sort=plant_id_eia&longitude__gt=0
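
The same check as the `longitude__gt=0` filter in that URL can be sketched in plain Python. The rows below are invented placeholders, not real plants, and the helper names are hypothetical. The sign flip assumes the plant really lies in the western hemisphere, which would need to be confirmed for each record before applying it.

```python
def positive_longitudes(rows):
    """Return the rows whose longitude is present and greater than zero."""
    return [r for r in rows if r["longitude"] is not None and r["longitude"] > 0]


def with_western_longitude(row):
    """Return a copy of the row with the longitude forced negative.

    Assumption: the plant is west of the prime meridian, so a positive
    value can only be a dropped minus sign.
    """
    fixed = dict(row)
    if fixed["longitude"] is not None:
        fixed["longitude"] = -abs(fixed["longitude"])
    return fixed


# Invented example rows (IDs and coordinates are placeholders).
rows = [
    {"plant_id_eia": 1, "latitude": 35.0, "longitude": -90.0},
    {"plant_id_eia": 2, "latitude": None, "longitude": 90.0},
    {"plant_id_eia": 3, "latitude": 36.0, "longitude": 91.5},
]

suspects = positive_longitudes(rows)
print([r["plant_id_eia"] for r in suspects])
print([with_western_longitude(r)["longitude"] for r in suspects])
```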