catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

FERC Form 1 Data Misaligned for 1995 #1666

Open: a-g-benson opened this issue 2 years ago

a-g-benson commented 2 years ago

Describe the bug

While working with the FERC Form 1 pumped hydro storage data, I found that many (if not quite all) records for 1995 have values entered in the wrong columns. Specifically, every column from capex_land onward is shifted one column to the right. You can tell it's a misalignment by comparing the 1995 values with those for 1994 and 1996.

Casual inspection of the FERC Form 1 conventional hydro data suggests the problem is less common there, but not absent. I haven't looked at enough of the FERC Form 1 tables to say how pervasive it is.

Bug Severity

To Reproduce

Take a look at this SQL query I set up in Datasette for the Helms Pumped Storage plant, covering 1994 through 1997. In 1995, the value for capex_land is missing, and the value for capex_structures is 674880.0, which is suspicious given that capex_land was 674880.0 in 1994. In 1994, capex_structures was 188457613.0. It's hard to believe that PG&E dismantled that many structures in 1995 and then put them back in 1996.
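The comparison described above can be sketched as a small heuristic: for a suspect year, try a few column offsets and count how many values line up (within a relative tolerance) with a neighboring year. This is a hypothetical helper, not part of PUDL; the names `match_score` and `best_offset`, the 5% tolerance, and the two-column ordering are illustrative assumptions. The example rows use only the Helms values quoted in this issue.

```python
import math
from typing import Dict, Optional, Sequence

# Hypothetical record type: column name -> value, with None for a missing cell.
Row = Dict[str, Optional[float]]


def _close(a: Optional[float], b: Optional[float], rel_tol: float) -> bool:
    """True only when both values are present and within rel_tol of each other."""
    return a is not None and b is not None and math.isclose(a, b, rel_tol=rel_tol)


def match_score(suspect: Row, reference: Row, columns: Sequence[str],
                offset: int, rel_tol: float) -> int:
    """Count columns i where suspect[columns[i + offset]] matches reference[columns[i]]."""
    score = 0
    for i, col in enumerate(columns):
        j = i + offset
        if 0 <= j < len(columns) and _close(suspect.get(columns[j]), reference.get(col), rel_tol):
            score += 1
    return score


def best_offset(suspect: Row, references: Sequence[Row], columns: Sequence[str],
                offsets: Sequence[int] = (-1, 0, 1), rel_tol: float = 0.05) -> int:
    """Return the offset that best aligns `suspect` with its neighboring years.

    +1 means values sit one column to the right of where they belong.
    On a tie in score, offset 0 (no shift) is preferred.
    """
    def key(off: int):
        total = sum(match_score(suspect, ref, columns, off, rel_tol) for ref in references)
        return (total, off == 0)

    return max(offsets, key=key)


# Assumed column order (only the two columns named in this issue).
columns = ["capex_land", "capex_structures"]
row_1994 = {"capex_land": 674880.0, "capex_structures": 188457613.0}
row_1995 = {"capex_land": None, "capex_structures": 674880.0}

print(best_offset(row_1995, [row_1994], columns))
```

Passing both the 1994 and 1996 rows as `references` would make the vote more robust; exact matches are only likely for slowly changing columns like land, so the tolerance and the set of columns compared are judgment calls.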

Expected behavior

The cost data for 1995 should be aligned to the same columns as all other years.

Software Environment?

Additional context


zaneselvans commented 1 year ago

My first guess was that this is another instance of respondents filling in the previous year's form after the line numbering changed (see #471). But the plant tables don't use row numbers, and the data goes back to being aligned correctly in subsequent years, so it's probably not that. My guess is that the respondent just filled out the form incorrectly.

This would be tricky to fix programmatically, but it might be possible.
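One possible shape for a programmatic fix, assuming misaligned records have already been flagged (by hand or by comparison with adjacent years), is to move each value one column to the left starting at the first shifted column. This is a hypothetical sketch: `shift_left` is not a PUDL function, and the column order below is an illustrative assumption. Because the true value of the last column is not present in a right-shifted row, it is set to None rather than guessed.

```python
from typing import Any, Dict, List

# Hypothetical record type: column name -> value.
Row = Dict[str, Any]


def shift_left(row: Row, columns: List[str], start: str, by: int = 1) -> Row:
    """Undo a rightward shift of `by` columns, beginning at column `start`.

    Columns before `start` are copied unchanged. The last `by` columns cannot be
    recovered from the shifted row, so they are set to None.
    """
    fixed = dict(row)  # copy so the original record is left untouched
    first = columns.index(start)
    for i in range(first, len(columns)):
        src = i + by
        fixed[columns[i]] = row.get(columns[src]) if src < len(columns) else None
    return fixed


# Assumed column order; in the real table more capex columns follow capex_structures,
# and the next one would supply the recovered capex_structures value.
columns = ["report_year", "capex_land", "capex_structures"]
row_1995 = {"report_year": 1995, "capex_land": None, "capex_structures": 674880.0}

fixed = shift_left(row_1995, columns, start="capex_land")
print(fixed)
```

Keeping the repair as an explicit, per-record operation (rather than shifting every 1995 row) makes it easier to apply only where the misalignment has been confirmed, since the conventional hydro table appears to be affected less consistently.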