catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
468 stars 107 forks source link

Identify FERC Form 2 respondents #1397

Closed zaneselvans closed 2 years ago

zaneselvans commented 2 years ago

In our work with RMI we are finding FERC Form 2 filings associated with local / regional gas companies, available via state-level regulatory findings. So far these utilities include:

However, these companies don't appear in the f2_001_ident_attsttn table -- it only contains what look like interstate gas pipeline / transmission companies.

We need to understand which entities and what data are going to be available in FERC 2. Do a preliminary extraction of any tables that appear relevant and see if these utilities show up in them.

Smaller gas utilities may also show up only in the FERC Form 2A, but the ones above are large enough that they should be in the regular Form 2.

zaneselvans commented 2 years ago

The f2_s0_respondent_id table has about 265 respondents in it. Northwest Natural Gas Company does appear there but I'm not seeing Puget Sound or PG&E. respondent_id=86 has someone with a pge.com email address listed as the contact. Searching through all of the extracted tables stored as CSVs I'm not finding any references to Puget Sound or PG&E or permutations of those names really anywhere.

respondent_id=154 (Jackson Prairie Underground Storage Project) has contacts with pse.com email addresses.

Based on the FERC website it seems like Forms 2 & 2A are both included in the same data files, so it's really weird that these companies are making reports that show up somewhere in PUC dockets as Form 2, but don't show up in the Form 2 data. WTF?

zaneselvans commented 2 years ago

It seems like it might be the case that some states require utilities to report FERC Form 2 directly to the state -- but in that case the data doesn't go to FERC at all. 😢

zaneselvans commented 2 years ago
import glob
import dbfread
dbf_files = sorted(glob.glob("../../pudl-work/data/local/ferc2/UPLOADERS/FORM2/working/*.DBF"))
csv_dir = pathlib.Path("../../pudl-work/data/local/ferc2/csv")
csv_dir.mkdir(parents=True, exist_ok=True)
nrows = 0
for fn in dbf_files:
    dbf_path = pathlib.Path(fn)
    table_name = dbf_path.stem.lower()
    dude = dbfread.DBF(dbf_path, encoding="ISO-8859-1")
    dude.load()
    df = pd.DataFrame(dude.records)
    nrows += len(df)
    df.to_csv(csv_dir / f"{table_name}.csv", index=False, escapechar="\\")
print(f"Processed {nrows} records.")

ferc2_rids = pd.read_csv(csv_dir / "f2_s0_respondent_id.csv")
ferc2_rids = ferc2_rids.rename(columns={
    "RESPONDENT": "respondent_id",
    "RESPONDEN2": "respondent_name",
})
ferc2_rids = ferc2_rids[["respondent_id", "respondent_name"]]
ferc2_rids[ferc2_rids.respondent_name.str.contains("puget", case=False)]
zaneselvans commented 2 years ago

It looks like the FERC 2 respondents (in FERC's DB) are very focused on the interstate pipeline companies, meaning there's much less financial information about state-level gas utilities available from FERC than there is for electric utilities. We'll probably need to draw on the PHMSA annual gas reports, maybe state-level data, what little gas utility data shows up in the FERC 1, and maybe SEC or other outside data to get at the book values / depreciation for gas utilities.