catalyst-cooperative / rmi-ferc1-eia

A collaboration with RMI to integrate FERC Form 1 and EIA CapEx and OpEx reporting
MIT License
3 stars 3 forks source link

determine granularity match of depreciation records to steam records (thru MUL) #52

Closed cmgosnell closed 4 years ago

cmgosnell commented 4 years ago

Well... ~9% of the depreciation records line up perfectly with their corresponding ferc1 steam record :-/

connects_deprish_ferc1 = pd.merge(connects_deprish_eia, connects_ferc1_eia, 
                                  on=['record_id_eia'],
                                  suffixes=('_eia','_ferc1'),
                                  how='outer', indicator=True
                                 )

connects_clean = connects_deprish_ferc1[connects_deprish_ferc1['_merge'] == 'both']
f"Clean Depreciation to EIA Connections: {len(connects_clean)/len(connects_deprish_eia):.02%}"

Produces: 'Clean Depreciation to EIA Connections: 9.16%'

Next step is to determine how many of the depreciation records are at a bigger granularity than their corresponding ferc1 steam record. And then those that are smaller than their corresponding ferc record.

cmgosnell commented 4 years ago

My current musings tell me that most of the depreciation records are at a lower level than the records in the steam table:

Portion of depreciation records compared to FERC Steam are:
    Smaller:    40.78%
    Bigger:     1.95%
    Same level: 11.52%
    No link:    45.74%

I still haven't investigated the records without links. I expected there to be some because there are small plants and hydro plants in here that won't show up in the steam table, but I still need to track this down.

cmgosnell commented 4 years ago

Notes for fine tuning....

Restrict columns by:

Things to figure out:

cmgosnell commented 4 years ago

On ownership. I'm struggling thinking through how to deal with different ownership between the depreciation records and the steam records.

If the steam record is an owned record and the deprish record is a total record.... how should we connect them? My first thought is that we could grab the ownership levels from the steam table and only do the fuzzy matching between depreciation and the MUL on ownership. Then the connections between the tables would be on the same grounds. If we are just trying to pull over the non-fuel opex costs/MWh, then this should be too bad, but I would have to be very careful to know that the owners the records could be different - and thus the data about them could be different. I'll keep mulling this one over to see if there are other options.

cmgosnell commented 4 years ago

We need to develop methods to choose the right record(s) from FERC out of the various options when connecting depreciation records and FERC1 steam. We are generating all of our options between these data sets by merging on these columns: ['plant_id_pudl', 'utility_id_pudl', 'report_date'].

Here are some options for methods. This is almost certainly not inclusive.

Option 1:

Option 2:

Option 3:

__

Option 3:

Option 4:

Option x:

We also need to incorporate the primary line flag into the depreciation records so we can remove them in this matching process.