catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
489 stars 115 forks source link

(ferc 💥 metadata) create xbrl_factoid metadata containing tree-like provenance #2625

Closed cmgosnell closed 1 year ago

cmgosnell commented 1 year ago

investigate storage/translation of tree-like nature of these nested calculations/relationships. (Simplest current solution is to store the relationships between tables/fields in an arbitrarily nested dictionary. We'd use that dict to take a table or field and convert it into a table with all of its sub-components w/ calculated values identified -> calced using sub-components for validation -> replaced w/ sub-components)

for each xbrl_factoid:

Design Notes

Questions

Example calculation (JSON): ```json [ { "name": "utility_plant", "weight": 1.0, "source_table": [ "balance_sheet_assets_ferc1" ] }, { "name": "utility_plant_in_service_classified_and_unclassified", "weight": 1.0 }, { "name": "utility_plant_leased_to_others", "weight": 1.0 }, { "name": "utility_plant_held_for_future_use", "weight": 1.0 }, { "name": "construction_work_in_progress", "weight": 1.0 }, { "name": "utility_plant_acquisition_adjustment", "weight": 1.0 }, { "name": "utility_plant_and_construction_work_in_progress_correction", "weight": 1.0 } ] ```

Classes / Methods

Notes from Call w/ CG

zaneselvans commented 1 year ago

Raw materials we're working with is the metadataframes that come out of the table transformers, especially the calculations column.

Output that we want in the end is

Start with balance_sheet_assets_ferc1 table because it has a lot of calculations, both inter and intra-table. Whether a calculation is within or between tables can be derived entirely from information contained in the calculation.

Calculation exists independent of any table or factoid -- doesn't say what factoid it's calculating or what table that calculation is within. In the exploded metadata table we'll have both source_table and xbrl_factoid in their own columns

zaneselvans commented 1 year ago

What does it mean when a given calculation component has multiple source tables? After discussing with @cmgosnell it seems like there are at least two options. If they are both happening, then we need to be able to differentiate between them and treat them differently in constructing the calculation trees.

Multiple values in source_tables could:

In general it seems like when referring to a calculation component we need to include both a source table and the fact name.

zaneselvans commented 1 year ago

We've added xbrl_factoid_correction elements to the calculations (always) and into the data tables themselves (when there's a correction required), but we have NOT added them into the processed metadata, which is resulting in an inconsistency.

For each calculated xbrl_factoid in the processed metadata, add another identical record but xbrl_factoid_correction + a null calculation "[]"

@cmgosnell says it will be difficult to add the name_original element into the calculation components:

Identifying unconnected components of the tree:

Removing duplicate facts

zaneselvans commented 1 year ago

I've added some draft XBRL calculation tree infrastructure in #2653. Below is an example of how to use it.

I'm doing something wrong in the recursive resolution of the calculation components. Probably it's that I'm editing the calculation in place rather than returning a new calculation in the Ferc1XbrlCalculation.resolve() method.

from pudl.output.ferc1 import MetadataExploder
from pudl.etl import defs
import json
from pudl.output.ferc1 import Ferc1XbrlCalculation, Ferc1XbrlCalculationComponent

xbrl_meta = defs.load_asset_value(AssetKey("clean_xbrl_metadata_json"))
balance_sheet_asset_tables = [
    "balance_sheet_assets_ferc1",
    "utility_plant_summary_ferc1",
    "plant_in_service_ferc1",
]
exploded_meta = (
    MetadataExploder(balance_sheet_asset_tables).boom(xbrl_meta)
)[[
    "table_name",
    "xbrl_factoid",
    "calculations",
    "intra_table_calc_flag",
    "xbrl_factoid_original",
]]

calc = Ferc1XbrlCalculation.from_exploded_meta(
    exploded_meta=exploded_meta,
    table_name="balance_sheet_assets_ferc1",
    xbrl_factoid="utility_plant_net",
)
print(json.dumps(calc.dict(), indent=4))

# Ahhhh, I think this needs to return the resolved calculation in the recursion...
calc.resolve(exploded_meta=exploded_meta)
print(json.dumps(calc.dict(), indent=4))

Original Calculation

{
    "calculations": [
        {
            "name": "utility_plant_and_construction_work_in_progress",
            "weight": 1.0,
            "source_tables": [
                "balance_sheet_assets_ferc1"
            ],
            "calculation": null
        },
        {
            "name": "accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility",
            "weight": -1.0,
            "source_tables": [
                "balance_sheet_assets_ferc1"
            ],
            "calculation": null
        },
        {
            "name": "utility_plant_net_correction",
            "weight": 1.0,
            "source_tables": [
                "balance_sheet_assets_ferc1"
            ],
            "calculation": null
        }
    ],
    "source_table": "balance_sheet_assets_ferc1",
    "xbrl_factoid": "utility_plant_net",
    "xbrl_factoid_original": "utility_plant_net"
}

(Partially) Resolved Calculation

big JSON blob here ```json { "calculations": [ { "name": "utility_plant_and_construction_work_in_progress", "weight": 1.0, "source_tables": [ "balance_sheet_assets_ferc1" ], "calculation": { "calculations": [ { "name": "utility_plant", "weight": 1.0, "source_tables": [ "balance_sheet_assets_ferc1" ], "calculation": null }, { "name": "utility_plant_in_service_classified_and_unclassified", "weight": 1.0, "source_tables": [ "utility_plant_summary_ferc1" ], "calculation": null }, { "name": "utility_plant_leased_to_others", "weight": 1.0, "source_tables": [ "utility_plant_summary_ferc1" ], "calculation": null }, { "name": "utility_plant_held_for_future_use", "weight": 1.0, "source_tables": [ "utility_plant_summary_ferc1" ], "calculation": null }, { "name": "construction_work_in_progress", "weight": 1.0, "source_tables": [ "balance_sheet_assets_ferc1" ], "calculation": null }, { "name": "utility_plant_acquisition_adjustment", "weight": 1.0, "source_tables": [ "utility_plant_summary_ferc1" ], "calculation": null }, { "name": "utility_plant_and_construction_work_in_progress_correction", "weight": 1.0, "source_tables": [ "balance_sheet_assets_ferc1" ], "calculation": null } ], "source_table": "balance_sheet_assets_ferc1", "xbrl_factoid": "utility_plant_and_construction_work_in_progress", "xbrl_factoid_original": "utility_plant_and_construction_work_in_progress" } }, { "name": "accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility", "weight": -1.0, "source_tables": [ "balance_sheet_assets_ferc1" ], "calculation": { "calculations": [ { "name": "depreciation_utility_plant_in_service", "weight": 1.0, "source_tables": [ "utility_plant_summary_ferc1" ], "calculation": null }, { "name": "amortization_and_depletion_of_producing_natural_gas_land_and_land_rightsutility_plant_in_service", "weight": 1.0, "source_tables": [ "utility_plant_summary_ferc1" ], "calculation": null }, { "name": "amortization_of_underground_storage_land_and_land_rightsutility_plant_in_service", "weight": 1.0, "source_tables": [ "utility_plant_summary_ferc1" ], "calculation": null }, { "name": "amortization_of_other_utility_plant_utility_plant_in_service", "weight": 1.0, "source_tables": [ "utility_plant_summary_ferc1" ], "calculation": null }, { "name": "depreciation_amortization_and_depletion_utility_plant_in_service", "weight": 1.0, "source_tables": [ "utility_plant_summary_ferc1" ], "calculation": null }, { "name": "depreciation_amortization_and_depletion_utility_plant_leased_to_others", "weight": 1.0, "source_tables": [ "utility_plant_summary_ferc1" ], "calculation": null }, { "name": "depreciation_and_amortization_utility_plant_held_for_future_use", "weight": 1.0, "source_tables": [ "utility_plant_summary_ferc1" ], "calculation": null }, { "name": "abandonment_of_leases", "weight": 1.0, "source_tables": [ "utility_plant_summary_ferc1" ], "calculation": null }, { "name": "amortization_of_plant_acquisition_adjustment", "weight": 1.0, "source_tables": [ "utility_plant_summary_ferc1" ], "calculation": null }, { "name": "accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_correction", "weight": 1.0, "source_tables": [ "balance_sheet_assets_ferc1" ], "calculation": null } ], "source_table": "balance_sheet_assets_ferc1", "xbrl_factoid": "accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility", "xbrl_factoid_original": "accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility" } }, { "name": "utility_plant_net_correction", "weight": 1.0, "source_tables": [ "balance_sheet_assets_ferc1" ], "calculation": { "calculations": [], "source_table": "balance_sheet_assets_ferc1", "xbrl_factoid": "utility_plant_net_correction", "xbrl_factoid_original": "utility_plant_net" } } ], "source_table": "balance_sheet_assets_ferc1", "xbrl_factoid": "utility_plant_net", "xbrl_factoid_original": "utility_plant_net" } ```
zaneselvans commented 1 year ago

Notes from chat with @cmgosnell:

zaneselvans commented 1 year ago

I've added some code in #2653 that allows the generation of a "leafy" calculation tree for a calculation forest with several different root factoids... using it looks like this right now:

from dagster import AssetKey

from pudl.etl import defs
from pudl.output.ferc1 import MetadataExploder, XbrlCalculationForestFerc1

xbrl_meta = defs.load_asset_value(AssetKey("clean_xbrl_metadata_json"))

balance_sheet_asset_tables = [
    "balance_sheet_assets_ferc1",
    "utility_plant_summary_ferc1",
    "plant_in_service_ferc1",
]

meta_tags = pd.DataFrame(
    columns=["table_name", "xbrl_factoid", "is_ratebase", "utility_function"],
    data=[
        ("utility_plant_summary_ferc1", "depreciation_and_amortization_utility_plant_held_for_future_use", True, "electric"),
    ],
).convert_dtypes().set_index(["table_name", "xbrl_factoid"])

exploded_meta = MetadataExploder(balance_sheet_asset_tables).boom(xbrl_meta)

forest = XbrlCalculationForestFerc1.from_exploded_meta(
    source_tables=["balance_sheet_assets_ferc1", "balance_sheet_assets_ferc1"],
    xbrl_factoids=["utility_plant_net", "deferred_debits"],
    exploded_meta=exploded_meta.set_index(["table_name", "xbrl_factoid"]),
    propagate_weights=True,
    tags_df=meta_tags,
)

leafy_meta = forest.to_leafy_meta()
zaneselvans commented 1 year ago

After re-implementing this to use NetworkX for all the graph stuff... there seem to be some non-tree relationships encoded in the calculations.


all_nodes = list(exploded_meta.set_index(["table_name", "xbrl_factoid"]).index)

new_forest = NewXbrlCalcuationForestFerc1(
    exploded_meta=exploded_meta,
    seeds=all_nodes,
    tags=meta_tags,
)

new_nx_forest = new_forest.nx_forest
new_leafy_meta = new_forest.leafy_meta
new_root_calcs = new_forest.root_calculations

from networkx.drawing.nx_agraph import graphviz_layout
pos = graphviz_layout(new_nx_forest, prog="dot", args='-Grankdir="LR"')
#nx.draw_networkx(new_nx_forest, pos)
nx.draw_networkx_nodes(new_nx_forest, pos)
nx.draw_networkx_edges(new_nx_forest, pos)
plt.show()

image

multi_parents = [n for n, in_deg in new_nx_forest.in_degree() if in_deg > 1]
[NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='depreciation_utility_plant_in_service'),
 NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='amortization_and_depletion_of_producing_natural_gas_land_and_land_rightsutility_plant_in_service'),
 NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_underground_storage_land_and_land_rightsutility_plant_in_service'),
 NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_other_utility_plant_utility_plant_in_service'),
 NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='depreciation_amortization_and_depletion_utility_plant_leased_to_others'),
 NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='depreciation_and_amortization_utility_plant_held_for_future_use'),
 NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='abandonment_of_leases'),
 NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='amortization_of_plant_acquisition_adjustment'),
 NodeId(source_table='balance_sheet_assets_ferc1', xbrl_factoid='utility_plant'),
 NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_in_service_classified_and_unclassified'),
 NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_leased_to_others'),
 NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_held_for_future_use'),
 NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_acquisition_adjustment'),
 NodeId(source_table='balance_sheet_assets_ferc1', xbrl_factoid='noncurrent_portion_of_allowances'),
 NodeId(source_table='balance_sheet_assets_ferc1', xbrl_factoid='derivative_instrument_assets_long_term'),
 NodeId(source_table='balance_sheet_assets_ferc1', xbrl_factoid='derivative_instrument_assets_hedges_long_term')]
zaneselvans commented 1 year ago

Fixing multi-parent nodes

zaneselvans commented 1 year ago

Weirdly, it turns out that using the utility_summary_ferc1 and balance_sheet_assets_ferc1 facts as seeds for the forest results in exactly the same set of nodes:

from dagster import AssetKey

import networkx as nx
import json

from pudl.etl import defs
from pudl.output.ferc1 import MetadataExploder, NodeId, XbrlCalculationForestFerc1

xbrl_meta = defs.load_asset_value(AssetKey("clean_xbrl_metadata_json"))

balance_sheet_asset_tables = [
    "balance_sheet_assets_ferc1",
    "utility_plant_summary_ferc1",
    "plant_in_service_ferc1",
]

# NOTE: there are a bunch of duplicate records in xbrl_factoid_rate_base_tags.csv
pkg_source = (
    importlib.resources.files("pudl.package_data.ferc1")
    .joinpath("xbrl_factoid_rate_base_tags.csv")
)
with importlib.resources.as_file(pkg_source) as tags_csv:
    in_rate_base = (
        pd.read_csv(tags_csv, usecols=["xbrl_factoid", "table_name", "in_rate_base"])
        .drop_duplicates(subset=["table_name", "xbrl_factoid"])
    )

exploded_meta = MetadataExploder(balance_sheet_asset_tables).boom(xbrl_meta)

pis_seeds = list(
    exploded_meta[exploded_meta.table_name == "plant_in_service_ferc1"]
    .set_index(["table_name", "xbrl_factoid"]).index
)
ups_seeds = list(
    exploded_meta[exploded_meta.table_name == "utility_plant_summary_ferc1"]
    .set_index(["table_name", "xbrl_factoid"]).index
)
bsa_seeds = list(
    exploded_meta[exploded_meta.table_name == "balance_sheet_assets_ferc1"]
    .set_index(["table_name", "xbrl_factoid"]).index
)

bs_forest = XbrlCalculationForestFerc1(
    exploded_meta=exploded_meta,
    tags=in_rate_base
)
pis_forest = XbrlCalculationForestFerc1(
    exploded_meta=exploded_meta,
    seeds=pis_seeds,
    tags=in_rate_base
)
bsa_forest = XbrlCalculationForestFerc1(
    exploded_meta=exploded_meta,
    seeds=bsa_seeds,
    tags=in_rate_base
)
ups_forest = XbrlCalculationForestFerc1(
    exploded_meta=exploded_meta,
    seeds=ups_seeds,
    tags=in_rate_base
)

assert ups_forest.nx_forest.nodes == bsa_forest.nx_forest.nodes

There are only 6 nodes with calculations that involve other nodes that show up in more than one calculation, and they are:

[NodeId(source_table='balance_sheet_assets_ferc1', xbrl_factoid='other_property_and_investments'),
 NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility'),
 NodeId(source_table='balance_sheet_assets_ferc1', xbrl_factoid='current_and_accrued_assets'),
 NodeId(source_table='balance_sheet_assets_ferc1', xbrl_factoid='utility_plant_and_construction_work_in_progress'),
 NodeId(source_table='balance_sheet_assets_ferc1', xbrl_factoid='accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility'),
 NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_and_construction_work_in_progress')]

I think the major culprits are the accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility and utility_plant_and_construction_work_in_progress factoids, which show up in both balance_sheet_assets_ferc1 and utility_plant_summary_ferc1 and seem to contain entirely duplicated calculations.

Here's a dictionary that maps NodeId to calculations, for the nodes that are "bad parents" (with calculations that involve duplicated facts). I'm not sure if this is the right / enough information to figure out if we can eliminate the duplication with passthrough calculations though. @e-belfer

```py {NodeId(source_table='balance_sheet_assets_ferc1', xbrl_factoid='other_property_and_investments'): [{'name': 'nonutility_property', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'accumulated_provision_for_depreciation_and_amortization_of_nonutility_property', 'weight': -1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'investment_in_associated_companies', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'investment_in_subsidiary_companies', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'noncurrent_portion_of_allowances', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'other_investments', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'sinking_funds', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'depreciation_fund', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'amortization_fund_federal', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'other_special_funds', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'special_funds', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'derivative_instrument_assets_long_term', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'derivative_instrument_assets_hedges_long_term', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'special_funds_all', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'other_property_and_investments_correction', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}], NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility'): [{'name': 'depreciation_utility_plant_in_service', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'amortization_and_depletion_of_producing_natural_gas_land_and_land_rights_utility_plant_in_service', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'amortization_of_underground_storage_land_and_land_rights_utility_plant_in_service', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'amortization_of_other_utility_plant_utility_plant_in_service', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'depreciation_amortization_and_depletion_utility_plant_leased_to_others', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'depreciation_and_amortization_utility_plant_held_for_future_use', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'abandonment_of_leases', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'amortization_of_plant_acquisition_adjustment', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_correction', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}], NodeId(source_table='balance_sheet_assets_ferc1', xbrl_factoid='current_and_accrued_assets'): [{'name': 'cash_and_working_funds', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'cash', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'special_deposits', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'working_funds', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'temporary_cash_investments', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'notes_receivable', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'customer_accounts_receivable', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'other_accounts_receivable', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'accumulated_provision_for_uncollectible_accounts_credit', 'weight': -1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'notes_receivable_from_associated_companies', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'accounts_receivable_from_associated_companies', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'fuel_stock', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'fuel_stock_expenses_undistributed', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'residuals', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'plant_materials_and_operating_supplies', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'merchandise', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'other_materials_and_supplies', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'nuclear_materials_held_for_sale', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'allowance_inventory_and_withheld', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'noncurrent_portion_of_allowances', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'stores_expense_undistributed', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'gas_stored_current', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'liquefied_natural_gas_stored_and_held_for_processing', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'prepayments', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'advances_for_gas', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'interest_and_dividends_receivable', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'rents_receivable', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'accrued_utility_revenues', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'miscellaneous_current_and_accrued_assets', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'derivative_instrument_assets', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'derivative_instrument_assets_long_term', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'derivative_instrument_assets_hedges', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'derivative_instrument_assets_hedges_long_term', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'current_and_accrued_assets_correction', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}], NodeId(source_table='balance_sheet_assets_ferc1', xbrl_factoid='utility_plant_and_construction_work_in_progress'): [{'name': 'utility_plant', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'utility_plant_in_service_classified_and_unclassified', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'utility_plant_leased_to_others', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'utility_plant_held_for_future_use', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'construction_work_in_progress', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'utility_plant_acquisition_adjustment', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'utility_plant_and_construction_work_in_progress_correction', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}], NodeId(source_table='balance_sheet_assets_ferc1', xbrl_factoid='accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility'): [{'name': 'depreciation_utility_plant_in_service', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'amortization_and_depletion_of_producing_natural_gas_land_and_land_rights_utility_plant_in_service', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'amortization_of_underground_storage_land_and_land_rights_utility_plant_in_service', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'amortization_of_other_utility_plant_utility_plant_in_service', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'depreciation_amortization_and_depletion_utility_plant_leased_to_others', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'depreciation_and_amortization_utility_plant_held_for_future_use', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'abandonment_of_leases', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'amortization_of_plant_acquisition_adjustment', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_correction', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}], NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='utility_plant_and_construction_work_in_progress'): [{'name': 'utility_plant', 'weight': 1.0, 'source_tables': ['balance_sheet_assets_ferc1']}, {'name': 'utility_plant_in_service_classified_and_unclassified', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'utility_plant_leased_to_others', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'utility_plant_held_for_future_use', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'construction_work_in_progress', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'utility_plant_acquisition_adjustment', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}, {'name': 'utility_plant_and_construction_work_in_progress_correction', 'weight': 1.0, 'source_tables': ['utility_plant_summary_ferc1']}]} ```
zaneselvans commented 1 year ago

Maybe passthrough calculations can't fix this problem? Is the real problem that we truly have the same money being reported in two places: the utility_plant_summary_ferc1 table and the balance_sheet_assets_ferc1 table?

I imagine that the utility_plant_summary_ferc1 table is also hooked up to the plant_in_service_ferc1 table with an interdimensional utility_type calculation.

e-belfer commented 1 year ago

I think the major culprits are the accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility and utility_plant_and_construction_work_in_progress factoids, which show up in both balance_sheet_assets_ferc1 and utility_plant_summary_ferc1 and seem to contain entirely duplicated calculations.

I'm not 100% following everything above, but I can confirm that both of these factoids are reported to be identical in both places in the yeti metadata. What if we drop the duplicate calculations from the root table and reassign the value from utility plant summary to point at the balance sheet assets one? Is there a way to distinguish between the factoid in its two locations?

E.g.: utility_plant_construction_in_progress (balance sheet assets) = utility_plant_construction_in_progress (utility plant summary) = components.

zaneselvans commented 1 year ago

Ahhhhh, looking at the data tables and the forms, I think I understand more why this is showing in up in two places. Sorry if this was already obvious to you. The utility_plant_summary_ferc1 (UPS) table has another dimension (utility_type) that breaks down all of these accounts into more granular electric, gas, & other categories, while the balance_sheet_assets_ferc1 (BSA) table only has the starting & ending balances for each of the factoids, totaling across all utility types.

So it seems like the redirection that you're suggesting (from BSA to UPS) would preserve the utility-type information that's present in the other dimension, and link the totals in the BSA table to the totals of all utility types in the UPS table, which would be great!

zaneselvans commented 1 year ago

After merging in the interdimensional branch I'm now encountering two new issues with building the calculation trees:

In a couple of cases, there appears to be a conflict between weights for a given calculation, where from one source it's 1.0 and in another it's -1.0.

2023-06-29 11:59:55 [   ERROR] catalystcoop.pudl.output.ferc1:1793 Calculation weights do not match for NodeId(source_table='utility_plant_summary_ferc1', xbrl_factoid='accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility'):1.0 != -1.0
2023-06-29 11:59:55 [   ERROR] catalystcoop.pudl.output.ferc1:1793 Calculation weights do not match for NodeId(source_table='plant_in_service_ferc1', xbrl_factoid='electric_plant_sold'):1.0 != -1.0

Should electric_plant_sold have a weight of -1.0 instead? I guess it shows up in one place with +1.0 and another with -1.0. Maybe the expectation I'm asserting is not correct, but it also wasn't failing before.

There seem to be some new inconsistencies between the exploded metadata and the calculations, including a couple of cases where the unfixed rightsutility string shows up rather than rights_utility and another one with the CWIP:

KeyError: "[
    ('utility_plant_summary_ferc1', 'amortization_of_underground_storage_land_and_land_rightsutility_plant_in_service'),
    ('utility_plant_summary_ferc1', 'amortization_and_depletion_of_producing_natural_gas_land_and_land_rightsutility_plant_in_service'),
    ('utility_plant_summary_ferc1', 'utility_plant_construction_work_in_progress')
] not in index"

It seems like the first two are probably just the rightsutility fix somehow not being propagated into either the calculations or the metadata, but I don't understand why the CWIP factoid would have gone missing. I guess this is the disappearing metadata you were talking about.

cmgosnell commented 1 year ago

closing bc we merged #2653 into explode_ferc1