catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
468 stars 107 forks

Add dimensions and tabular calculations to XBRL Calculation Forests #2736

Closed. zaneselvans closed this issue 1 year ago

zaneselvans commented 1 year ago

With #2721 merged, the additional dimensions utility_type, plant_status, and plant_function are available in the newly compiled tabular calculation components table and in the exploded dataframes.

To validate and debug the calculations that involve these dimensions, and to use them in analysis of the exploded data, they need to be integrated into both the calculation forests and the leaf-based filtering of the exploded data.

Rather than identifying a calculation component by (table_name, xbrl_factoid), we'll now need to use a tuple that also includes all of the other dimension columns. In many cases some of these values will be null, since not all dimensions are involved in all calculations. This larger tuple will also be used to filter the final exploded data table.
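As a rough illustration of the larger identifier, here is a minimal sketch using only the standard library. The `NodeId` class, the `node_id_from_record` helper, and the table and factoid names are hypothetical and are not taken from the PUDL codebase; only the three dimension names come from this issue. Unused dimensions are stored as `None` so that the tuples stay hashable and compare predictably.

```python
from typing import NamedTuple, Optional


class NodeId(NamedTuple):
    """Hypothetical identifier for one calculation component."""

    table_name: str
    xbrl_factoid: str
    # Dimensions named in the issue. None means the dimension is not
    # involved in this particular calculation.
    utility_type: Optional[str] = None
    plant_status: Optional[str] = None
    plant_function: Optional[str] = None


def node_id_from_record(record: dict) -> NodeId:
    """Build a NodeId from a row-like dict, treating missing keys as null."""
    return NodeId(**{field: record.get(field) for field in NodeId._fields})


# Made-up table and factoid names, for illustration only.
old_style = ("table_a", "factoid_x")
electric = NodeId("table_a", "factoid_x", utility_type="electric")
gas = NodeId("table_a", "factoid_x", utility_type="gas")
no_dims = node_id_from_record(
    {"table_name": "table_a", "xbrl_factoid": "factoid_x"}
)

# All three share the same (table_name, xbrl_factoid) prefix, but they are
# distinct nodes once the dimensions are part of the identifier.
distinct_nodes = {electric, gas, no_dims}
```

If the identifiers are pulled out of a dataframe, the nulls may arrive as NaN rather than `None`, and would probably need normalizing first, since NaN does not compare equal to itself.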

- [x] Use calc components table to construct calculation tree.
- [x] Use additional dimensions to filter the exploded dataframe.
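
The two tasks above could be sketched roughly as follows, in plain Python rather than with the real PUDL classes. The `_parent` column suffix, the helper names, and all of the records are assumptions made for the example; the actual calculation components table and forest implementation may be organized differently.

```python
from collections import defaultdict

# Identifier columns: the original pair plus the dimensions from this issue.
ID_COLS = (
    "table_name",
    "xbrl_factoid",
    "utility_type",
    "plant_status",
    "plant_function",
)


def key(record, suffix=""):
    """Build the identifying tuple; absent columns become None."""
    return tuple(record.get(f"{col}{suffix}") for col in ID_COLS)


def build_forest(calc_components):
    """Map each parent identifier to the set of its child identifiers."""
    children = defaultdict(set)
    for rec in calc_components:
        # Assumption: parent columns carry a "_parent" suffix.
        children[key(rec, "_parent")].add(key(rec))
    return children


def find_leaves(children):
    """Leaves are nodes that appear as a child but never as a parent."""
    all_children = set().union(*children.values())
    return all_children - set(children)


def filter_to_leaves(exploded, leaves):
    """Keep only the exploded rows whose full identifier is a leaf."""
    return [row for row in exploded if key(row) in leaves]


# Made-up calculation components: total -> part (electric, gas),
# and part (electric) -> detail (electric, in_service).
calc_components = [
    {"table_name_parent": "table_a", "xbrl_factoid_parent": "total",
     "table_name": "table_a", "xbrl_factoid": "part",
     "utility_type": "electric"},
    {"table_name_parent": "table_a", "xbrl_factoid_parent": "total",
     "table_name": "table_a", "xbrl_factoid": "part",
     "utility_type": "gas"},
    {"table_name_parent": "table_a", "xbrl_factoid_parent": "part",
     "utility_type_parent": "electric",
     "table_name": "table_b", "xbrl_factoid": "detail",
     "utility_type": "electric", "plant_status": "in_service"},
]

# Made-up exploded rows, one per node in the forest above.
exploded = [
    {"table_name": "table_a", "xbrl_factoid": "total", "value": 10.0},
    {"table_name": "table_a", "xbrl_factoid": "part",
     "utility_type": "electric", "value": 6.0},
    {"table_name": "table_a", "xbrl_factoid": "part",
     "utility_type": "gas", "value": 4.0},
    {"table_name": "table_b", "xbrl_factoid": "detail",
     "utility_type": "electric", "plant_status": "in_service", "value": 6.0},
]

forest = build_forest(calc_components)
leaves = find_leaves(forest)
leaf_rows = filter_to_leaves(exploded, leaves)
```

In this toy example, keying on (table_name, xbrl_factoid) alone would merge the electric and gas `part` nodes, so the gas leaf could not be told apart from its non-leaf electric sibling. With the dimensions in the key, the leaf rows sum to the root total.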

zaneselvans commented 1 year ago

Questions

cmgosnell commented 1 year ago

zaneselvans commented 1 year ago

I'm struggling with the new calculation_components_xbrl_ferc1 association table. The MetadataExploder.calculations() method needed to be updated to correctly handle calculations that refer to components from outside of the explosion, turning them into parent-only calculation components. One of three things is going wrong: the inputs don't look the way I think they do, I'm doing something wrong in the new method, or my expectations about the dataframe coming out of the method are wrong. See #2763.