Closed cmgosnell closed 2 years ago
There are 126 records which have two records per plant that have the same fuel_type_code_pudl. The original FuelTypeAxis is unique, but the cleaned codes are not.
Because of this, it is breaking the new convention that we can use the Axis XBRL columns as primary keys in the creation of the record_id.
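A minimal sketch of how one might surface these collisions with pandas. The toy data and every column name other than fuel_type_code_pudl are assumptions made up for illustration, not the real table schema:

```python
import pandas as pd

# Hypothetical toy data; only fuel_type_code_pudl is a name taken from the issue.
fuel = pd.DataFrame(
    {
        "entity_id": ["e1", "e1", "e1", "e2"],
        "report_year": [2021, 2021, 2021, 2021],
        "plant_name_ferc1": ["alpha", "alpha", "alpha", "beta"],
        "fuel_type_axis": ["natural_gas", "gas", "coal", "coal"],  # raw axis value
        "fuel_type_code_pudl": ["gas", "gas", "coal", "coal"],  # cleaned code
    }
)

raw_key = ["entity_id", "report_year", "plant_name_ferc1", "fuel_type_axis"]
clean_key = ["entity_id", "report_year", "plant_name_ferc1", "fuel_type_code_pudl"]

# keep=False flags every member of a duplicated group, not only the repeats.
raw_dupes = fuel[fuel.duplicated(subset=raw_key, keep=False)]
clean_dupes = fuel[fuel.duplicated(subset=clean_key, keep=False)]

print(len(raw_dupes), len(clean_dupes))
```

In this toy frame the raw key is unique, but the cleaned key collides for the two "gas" rows, which is the situation described above.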
We could use some hash or something other than a composite key, but in an attempt to preserve the primary key, the idea right now is to condense/aggregate these duplicate records.
there are a slew of records like this:
where one is effectively empty except for one data point.
for these, i built a little function called condense_sets_of_records_with_one_datapoint_in_second_record (omigosh please halp me rename this)
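For illustration, here is one way such a condensing step could be sketched. This is not the function mentioned above; condense_sparse_duplicates and the column names are hypothetical, and the approach (first non-null value per column within each key group) is an assumption about what "condense" means here:

```python
import pandas as pd


def condense_sparse_duplicates(df: pd.DataFrame, key: list) -> pd.DataFrame:
    """Collapse rows sharing `key` by taking the first non-null value per column.

    Hypothetical helper: only safe when the duplicated rows never disagree,
    so it refuses to run if any group holds two different non-null values.
    """
    data_cols = [c for c in df.columns if c not in key]
    # nunique() ignores nulls, so >1 means two records genuinely conflict.
    conflicts = df.groupby(key)[data_cols].nunique().gt(1).any(axis=1)
    if conflicts.any():
        raise ValueError(f"Conflicting values in {int(conflicts.sum())} group(s)")
    # GroupBy.first() returns the first non-null entry of each column.
    return df.groupby(key)[data_cols].first().reset_index()


fuel = pd.DataFrame(
    {
        "plant": ["alpha", "alpha", "beta"],
        "fuel_type_code_pudl": ["gas", "gas", "coal"],
        "fuel_consumed_units": [100.0, None, 50.0],
        "fuel_cost_per_mmbtu": [None, 3.5, 2.0],
    }
)
condensed = condense_sparse_duplicates(fuel, key=["plant", "fuel_type_code_pudl"])
```

The conflict guard matters because first() would otherwise silently discard a second, different value.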
i'll probably use pudl.helpers.sum_and_weighted_average_agg()
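To make the idea concrete, a generic sum-plus-weighted-average aggregation in plain pandas might look like the sketch below. It does not reproduce the signature or behavior of pudl.helpers.sum_and_weighted_average_agg(); the function name, arguments, and columns here are all assumptions:

```python
import pandas as pd


def sum_and_weighted_average(df, by, sum_cols, wtavg_cols):
    """Sum `sum_cols` and weight-average the keys of `wtavg_cols` per group.

    Hypothetical helper. `wtavg_cols` maps value column -> weight column,
    and every weight column must also appear in `sum_cols`.
    """
    missing = set(wtavg_cols.values()) - set(sum_cols)
    if missing:
        raise ValueError(f"Weight columns must also be summed: {sorted(missing)}")
    work = df.copy()
    for value_col, weight_col in wtavg_cols.items():
        # Replace each value with value * weight so a plain sum gives the numerator.
        work[value_col] = work[value_col] * work[weight_col]
    totals = work.groupby(by)[sum_cols + list(wtavg_cols)].sum()
    for value_col, weight_col in wtavg_cols.items():
        totals[value_col] = totals[value_col] / totals[weight_col]
    return totals.reset_index()


fuel = pd.DataFrame(
    {
        "plant": ["alpha", "alpha", "beta"],
        "fuel_type_code_pudl": ["gas", "gas", "coal"],
        "fuel_consumed_units": [100.0, 300.0, 50.0],
        "fuel_cost_per_unit": [2.0, 4.0, 1.0],
    }
)
agg = sum_and_weighted_average(
    fuel,
    by=["plant", "fuel_type_code_pudl"],
    sum_cols=["fuel_consumed_units"],
    wtavg_cols={"fuel_cost_per_unit": "fuel_consumed_units"},
)
```

One caveat of this sketch: a null value with a non-null weight still counts toward the denominator, so nulls would need handling before aggregating real data.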
these nuclear records seem.... unsavable. idk what to do with them
I'm calling this issue ready for review, and have created several separate smaller issues pertaining to testing & transform parameter validation, which I'll move on to now @cmgosnell:
With #1903 and #1900 getting merged into xbrl_steam, this issue is done.
Adapt the fuel_ferc1 transformation process to use the new abstractions developed in #1739, and to accommodate raw inputs from both the old DBF and new XBRL data.

DBF specific transforms
normalize_strings (formerly simplify strings)
categorize_strings (since it has to happen independently for XBRL)

XBRL specific transforms
normalize_strings (formerly simplify strings)

Table-specific post-concatenation transforms
nullify_outliers (formerly oob_to_nan)

Generic FERC Form 1 final transformations
Must Fix
fuel_units column to do a first round of unit standardization before attempting to correct units.
fuel_units column to reflect the results of our error corrections and initial units assumptions.
energy_sources_eia table.

Other Loose Ends
normalize_strings functionality from categorize_strings
na category in categorize_strings more unique.
source_ferc1: Literal["dbf", "xbrl"] with a Pydantic model to simplify error checking everywhere.
Ferc1AbstractTableTransformer.table_id be a valid ferc1 database table name.

Issues resulting from or related to this issue
fuel_type_code_pudl See issue #1344
cleanstrings with categorize_strings + normalize_strings elsewhere in the codebase. See #1770
simplify_strings with normalize_strings elsewhere in the codebase. See #1771
1875
1876
1877
1878
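As a rough illustration of the three transform steps named in the lists above (normalize_strings, categorize_strings, nullify_outliers), here is a standalone pandas sketch. These are not the PUDL implementations; the signatures, the category mapping, and the "na_category" placeholder are assumptions:

```python
import pandas as pd


def normalize_strings(col: pd.Series) -> pd.Series:
    """Lowercase, replace runs of non-alphanumerics with a space, and strip."""
    return col.str.lower().str.replace(r"[^a-z0-9]+", " ", regex=True).str.strip()


def categorize_strings(col: pd.Series, categories: dict, na_value: str = "na_category") -> pd.Series:
    """Map each known variant to its category; anything unmapped gets `na_value`."""
    lookup = {v: cat for cat, variants in categories.items() for v in variants}
    return col.map(lookup).fillna(na_value)


def nullify_outliers(col: pd.Series, lower: float, upper: float) -> pd.Series:
    """Replace values outside [lower, upper] with NaN."""
    return col.where(col.between(lower, upper))


raw_fuel = pd.Series(["  Nat. Gas", "COAL", "bituminous-coal", "???"])
# Hypothetical category mapping, for illustration only.
fuel_categories = {
    "gas": {"gas", "nat gas", "natural gas"},
    "coal": {"coal", "bituminous coal"},
}
codes = categorize_strings(normalize_strings(raw_fuel), fuel_categories)

heat_content = pd.Series([0.9, 1.05, 250.0])
cleaned_heat = nullify_outliers(heat_content, lower=0.5, upper=2.0)
```

Keeping normalization separate from categorization means the same normalizer can run on the DBF and XBRL sides before each applies its own mapping.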
FERC 1 specific questions:
(filing_name, index) uniquely identify a lot of data. Should we really be dropping them? Would they not be appropriate for the record_id values on the XBRL side?

Note: the fuel_ferc1 transform has to be done before plants_steam_ferc1 because the algorithmic assignment of plant_id_ferc1 values depends on fuel information, so this issue is blocking #1707
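On the loose end above about source_ferc1: Literal["dbf", "xbrl"], a Pydantic model could centralize the check roughly as sketched here. The class name Ferc1Source is hypothetical and not taken from the codebase:

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Ferc1Source(BaseModel):
    """Hypothetical model: validates the FERC 1 data source in one place."""

    source_ferc1: Literal["dbf", "xbrl"]


ok = Ferc1Source(source_ferc1="xbrl")

try:
    Ferc1Source(source_ferc1="csv")  # not an allowed source
    rejected = False
except ValidationError:
    rejected = True
```

Functions could then accept a Ferc1Source instead of a bare string, so an invalid source fails once at construction rather than in scattered checks.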