Open aesharpe opened 1 year ago
Logging that is done by the function can't know what class it's inside of / table it's operating on, since there is no self
to refer to.
Logging that is done in the class method that wraps the function won't have the context required to report transform-specific changes / progress / status.
It seems like we should have:
I'm suspicious that there is not a general way to do the second conditional you're suggesting though. Once params
has been set:
params = self.params.drop_duplicate_rows_dbf
How do you determine whether a substantive transformation is going to take place or not, based on the value of the params? It seems like that will be different for every parameter class.
I'm suspicious that there is not a general way to do the second conditional you're suggesting though. Once params has been set
What would be the reasoning against doing something like:
def drop_duplicate_rows_dbf(
self, df: pd.DataFrame, params: DropDuplicateRowsDbf | None = None
) -> pd.DataFrame:
if params is None:
params = self.params.drop_duplicate_rows_dbf
if params:
logger.info(
f"{self.table_id.value}: Dropping rows where primary key and "
"data columns are duplicated."
)
df = drop_duplicate_rows_dbf(df, params=params)
return df
Right now it says if params.table_name:
but I think what's important is whether or not there are any params to begin with because that's what determines whether the function runs. Especially when coupled with #2160 about flagging invalid params.
How do you determine whether a substantive transformation is going to take place or not, based on the value of the params? It seems like that will be different for every parameter class.
I think the way to approach this is to include the function-specific logging messages that don't include self.table_id.value
so for drop duplicate rows, just make sure it tells you how many rows it's dropping (even if it's 0). I think it might already do that, actually (some of the functions already have these implemented). I mostly just want to know if the function actually ran at all vs. we asked it to run, found out there weren't any params, and then didn't run it at all.
I think the problem here is that params
will always be set when you get to the second if statement. Either was set at the beginning, or it has been set by in the assignment within the first if statement.
self.params
is a Ferc1TableTransformParams
object, which is composed of many other TransformParams
objects, some of which are specific to FERC Form 1, and some of which are more general and inherited from the parente TableTransformParams
class. Here's the definition:
class Ferc1TableTransformParams(TableTransformParams):
"""A model defining what TransformParams are allowed for FERC Form 1.
This adds additional parameter models beyond the ones inherited from the
:class:`pudl.transform.classes.AbstractTableTransformer` class.
"""
class Config:
"""Only allow the known table transform params."""
extra = "forbid"
rename_columns_ferc1: RenameColumnsFerc1 = RenameColumnsFerc1(
dbf=RenameColumns(),
xbrl=RenameColumns(),
xbrl_instant=RenameColumns(),
xbrl_duration=RenameColumns(),
)
wide_to_tidy: WideToTidySourceFerc1 = WideToTidySourceFerc1(
dbf=WideToTidy(), xbrl=WideToTidy()
)
merge_xbrl_metadata: MergeXbrlMetadata = MergeXbrlMetadata()
align_row_numbers_dbf: AlignRowNumbersDbf = AlignRowNumbersDbf()
drop_duplicate_rows_dbf: DropDuplicateRowsDbf = DropDuplicateRowsDbf()
unstack_balances_to_report_year_instant_xbrl: UnstackBalancesToReportYearInstantXbrl = (
UnstackBalancesToReportYearInstantXbrl()
)
All of the attributes have default values of the appropriate type, including drop_duplicate_rows_dbf
so even if the parameter dictionary in pudl.transform.params.ferc1
for the table being transformed does not provide values for drop_duplicate_rows_dbf
, when the Ferc1TableTransformParams
class is instantiated, that attribute will be set to the default value. E.g.
f1ttfp = pudl.transform.ferc1.Ferc1TableTransformParams()
if f1ttfp.drop_duplicate_rows_dbf:
print("See, it's got a value")
The transform process uses classes, such as
AbstractTableTransformer
andFerc1AbstractTableTransformer
, coupled with a standard set of method calls to modify tables. Whether the methods alter the table in question depends on some default parameter settings and the parameters specified inparams.py
.Right now, the generic transformation functions defined in
classes.py
look like this:Whereas the ones in the
transform/ferc1.py
module look like this:The extra conditional before the logging output ensures that the logger is only printed if there are indeed valid parameters for that function. This is good because it avoid confusion as to whether the transformation was applied to a given table or not.
Other things to consider:
if params is None
like if != None??if params
conditional is called, whether that's in the top-level function definition or in the class-level function definition.if not params
that should beif params is None
for consistency.