Open zaneselvans opened 6 months ago
@catalyst-cooperative/com-dev Is this still open? I'd like to work on it as a first-time contributor.
Hey there! Yes, this is still open. I was thinking about this as a good one after getting your office hours signup. There are lots of other examples of asset factories floating around that you could use as a guide. If you have a chance to get the PUDL / Dagster local development environment running, this should be a pretty easy thing to test out locally.
The top portion of the `pudl.output.ferc1` module contains a number of individual asset definitions for denormalized / output tables with very similar structures, which could be consolidated into a small number of asset factories using the pattern adopted in e.g. `pudl.extract.ferc714` (after PR #3123). See Dagster's blog post Factory Patterns in Python for some more background on the factory design pattern, and its application to Dagster assets.

Note that the calls to `pudl.helpers.organize_cols()` found in the current FERC 1 output asset definitions are no longer required, as the ordering of columns in the database is determined by the resource definitions / database schema now. These calls are left over from when we were producing dataframes for users on request rather than writing these tables to the database.

Note that some of these assets currently create new columns containing derived values, and those would need to be preserved, either with their own asset definitions, or with some way of keeping track of which calculations should be done for which tables inside the asset factory.
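For anyone picking this up, here is a rough, dependency-free sketch of the second option above: a factory that looks up per-table derived-value calculations in a mapping. It is not the actual PUDL code. The table names, column names, `DERIVED_COLUMNS`, and `denorm_table_factory` are all hypothetical, and tables are modeled as lists of dicts rather than dataframes so the snippet runs with only the standard library. In PUDL the inner function would presumably be wrapped with Dagster's `@asset` decorator and operate on dataframes instead.

```python
from typing import Any, Callable

Row = dict[str, Any]
Table = list[Row]
DerivedColumn = Callable[[Row], Any]

# Hypothetical mapping from table name to the derived columns it needs.
# Tables with no derived values simply map to an empty dict.
DERIVED_COLUMNS: dict[str, dict[str, DerivedColumn]] = {
    "example_plants": {
        "capacity_factor": lambda row: row["net_generation_mwh"]
        / (8760 * row["capacity_mw"]),
    },
    "example_fuel": {},
}


def denorm_table_factory(table_name: str) -> Callable[[Table, Table], Table]:
    """Build a denormalizing function for one table (hypothetical helper)."""
    derived = DERIVED_COLUMNS.get(table_name, {})

    # In a Dagster project, this inner function is where the @asset
    # decorator would go, with the asset name built from table_name.
    def _denorm(core_table: Table, utilities: Table) -> Table:
        names = {u["utility_id"]: u["utility_name"] for u in utilities}
        out = []
        for row in core_table:
            # Shared step: attach the utility name to every row.
            new_row = {**row, "utility_name": names.get(row["utility_id"])}
            # Table-specific step: add any derived values for this table.
            for column, calculate in derived.items():
                new_row[column] = calculate(new_row)
            out.append(new_row)
        return out

    _denorm.__name__ = f"denorm_{table_name}"
    return _denorm


# One function per table, generated from the mapping instead of written by hand.
denorm_functions = {name: denorm_table_factory(name) for name in DERIVED_COLUMNS}
```

Keeping the calculations in a single mapping makes it easy to see at a glance which tables get which derived columns. The trade-off is that any table needing more than a row-wise calculation would probably still be clearer as its own asset definition.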