The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
We have several different settings / configuration file types controlling our scripts and automated processes, but nowhere do we document what those file types are, what they're used for, what they may contain, what they must contain, or whether a single file can be merged and reused across more than one kind of script.
We need to explicitly define and document what these input files look like. At a minimum, they include files that control:
- [x] `pudl_etl`, which takes in raw input data and outputs bundles of datapackages.
- [x] `ferc1_to_sqlite`, which takes in raw FERC Form 1 data and outputs an SQLite database.
- [ ] `epacems_to_parquet`, which takes in EPA CEMS CSV files and outputs Apache Parquet files.
- [ ] `pudl_data`, which creates a datastore (currently controlled only by command line args?)
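To make the documentation task concrete, here is a rough sketch of what a `pudl_etl` settings file might look like. All of the key names below (`datapkg_bundle_name`, `ferc1_years`, `ferc1_tables`, etc.) are illustrative assumptions, not a documented schema; pinning down the actual required and optional keys is exactly what this issue asks for.

```yaml
# Hypothetical pudl_etl settings sketch.
# Key names are assumptions to be confirmed, not the documented schema.
datapkg_bundle_name: example-bundle
datapkg_bundle_settings:
  - name: ferc1-example
    title: Example FERC Form 1 datapackage
    datasets:
      - ferc1:
          ferc1_years: [2017, 2018]
          ferc1_tables:
            - fuel_ferc1
            - plants_steam_ferc1
```

Documenting each script's file in this form (allowed keys, required keys, value types) would also make it clear whether settings like the FERC 1 years/tables block could be shared between `pudl_etl` and `ferc1_to_sqlite`.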
In addition, there's the `.pudl.yml` file that defines a user's default workspace.
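For comparison, the workspace file is much simpler. The sketch below assumes it holds input and output directory paths under keys like `pudl_in` / `pudl_out`; those names should be verified against the code before being documented.

```yaml
# Hypothetical ~/.pudl.yml sketch defining a user's default workspace.
# The pudl_in / pudl_out key names are assumptions to be confirmed.
pudl_in: /home/user/pudl_work
pudl_out: /home/user/pudl_work
```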