The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
The extraction portion of the process will include a module for each data source. This will include the current datasource.py modules, which pull from the data source to create the raw dataframes. The extraction modules will now also include the first steps of the current ingest functions for each table (i.e. selecting the applicable section of the dataframe and discarding the columns that aren't needed). Each table-specific extraction function should output a dataframe.
End result:
- A folder in pudl named 'extract' with one module per data source, each named for its data source (e.g. pudl.extract.eia860.py).
- Each module will have:
  - the datasource-specific preparation (e.g. compiling the dictionary of dataframes from the EIA tabs);
  - a function for each individual table, which pulls the relevant dataframe (e.g. the tab from the relevant EIA Excel file) and drops any irrelevant columns;
  - a function that compiles the dataframes from the individual table functions into a dictionary.
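As a rough illustration of the module layout listed above, here is a minimal sketch. All function names, tab names, and column names are hypothetical and not taken from the existing code. In-memory dataframes stand in for the Excel tabs so the example runs without any files.

```python
import pandas as pd


def prepare_raw_dfs():
    """Datasource-specific preparation: build a dict of raw dataframes.

    Hypothetical stand-in for reading each tab of an EIA Excel file.
    """
    return {
        "plant": pd.DataFrame(
            {"plant_id": [1, 2], "plant_name": ["A", "B"], "notes": ["x", "y"]}
        ),
        "generator": pd.DataFrame(
            {"plant_id": [1, 2], "generator_id": ["G1", "G2"], "notes": ["", ""]}
        ),
    }


def plants(raw_dfs):
    """Pull the plant tab and drop columns assumed to be irrelevant."""
    return raw_dfs["plant"].drop(columns=["notes"])


def generators(raw_dfs):
    """Pull the generator tab and drop columns assumed to be irrelevant."""
    return raw_dfs["generator"].drop(columns=["notes"])


def extract():
    """Compile the per-table dataframes into a dictionary keyed by table name."""
    raw_dfs = prepare_raw_dfs()
    return {"plants": plants(raw_dfs), "generators": generators(raw_dfs)}
```

Calling `extract()` returns one dataframe per table, which is the shape the later steps would presumably consume.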
We've decided to only include the first two bullets from the end results above in this section. Any dropping of columns will be moved into the transform section.
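One way to picture the revised split, as a sketch only: the extract function hands back the raw table untouched, and the column dropping happens in a separate transform function. The names `extract_plants`, `transform_plants`, and `KEEP_COLUMNS`, along with the sample columns, are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical list of columns the transform step keeps for this table.
KEEP_COLUMNS = ["plant_id", "plant_name"]


def extract_plants(raw_dfs):
    """Extract: return the raw plant table as-is, with no columns dropped."""
    return raw_dfs["plant"].copy()


def transform_plants(plants_raw):
    """Transform: the column dropping now lives here, not in extract."""
    return plants_raw[KEEP_COLUMNS]


# In-memory stand-in for the dictionary of raw dataframes built from the tabs.
raw_dfs = {
    "plant": pd.DataFrame(
        {"plant_id": [1, 2], "plant_name": ["A", "B"], "notes": ["x", "y"]}
    )
}
plants_raw = extract_plants(raw_dfs)
plants_clean = transform_plants(plants_raw)
```

Keeping extract free of column logic means the raw output mirrors the source, so changes to which columns are kept only touch the transform code.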