-
Our slightly off-label use of `pd.merge_asof()` to merge dataframes of different temporal resolution is slow and has some weird edge cases related to the `tolerance` parameter (how far afield should i…
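
For readers unfamiliar with the pattern, here is a minimal sketch of merging a coarser (annual) table onto a finer (monthly) one with `pd.merge_asof()`. The table contents, the `plant_id`, `capacity_mw`, and `net_generation_mwh` column names, and the 365-day tolerance are all assumptions made for illustration, not the actual PUDL code.

```python
import pandas as pd

# Hypothetical annual table (coarse temporal resolution).
annual = pd.DataFrame({
    "report_date": pd.to_datetime(["2019-01-01", "2020-01-01"]),
    "plant_id": [1, 1],
    "capacity_mw": [100.0, 120.0],
})

# Hypothetical monthly table (fine temporal resolution).
monthly = pd.DataFrame({
    "report_date": pd.to_datetime(["2019-03-01", "2020-06-01", "2021-06-01"]),
    "plant_id": [1, 1, 1],
    "net_generation_mwh": [10.0, 20.0, 30.0],
})

# Both inputs must be sorted on the "on" key. For each monthly row, take the
# most recent annual row for the same plant, but only if it is no more than
# `tolerance` older than the monthly date.
merged = pd.merge_asof(
    monthly.sort_values("report_date"),
    annual.sort_values("report_date"),
    on="report_date",
    by="plant_id",
    direction="backward",
    tolerance=pd.Timedelta(days=365),  # assumed value, for illustration only
)

# The 2021-06-01 row has no annual record within 365 days, so its
# capacity_mw is left as NaN rather than borrowing the 2020 value.
print(merged)
```

The `tolerance` value decides whether a row silently inherits data from an older report or is left empty, which is why its choice matters for this kind of merge.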
-
We've created Pydantic `Settings` classes that contain the information required to run the ETL, but all of the lower-level extract and transform functions still take lists of individual years, tables,…
-
The data releases in the Zenodo archives do not contain data from 2001-2008 for EIA923 and EIA860. Could data for these earlier years be added to the archives?
Many thanks and I appreciate your time.
-
On my Mac with 16 GB of RAM, running `data_pipeline` without `--small` crashes. The only error I get is "Killed: 9", which indicates that the process ran out of memory.
This problem appeared after https://github.com/singularity-energy/hourly-e…
-
## Background
* Our early FERC 1 data transformations stored the parameters required for data transformations inside each table-specific transformation function.
* This led to poor standardization o…
-
We did a first draft of refactoring our transform functions to separate parameters, processes, and data in #1721, #1722, and #1739 focused just on the FERC Form 1 transforms, and integrating the new X…
-
The `pudl.constants.PUDL_TABLES` dictionary defines what database table names are valid arguments for the ETL process, but at this point that information should be stored elsewhere, either in the Pyda…
-
EIA [released corrections](https://www.eia.gov/electricity/data/eia923/correction.php) to the 2020 EIA923 data on October 8th. Our last archive is from September 28th.
-
The `boiler_fuel_eia923` table should have a natural primary key of:
* `report_date`
* `plant_id_eia`
* `boiler_id`
* `fuel_type_code`
However, there are 4032 rows out of 1.14 million that ac…
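
One way to find the offending rows is to flag every record whose key columns repeat. The sketch below uses a toy dataframe, not real EIA-923 records, and the `fuel_consumed_units` column is only a placeholder for the table's data columns.

```python
import pandas as pd

# The natural primary key columns listed above.
pk = ["report_date", "plant_id_eia", "boiler_id", "fuel_type_code"]

# Toy data for illustration only: the first two rows share the same key.
boiler_fuel = pd.DataFrame({
    "report_date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01"]),
    "plant_id_eia": [3, 3, 3],
    "boiler_id": ["1", "1", "2"],
    "fuel_type_code": ["NG", "NG", "NG"],
    "fuel_consumed_units": [5.0, 7.0, 9.0],
})

# keep=False marks every member of a duplicated key group, not just the
# second and later occurrences, so all conflicting rows can be inspected.
dupes = boiler_fuel[boiler_fuel.duplicated(subset=pk, keep=False)]
n_dupes = len(dupes)

print(f"{n_dupes} of {len(boiler_fuel)} rows violate the primary key")
print(dupes.sort_values(pk))
```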
-
The `generation_fuel_eia923` table should have a natural primary key of:
* `report_date`
* `plant_id_eia`
* `fuel_type`
* `prime_mover_code`
* `nuclear_unit_id`
However, there are a bunch …
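
A sketch of counting rows per key for this table follows. It assumes that `nuclear_unit_id` is null for non-nuclear records (an assumption, not something stated above), which matters because `groupby` drops groups with null keys unless `dropna=False` is passed. The data and the `net_generation_mwh` column are made up for illustration.

```python
import pandas as pd

# The natural primary key columns listed above.
pk = [
    "report_date",
    "plant_id_eia",
    "fuel_type",
    "prime_mover_code",
    "nuclear_unit_id",
]

# Toy data for illustration only. The two gas rows share a key whose
# nuclear_unit_id is null; the two nuclear rows differ by unit.
gen_fuel = pd.DataFrame({
    "report_date": pd.to_datetime(["2020-01-01"] * 4),
    "plant_id_eia": [10, 10, 20, 20],
    "fuel_type": ["NG", "NG", "NUC", "NUC"],
    "prime_mover_code": ["GT", "GT", "ST", "ST"],
    "nuclear_unit_id": [None, None, "1", "2"],
    "net_generation_mwh": [1.0, 2.0, 3.0, 4.0],
})

# dropna=False keeps groups whose key contains a null value; with the
# default (dropna=True) the duplicated gas rows would not be counted.
key_counts = gen_fuel.groupby(pk, dropna=False).size()
violations = key_counts[key_counts > 1]

print(violations)
```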