Open e-belfer opened 3 months ago
Bonus points on this issue if you can figure out a way to dramatically reduce the memory usage of the EPA CEMS CSV extraction. It's currently a huge bottleneck: we can only process 2 EPA CEMS assets at a time, which ends up determining how long the overall ETL takes.
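One general way to cut peak memory in a CSV extraction is to stream the file in fixed-size batches and reduce each batch before reading the next, so only one batch is held in memory at a time. The sketch below illustrates that idea with the standard library only. It is not the current EPA CEMS extraction code: `iter_csv_batches`, the sample data, and the column names are all made up for illustration.

```python
import csv
import io
from itertools import islice
from typing import Iterator, TextIO


def iter_csv_batches(handle: TextIO, batch_size: int) -> Iterator[list[dict[str, str]]]:
    """Yield lists of at most ``batch_size`` rows from an open CSV handle.

    Hypothetical helper: rows are pulled lazily from the reader, so the
    whole file is never materialized at once.
    """
    reader = csv.DictReader(handle)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# Invented stand-in for an hourly emissions CSV; these columns are assumptions.
SAMPLE_CSV = "plant_id,gross_load_mw\n1,10.5\n1,11.0\n2,7.25\n2,8.0\n3,1.5\n"

batch_sizes = []
total_load = 0.0
for batch in iter_csv_batches(io.StringIO(SAMPLE_CSV), batch_size=2):
    # Reduce each batch immediately instead of accumulating raw rows.
    batch_sizes.append(len(batch))
    total_load += sum(float(row["gross_load_mw"]) for row in batch)
```

Whether this helps in practice depends on whether downstream steps can consume data batch by batch rather than needing the full table at once.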
Is your feature request related to a problem? Please describe.
In #3402 we implemented a new `CsvExtractor` class in `pudl.extract.csv`, subclassing a generic `Extractor`. We should update both of our existing extractors to use this new format.

Describe the solution you'd like
For `pudl.extract.ferc714` and `pudl.extract.epacems`, we should transition to subclassing the new `CsvExtractor` class instead of using ad-hoc functions. These should result in the same exact outputs, but use the new `CsvExtractor` infrastructure.

Describe alternatives you've considered
Retain existing bespoke extractors.
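To make the proposed migration concrete, here is a rough, self-contained sketch of the pattern: an ad-hoc extraction function is replaced by a thin subclass of a shared CSV extractor, and the two are checked to produce identical output. The `Extractor` and `CsvExtractor` classes below are simplified stand-ins written for this example and do not reproduce the real interfaces in `pudl.extract`. `Ferc714Extractor`, `extract_adhoc`, `COLUMN_RENAMES`, and the sample data are likewise hypothetical.

```python
import csv
import io
from abc import ABC, abstractmethod


class Extractor(ABC):
    """Stand-in for a generic extractor base class (not the real PUDL API)."""

    @abstractmethod
    def load_source(self, page: str) -> list[dict[str, str]]:
        """Return the rows for one named page of the dataset."""

    def extract(self, pages: list[str]) -> dict[str, list[dict[str, str]]]:
        # Shared orchestration lives in the base class.
        return {page: self.load_source(page) for page in pages}


class CsvExtractor(Extractor):
    """Stand-in CSV extractor that reads each page from an in-memory string."""

    def __init__(self, sources: dict[str, str]):
        self.sources = sources

    def load_source(self, page: str) -> list[dict[str, str]]:
        return list(csv.DictReader(io.StringIO(self.sources[page])))


class Ferc714Extractor(CsvExtractor):
    """Hypothetical subclass: only dataset-specific details are defined here."""

    COLUMN_RENAMES = {"respondent": "respondent_id"}

    def load_source(self, page: str) -> list[dict[str, str]]:
        rows = super().load_source(page)
        return [
            {self.COLUMN_RENAMES.get(key, key): value for key, value in row.items()}
            for row in rows
        ]


def extract_adhoc(sources: dict[str, str]) -> dict[str, list[dict[str, str]]]:
    """Hypothetical bespoke function of the kind the subclass would replace."""
    out = {}
    for page, text in sources.items():
        lines = text.strip().splitlines()
        header = ["respondent_id" if c == "respondent" else c for c in lines[0].split(",")]
        out[page] = [dict(zip(header, line.split(","))) for line in lines[1:]]
    return out


SOURCES = {"demand": "respondent,mwh\n101,5\n102,7\n"}
new_output = Ferc714Extractor(SOURCES).extract(["demand"])
old_output = extract_adhoc(SOURCES)
assert new_output == old_output  # the migration should not change outputs
```

A comparison like the final assertion, run against real extracted tables, would be one way to confirm the requirement that the new extractors produce the same outputs as the old ones.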