The MVP infrastructure proposal (#63) for Sec10k extraction specifies that we will use Dagster for orchestration. This will ensure the final design conforms to the tooling and design patterns we use in PUDL and elsewhere, and it will prepare the infrastructure here for further integration into PUDL. For example, if we create a Dagster deployment in the future, it could potentially control extraction from this repo.
Integrating Dagster will also help isolate parts of the extraction process that don't always need to run together. We can create separate DAGs for training the ex21 model, for running extraction on our validation sets and logging validation metrics, and for running the full extraction.
Success Criteria
How will we know that we're done?
[ ] Settings handling is improved so CI does not fail intermittently