catalyst-cooperative / mozilla-sec-eia

Exploratory development for SEC to EIA linkage
MIT License

Integrate Dagster to basic 10k/ex21 extraction #66

Closed zschira closed 1 month ago

zschira commented 2 months ago

Overview

The MVP infrastructure proposal (#63) for Sec10k extraction specifies that we will use Dagster for orchestration. This keeps the final design consistent with the tooling and design patterns we use in PUDL and elsewhere, and prepares the infrastructure here for further integration into PUDL. For example, if we create a Dagster deployment at some point, it could potentially control extraction from this repo.

Integrating Dagster will also help isolate the parts of the extraction process that don't always need to run together. We can create separate DAGs for training the ex21 model, for running extraction on our validation sets and logging validation metrics, and for running the full extraction.

Success Criteria

How will we know that we're done?

jdangerx commented 2 months ago

If this starts to balloon, we could split this into one separate ticket for each top-level success criterion you've identified.