astronomer / dag-factory

Dynamically generate Apache Airflow DAGs from YAML configuration files
Apache License 2.0

Simplify the DAG-instantiation process #213

Open jroach-astronomer opened 1 month ago

jroach-astronomer commented 1 month ago

Currently, once a .yml file has been created, code like the following has to be written to "instantiate" the DAG.

from dagfactory import DagFactory

# Pass in an exact config file name
dag_factory: DagFactory = DagFactory("/usr/local/airflow/dags/config/example_on_failure_callback.yml")

# Clean and generate DAGs
dag_factory.clean_dags(globals())
dag_factory.generate_dags(globals())

I think it would be more intuitive to do something like what is shown below. I know this is somewhat pedantic, and feel free to close this issue if so.

from dagfactory import DagFactory

# Pass in an exact config file name
dag_factory: DagFactory = DagFactory("/usr/local/airflow/dags/config/example_on_failure_callback.yml")
dag_factory.generate_dags()

cmarteepants commented 1 month ago

Yes! I agree - it feels like a lot of boilerplate as is, and IMO it's borderline dangerous if you forget to clean the DAGs. I haven't tested this, but I can see it causing a lot of confusion when Airflow is hosted on K8s with git-sync and a config file is deleted. The DAG will still be there and will most likely still be able to run. Then, if the DAG processor container restarts (likely on K8s), the DAG will "all of a sudden" go missing.
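To make the concern concrete, here is a toy model of that scenario in plain Python. It assumes a single long-lived namespace dict standing in for a process that keeps running, and a hypothetical `generate` helper that only adds entries. It is not a claim about how Airflow, git-sync, or dag-factory actually behave.

```python
def generate(namespace, configured_ids):
    """Register one placeholder per configured DAG id, with no cleanup step."""
    for dag_id in configured_ids:
        namespace[dag_id] = f"<DAG {dag_id}>"


# One long-lived namespace, standing in for a process that is not restarted.
namespace = {}
generate(namespace, ["daily_report", "hourly_sync"])

# The config for hourly_sync is deleted, and only generation runs again.
generate(namespace, ["daily_report"])
stale_ids = sorted(set(namespace) - {"daily_report"})

# A restart begins with an empty namespace, so the stale DAG disappears.
after_restart = {}
generate(after_restart, ["daily_report"])
```

In this model `stale_ids` still contains `hourly_sync` until the restart, which is the "all of a sudden" disappearance described above.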

@jroach-astronomer - is there any reason you can think of why you wouldn't want to remove no-longer-defined DAGs?

jroach-astronomer commented 1 month ago

Not one that I can think of. I'm right there with you; I think it would be dangerous to allow no-longer-defined DAGs to exist.
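For discussion, a minimal sketch of how a single `generate_dags()` call might both clean and generate without being handed `globals()`. `SketchFactory` is a hypothetical stand-in, not dag-factory's real code: it takes a list of DAG ids instead of a YAML path, registers placeholder strings instead of DAG objects, and looks up the caller's module namespace with `inspect.currentframe()`, which is a CPython-oriented assumption.

```python
import inspect


class SketchFactory:
    """Hypothetical stand-in for DagFactory, for illustration only."""

    def __init__(self, dag_ids):
        # In the real library these ids would come from the YAML config file.
        self.dag_ids = list(dag_ids)
        self._registered = set()

    def generate_dags(self, namespace=None):
        # Default to the caller's module globals so no argument is needed.
        if namespace is None:
            namespace = inspect.currentframe().f_back.f_globals
        # Clean first: drop names this factory registered earlier
        # that are no longer configured.
        for stale in self._registered - set(self.dag_ids):
            namespace.pop(stale, None)
        # Then register one placeholder per configured DAG id.
        for dag_id in self.dag_ids:
            namespace[dag_id] = f"<DAG {dag_id}>"
        self._registered = set(self.dag_ids)
```

With this shape, cleaning can no longer be forgotten because it always happens inside `generate_dags()`, and passing a namespace explicitly remains possible for anyone who needs it.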