Open wearpants opened 4 days ago
Something like this?
./dags/default.yml
default:
  catchup: false
  default_args:
    start_date: "2024-01-01"
  schedule_interval: "0 0 * * *"
  tasks:
    extract:
      operator: airflow.operators.python.PythonOperator
      python_callable_file: /usr/local/airflow/include/etl_helpers.py
      python_callable_name: extract_helper
    load:
      dependencies:
        - transform
      operator: airflow.operators.python.PythonOperator
      python_callable_file: /usr/local/airflow/include/etl_helpers.py
      python_callable_name: load_helper
    transform:
      dependencies:
        - extract
      op_kwargs:
        ds_nodash: '{{ds_nodash}}'
      operator: airflow.operators.python.PythonOperator
      python_callable_file: /usr/local/airflow/include/etl_helpers.py
      python_callable_name: transform_helper

./dags/bi.yml
business_analytics:
  schedule_interval: "@daily"
  tasks:
    load:
      op_kwargs:
        database_name: BA
        table_name: inventory

./dags/ds.yml
data_science:
  tasks:
    load:
      op_kwargs:
        database_name: DS
        table_name: daily_sales

./dags/ml.yml
machine_learning:
  tasks:
    load:
      op_kwargs:
        database_name: ML
        table_name: training_data
...
@cmarteepants, the only thing I think I'd add here is referencing the default values in ./dags/bi.yml, etc.
@cmarteepants So basically bi.yml etc. are merged on top of default.yml? Could you clarify how that works: does the merge happen across the entire YAML object tree, key by key, with lists extended, etc.? How would you do overrides? (Take a look at ChainMap for a simple comparison.)
I had been mainly thinking of this only for default_args, and that defaults.yml wouldn't provide any tasks (could use cross-DAG dependencies for that)... but if defaults.yml is more like a template / base class that can be extended/overridden, that opens up some interesting possibilities, though it's not totally clear how that would work.
Docker Compose does something similar, but the merge rules are kind of ad hoc yet sensible.
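
To make the ChainMap comparison concrete, here is a minimal sketch of the two behaviours in question. The `deep_merge` helper and its rule (dicts merge key by key, everything else including lists is replaced) are assumptions for illustration only, not a description of what dag-factory actually does; the dictionaries are trimmed-down versions of default.yml and bi.yml above.

```python
from collections import ChainMap
from copy import deepcopy

# Trimmed-down stand-ins for default.yml and bi.yml after YAML parsing.
default = {
    "schedule_interval": "0 0 * * *",
    "tasks": {
        "load": {
            "operator": "airflow.operators.python.PythonOperator",
            "dependencies": ["transform"],
        },
    },
}
override = {
    "schedule_interval": "@daily",
    "tasks": {"load": {"op_kwargs": {"database_name": "BA"}}},
}

# ChainMap only resolves top-level keys: the override's "tasks" value
# hides the default's "tasks" entirely instead of merging with it.
shallow = ChainMap(override, default)


def deep_merge(base, over):
    """Hypothetical merge rule: nested dicts merge key by key;
    any other value (including lists) in `over` replaces the one in `base`."""
    merged = deepcopy(base)
    for key, value in over.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


deep = deep_merge(default, override)
```

With the shallow lookup, `shallow["tasks"]["load"]` has lost its operator; with the recursive merge, `deep["tasks"]["load"]` keeps the operator and dependencies from the defaults and gains the override's op_kwargs. Whether lists should be replaced or extended is exactly the open question; this sketch picks replacement only to keep the rule simple.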
@wearpants If everything is contained in the same YAML file today, yes, anything in default is more like a template that can be extended AND overridden.
As for the how? I'll be honest: I haven't delved much into the source code to understand how this was implemented. It could be something we are getting "for free" from PyYAML, but I never looked into it, as the capability was around from before Astronomer took over the project. I opened issue #295 so we can document this properly. The examples in the issue cover extending, overriding and even generating the exact same DAG structure with different task ids, and they all work.
I really like your idea about splitting up the definitions into different files though, and allowing for different defaults per folder. I'd even go so far as to push that as a best practice. We'd need to allow for an order of precedence, but assuming we can pull it off (and I don't see why not, but I'm the PM :D) I agree, I think it would be really powerful.
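
One possible order of precedence, sketched under assumptions: every directory between the DAGs root and the DAG file may hold its own defaults file, and files closer to the DAG win. The function name `defaults_chain`, the `defaults.yml` file name, and the `analytics/bi` folder layout are all hypothetical; nothing here reflects an existing dag-factory API.

```python
from pathlib import Path


def defaults_chain(dags_root, dag_file, name="defaults.yml"):
    """Return candidate defaults files for `dag_file`, lowest precedence first.

    Hypothetical convention: each directory from `dags_root` down to the
    directory containing `dag_file` may hold one `name` file.
    """
    root = Path(dags_root)
    # Raises ValueError if dag_file does not live under dags_root.
    relative_dir = Path(dag_file).parent.relative_to(root)
    chain = [root / name]
    current = root
    for part in relative_dir.parts:
        current = current / part
        chain.append(current / name)
    return chain


# Hypothetical layout: dags/analytics/bi/inventory.yml
chain = defaults_chain("dags", "dags/analytics/bi/inventory.yml")
```

A loader could then skip the candidates that don't exist and merge the rest in order, so that later (deeper) files override earlier ones and the DAG's own file is applied last.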
I'll have someone on the engineering team start looking into this within the next few sprints. Do you want to be kept up to date as things progress?
@cmarteepants Yes, please keep me in the loop. Happy to hop on a quick design brainstorming call as well if that would be helpful.
Description
It'd be nice to pass some shared default_args for a directory, either via a Python object or a defaults.yml file in the directory.
Use case/motivation
One DAG per file is easier for users IMO, and as a system administrator I'd like to be able to give them a shared set of pre-baked defaults (env vars, etc.)
Related issues
#289, #290
Are you willing to submit a PR?