apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Adding tags to DAG runs #43472

Open asaff1 opened 2 weeks ago

asaff1 commented 2 weeks ago

Description

Currently I only see an option to tag DAGs. I don't see such an option for DAG runs. It would be quite useful for me to be able to add tags to runs and filter runs by them.

Use case/motivation

I'd like to filter runs based on tags. For example, I have different DAGs (predict, train, etc.) whose runs I could tag with tags like 'model_name=mymodel', so that I can see all runs that used that model. This also gives the user a way to group DAG runs by keys like experiment=name. Such management exists in other DAG runners (e.g. Kubeflow).

Related issues

No response

Are you willing to submit a PR?

Code of Conduct

boring-cyborg[bot] commented 2 weeks ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; no need to wait for approval.

potiuk commented 2 weeks ago

Yep. Would be a great feature to have. Maybe someone will pick it up and implement it. But FYI: the fastest way to get something like that implemented is to contribute it; the second fastest is to find someone who will contribute it (for example, by paying them to do so). Other than that, someone will have to volunteer and implement it, and since this is an open-source project, it's at the discretion of whoever signs up and takes that one.

asaff1 commented 2 weeks ago

@potiuk I might be able to contribute it. I can submit a PR. Any guidelines for that?

As I see it, what needs to be done is to add a new DB model, DagRunTag, with key and value fields (in models/dagrun.py), then add a many-to-many relationship between DagRun and DagRunTag. Then I need to add CRUD operations to the API, and finally, for the UI, I need to add filtering / search by tags. Is that a good direction?
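To make the proposed data model concrete, here is a minimal sketch of the many-to-many layout using only the standard-library sqlite3 module. It is not Airflow's actual schema or ORM code: the table names (dag_run, dag_run_tag, dag_run_tag_association), the columns, and the helpers add_run and runs_with_tag are all hypothetical and only illustrate the key/value tag idea and the filter-by-tag query.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not Airflow's real tables.
SCHEMA = """
CREATE TABLE dag_run (
    id INTEGER PRIMARY KEY,
    dag_id TEXT NOT NULL,
    run_id TEXT NOT NULL,
    UNIQUE (dag_id, run_id)
);
CREATE TABLE dag_run_tag (
    id INTEGER PRIMARY KEY,
    key TEXT NOT NULL,
    value TEXT NOT NULL,
    UNIQUE (key, value)
);
-- Association table carrying the many-to-many relationship.
CREATE TABLE dag_run_tag_association (
    dag_run_id INTEGER NOT NULL REFERENCES dag_run (id),
    tag_id INTEGER NOT NULL REFERENCES dag_run_tag (id),
    PRIMARY KEY (dag_run_id, tag_id)
);
"""


def add_run(conn, dag_id, run_id, tags):
    """Insert a run and attach key/value tags, reusing existing tag rows."""
    cur = conn.execute(
        "INSERT INTO dag_run (dag_id, run_id) VALUES (?, ?)", (dag_id, run_id)
    )
    run_pk = cur.lastrowid
    for key, value in tags.items():
        # The UNIQUE constraint makes this a no-op if the tag already exists.
        conn.execute(
            "INSERT OR IGNORE INTO dag_run_tag (key, value) VALUES (?, ?)",
            (key, value),
        )
        tag_pk = conn.execute(
            "SELECT id FROM dag_run_tag WHERE key = ? AND value = ?",
            (key, value),
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO dag_run_tag_association (dag_run_id, tag_id) VALUES (?, ?)",
            (run_pk, tag_pk),
        )
    return run_pk


def runs_with_tag(conn, key, value):
    """Return (dag_id, run_id) pairs for every run carrying the given tag."""
    return conn.execute(
        """
        SELECT r.dag_id, r.run_id
        FROM dag_run AS r
        JOIN dag_run_tag_association AS a ON a.dag_run_id = r.id
        JOIN dag_run_tag AS t ON t.id = a.tag_id
        WHERE t.key = ? AND t.value = ?
        ORDER BY r.dag_id, r.run_id
        """,
        (key, value),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_run(conn, "train", "manual__1", {"model_name": "mymodel", "experiment": "baseline"})
add_run(conn, "predict", "manual__2", {"model_name": "mymodel"})
add_run(conn, "predict", "manual__3", {"model_name": "othermodel"})
print(runs_with_tag(conn, "model_name", "mymodel"))
```

Sharing one tag row across many runs is what makes the relationship many-to-many; a simpler one-to-many variant (one tag row per run) would also support the same filter.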

potiuk commented 2 weeks ago

Yep. And it could be done in stages as well. You could even add the list of things to do to the description of this issue as a checklist, and check the items off one by one while doing it.

Also, I think this one needs a say from @bbovenzi and @pierrejeambrun from the UI perspective, because if we implement something like that, it needs to somehow fit into the navigation patterns in the new UI.

(Note that this might be an Airflow 3-only change anyway.)

pierrejeambrun commented 2 weeks ago

Interesting. I think we can make that work in the UI in terms of display and filtering in the DagRun table.

The thing I am wondering is how and when tags will be added to a specific DAG run. For DAGs it is at DAG authoring time, which makes sense; for DAG runs it is less clear to me. Will a user need to do that manually, or do the runs inherit the DAG tags by default, etc.?

asaff1 commented 2 weeks ago

@pierrejeambrun For my use case it is beneficial enough to set tags at creation time, or manually after creation (from the UI / API).

bbovenzi commented 2 weeks ago

First, before we add a new table, let's see if we can use or improve existing fields to achieve this.

pierrejeambrun commented 2 weeks ago

For my use case it is beneficial enough to set tags at creation time, or manually after creation (from the UI / API).

"Set tags at creation time": what do you mean by that?

asaff1 commented 2 weeks ago

@pierrejeambrun I mean, setting tags when manually triggering DAGs via the UI, or by the REST API. (So I mean, at the creation time of dag runs.)
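As a rough illustration of what "tags at creation time" could look like over the REST API, the sketch below builds (but does not send) a trigger request. The POST /api/v1/dags/{dag_id}/dagRuns endpoint and the conf field are from the Airflow 2 stable REST API; the tags field is purely hypothetical and does not exist today, and build_trigger_request and the localhost URL are made-up names for this example.

```python
import json
import urllib.request


def build_trigger_request(base_url, dag_id, conf=None, tags=None):
    """Build (without sending) a POST request that would trigger a DAG run."""
    body = {"conf": conf or {}}
    if tags:
        # HYPOTHETICAL field: the current API has no "tags" on DAG runs.
        body["tags"] = [{"key": k, "value": v} for k, v in sorted(tags.items())]
    return urllib.request.Request(
        url=f"{base_url}/api/v1/dags/{dag_id}/dagRuns",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_trigger_request(
    "http://localhost:8080", "train", tags={"model_name": "mymodel"}
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Until something like this exists, the same key/value pairs could be carried in the existing conf payload, although conf is not something the run list can currently be filtered on in the way this issue asks for.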