apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Listeners for Datasets #34327

Closed (DjVinnii closed this issue 10 months ago)

DjVinnii commented 1 year ago

Description

Add listeners for Datasets (events)

Use case/motivation

As Airflow administrators, we would like to trigger external processes whenever our users create or update a dataset. We found the existing listeners for DAG runs and task instances (which are also useful), but listeners for datasets are still missing.
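
To illustrate the request, here is a minimal sketch in plain Python of what dataset listener hooks could look like. It does not use Airflow at all: `Dataset`, `ListenerManager`, and the hook names `on_dataset_created` and `on_dataset_changed` are hypothetical stand-ins chosen for this example, and appending to a list stands in for triggering an external process.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a dataset, identified by its URI."""
    uri: str


class ListenerManager:
    """Hypothetical registry that fans dataset events out to registered callbacks."""

    def __init__(self) -> None:
        # One callback list per supported hook name (names are assumptions).
        self._hooks: Dict[str, List[Callable[[Dataset], None]]] = {
            "on_dataset_created": [],
            "on_dataset_changed": [],
        }

    def register(self, hook_name: str, callback: Callable[[Dataset], None]) -> None:
        if hook_name not in self._hooks:
            raise ValueError(f"unknown hook: {hook_name}")
        self._hooks[hook_name].append(callback)

    def notify(self, hook_name: str, dataset: Dataset) -> None:
        # Call every callback registered for this hook, in registration order.
        for callback in self._hooks[hook_name]:
            callback(dataset)


# Recording the event in a list stands in for "trigger an external process".
triggered: List[str] = []

manager = ListenerManager()
manager.register("on_dataset_created", lambda ds: triggered.append(f"created:{ds.uri}"))
manager.register("on_dataset_changed", lambda ds: triggered.append(f"changed:{ds.uri}"))

# The scheduler (or whatever owns datasets) would emit these notifications.
manager.notify("on_dataset_created", Dataset("s3://bucket/table"))
manager.notify("on_dataset_changed", Dataset("s3://bucket/table"))
```

The point of the sketch is only the shape of the interface: administrators register callbacks once, and every dataset creation or update by any user reaches them without changes to the users' DAGs.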

Related issues

No response

boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.

hussein-awala commented 1 year ago

This would be a cool feature. Would you like to work on it?

yermalov-here commented 1 year ago

I'd like to try to implement it.

yermalov-here commented 1 year ago

Apart from dataset creation and dataset updates, would it also be helpful to introduce hooks for dataset orphaning and unorphaning?

DjVinnii commented 1 year ago

> Apart from dataset creation and dataset updates, would it also be helpful to introduce hooks for dataset orphaning and unorphaning?

In our use case, we only need hooks for dataset creation and dataset updates. However, I can see that others might also be interested in orphaning and unorphaning. What's your take on it, @hussein-awala?

cmarteepants commented 9 months ago

@DjVinnii, for docs and release notes purposes, what is your high-level use case for this? Like what kind of external processes could someone trigger on dataset events?

DjVinnii commented 9 months ago

@cmarteepants

> Like what kind of external processes could someone trigger on dataset events?

We are going to use the dataset listeners to synchronize datasets across regional Airflow instances. Unfortunately, there is no API endpoint yet to create or update Datasets, so we still need some tricky workarounds.
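
As a rough sketch of the synchronization idea, the plain-Python example below forwards a dataset event from the local region to every other region. Everything here is an assumption made for illustration: `DatasetEvent`, `RegionSync`, and the region names are hypothetical, and each "sender" is just a callable, because the thread does not say how events would actually be delivered to another Airflow instance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DatasetEvent:
    """Hypothetical event payload: which dataset changed, how, and where."""
    uri: str
    kind: str            # "created" or "changed"
    source_region: str   # region where the event originally happened


class RegionSync:
    """Hypothetical listener body that forwards local events to other regions."""

    def __init__(
        self,
        local_region: str,
        senders: Dict[str, Callable[[DatasetEvent], None]],
    ) -> None:
        self.local_region = local_region
        self.senders = senders  # region name -> delivery callable (placeholder)

    def handle(self, event: DatasetEvent) -> List[str]:
        # Skip events replicated from elsewhere, so regions do not
        # forward the same event back and forth forever.
        if event.source_region != self.local_region:
            return []
        delivered = []
        for region, send in sorted(self.senders.items()):
            if region == self.local_region:
                continue
            send(event)
            delivered.append(region)
        return delivered


# In-memory inboxes stand in for the remote Airflow instances.
inbox: Dict[str, List[DatasetEvent]] = {"eu": [], "us": [], "ap": []}
senders = {region: inbox[region].append for region in inbox}

sync = RegionSync("eu", senders)
delivered = sync.handle(DatasetEvent("s3://bucket/table", "changed", "eu"))
ignored = sync.handle(DatasetEvent("s3://bucket/table", "changed", "us"))
```

The check on `source_region` is the one design choice worth noting: without it, an event replicated into a second region would fire that region's listener and be sent back to the first.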