astronomer / astronomer-cosmos

Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
https://astronomer.github.io/astronomer-cosmos/
Apache License 2.0

Emit Airflow metrics to support analysing Cosmos performance #991

Open tatiana opened 1 month ago

tatiana commented 1 month ago

Context

We want more visibility into how much time Cosmos spends parsing the dbt project and building the Airflow DAG.

We'd like to leverage Airflow's metrics collection system by using a timer, in the style of:

Stats.timer("ol.emit.attempts")

To collect the following metrics:

Relevant parts of the code:

https://github.com/astronomer/astronomer-cosmos/blob/cda2a5058bb3c95f1c2e1b9a5352f8ceb7b22f6a/cosmos/dbt/graph.py#L168-L171

https://github.com/astronomer/astronomer-cosmos/blob/cda2a5058bb3c95f1c2e1b9a5352f8ceb7b22f6a/cosmos/airflow/graph.py#L215

https://github.com/astronomer/astronomer-cosmos/blob/main/cosmos/airflow/dag.py https://github.com/astronomer/astronomer-cosmos/blob/main/cosmos/airflow/task_group.py
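To make the `Stats.timer` idea above concrete, here is a minimal sketch of how one of the linked code paths could be wrapped. The helper `run_timed` and the metric name `cosmos.dbt_graph.load_duration` are hypothetical and only for illustration; they are not names proposed in this issue or present in Cosmos. The fallback `Stats` class is there only so the snippet runs without Airflow installed.

```python
import time
from contextlib import contextmanager

try:
    # Airflow's metrics facade; forwards to StatsD (or a no-op logger) depending on config.
    from airflow.stats import Stats
except Exception:  # Airflow not installed or not importable in this environment
    class Stats:  # minimal stand-in so the sketch stays runnable
        @staticmethod
        @contextmanager
        def timer(stat=None, *args, **kwargs):
            start = time.perf_counter()
            try:
                yield
            finally:
                elapsed = time.perf_counter() - start
                print(f"{stat}: {elapsed:.6f}s")


def run_timed(metric_name, func, *args, **kwargs):
    """Hypothetical helper: run func and report its duration under metric_name."""
    with Stats.timer(metric_name):
        return func(*args, **kwargs)


def fake_load_dbt_graph():
    """Placeholder for the real dbt project parsing step (assumption)."""
    return ["model_a", "model_b"]


# Hypothetical metric name, used only for illustration.
nodes = run_timed("cosmos.dbt_graph.load_duration", fake_load_dbt_graph)
```

The same wrapper could in principle be applied to the Airflow graph-building step, so both durations show up as separate timers.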

Acceptance criteria

dwreeves commented 1 month ago

A few questions:

tatiana commented 1 month ago

Hey, @dwreeves, these are very valid points.

I'm improving the logs on a per-DAG/TaskGroup basis as part of #1014 (e.g., https://github.com/astronomer/astronomer-cosmos/pull/1014/files#diff-61b585fb903927b6868b9626c95e0ec47e3818eb477d795ebd13b0276d4fd76cR293). This will probably be switched to DEBUG and improved further, but it would help address the granularity you're suggesting. I'll probably create a PR only for this :)

The goal of the metrics proposed in this PR is really to have a "group" that gives an overview of the health of these numbers across multiple DAGs, and helps spot whether any of these metrics look more troublesome than others. WDYT?