getsentry / sentry-python

The official Python SDK for Sentry.io
https://sentry.io/for/python/
MIT License

sentry + airflow performance monitoring #1446

Open voltcode opened 2 years ago

voltcode commented 2 years ago

Problem Statement

I configured the TRACES_SAMPLE_RATE env var for Airflow and can see basic Airflow performance data in Sentry.

However, I do not see performance data for my DAGs, and their performance is much more important to me than that of core Airflow.

I would like to analyze performance of my DAGs and their Tasks in Sentry.

Solution Brainstorm

Perhaps the solution building blocks are already there but the documentation is lacking? Maybe it's just a matter of extending the documentation for the Airflow integration to show how to add Sentry configuration to a DAG? If so, I would appreciate a code sample, maybe a hello world extended with Sentry code configured to generate traces for DAGs and their Tasks.

If not, maybe an Airflow plugin needs to be built to make it easy for DAG maintainers to plug in observability.

antonpirker commented 2 years ago

Hey @voltcode! Thanks for bringing this up!

It seems Sentry does not yet add performance data of DAGs in Airflow automatically.

If you could put together (or point us to) a simple sample Airflow project with DAGs, we can have a look.

In the meantime, you could add custom instrumentation to your DAGs so you have the performance data you want in Sentry. See the documentation for this:

https://docs.sentry.io/platforms/python/guides/celery/performance/instrumentation/custom-instrumentation/
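As a rough illustration of what custom instrumentation looks like, here is a minimal sketch that runs a list of steps inside one transaction with one child span per step. The helper name `run_steps_in_transaction`, the `op` values, and the `start_transaction` parameter are made up for this example; only `sentry_sdk.start_transaction` and `start_child` come from the SDK. It assumes `sentry_sdk.init(...)` has already been called with a non-zero traces sample rate, and it is not an Airflow integration.

```python
def run_steps_in_transaction(name, steps, start_transaction=None):
    """Run (step_name, callable) pairs inside one transaction, one child span each.

    `start_transaction` is a hypothetical injection point for this sketch;
    when omitted, sentry_sdk.start_transaction is used.
    """
    if start_transaction is None:
        # Imported lazily so the helper can be exercised without the SDK installed.
        import sentry_sdk

        start_transaction = sentry_sdk.start_transaction

    results = {}
    # The transaction is the top-level entry shown in Sentry's Performance view.
    with start_transaction(op="task", name=name) as transaction:
        for step_name, step in steps:
            # Each step becomes a child span of the transaction.
            with transaction.start_child(op="step", description=step_name):
                results[step_name] = step()
    return results
```

Note that this only covers code running in a single process. Airflow tasks typically run in separate worker processes, so mapping a whole DAG run to one transaction would need more than this.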

voltcode commented 2 years ago

Hey @antonpirker, I'd say the tutorial DAG would be a good starting point: https://github.com/apache/airflow/blob/main/airflow/example_dags/example_bash_operator.py especially since it is well documented and explained step by step here (which would be beneficial for future Sentry + Airflow users): https://airflow.apache.org/docs/apache-airflow/stable/tutorial.html

The effect I'd like is to see the DAG as a transaction and its tasks (t1, t2, t3) as its children.

I've already looked at the custom-instrumentation link you provided, but I don't understand how to get from there to the result I described. How do I wrap the Airflow tasks and DAG in Sentry entities so that they are visible? Or maybe there's some Python injection mechanism that could be used?

The closest thing to what I want that I currently see in the Performance section in Sentry is this: (image)

I tried adding a transaction manually in the DAG code, but I have doubts about whether it profiles the "setup" of the DAG or its actual execution.
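On the setup-versus-execution doubt: module-level code in a DAG file runs whenever Airflow parses the file, while the task body runs later on a worker, so a transaction opened at module level would most likely time the setup. One possible way around that, sketched below, is to open the transaction inside the callable that the task executes. The decorator name `traced_task`, the `"airflow.task"` op value, and the `start_transaction` parameter are hypothetical names for this illustration, and it assumes `sentry_sdk.init(...)` has run in the worker process.

```python
import functools


def traced_task(op="airflow.task", start_transaction=None):
    """Wrap a task callable so a transaction covers each execution of it.

    `start_transaction` is a hypothetical injection point for this sketch;
    when omitted, sentry_sdk.start_transaction is used.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            starter = start_transaction
            if starter is None:
                # Imported lazily, at call time, not when the DAG file is parsed.
                import sentry_sdk

                starter = sentry_sdk.start_transaction
            # Opened only when the task body runs, so DAG parsing is not measured.
            with starter(op=op, name=func.__name__):
                return func(*args, **kwargs)

        return wrapper

    return decorator
```

Assuming a `PythonOperator`, the decorated function would be passed as its `python_callable`. This gives one transaction per task run rather than one transaction per DAG with tasks as children, and it does not cover the `BashOperator` tasks used in the tutorial DAG.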

antonpirker commented 2 years ago

Thanks for the information. We will put this in our internal backlog at low priority, so I cannot promise any date.

We are always happy to accept pull requests if you want to give it a go. I can help you with reviews and such to bring the pull requests into a releasable shape!

To anybody else: please "thumbs up" the issue if you want to see this implemented, so we can measure demand for it. Oh, and you are also very welcome to submit a pull request for this!