apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
36.44k stars 14.11k forks source link

Logging handler TaskHandlerWithCustomFormatter adds prefix three times #35622

Closed Aakcht closed 4 months ago

Aakcht commented 10 months ago

Apache Airflow version

2.7.3

What happened

I tried using airflow.utils.log.task_handler_with_custom_formatter.TaskHandlerWithCustomFormatter handler for task logging and noticed that the prefix specified in task_log_prefix_template is added three times.

What you think should happen instead

The prefix should be added only one time.

How to reproduce

At first I encountered this issue in Kubernetes deployment with custom logging configuration, but it is also reproduced locally with the default configuration using airflow standalone command:

1) Copy logging template from https://github.com/apache/airflow/blob/main/airflow/config_templates/airflow_local_settings.py and place it to $AIRFLOW_HOME/config/custom_logging_config.py

2) In the logging template change task handler to

        "task": {
            "class": "airflow.utils.log.task_handler_with_custom_formatter.TaskHandlerWithCustomFormatter",
            "formatter": "airflow",
            "filters": ["mask_secrets"],
            "stream": "ext://sys.stdout",
        },

3) In airflow.cfg set:

[logging]
...
logging_config_class = custom_logging_config.DEFAULT_LOGGING_CONFIG
...
task_log_prefix_template = {{ ti.dag_id }}-{{ ti.task_id }}-{{ execution_date }}-{{ try_number }}

4) Run airflow standalone and start any DAG in the UI.

5) Check task logs in the terminal, for example:

scheduler  | example_xcom-puller-2023-11-14T07:19:15.633728+00:00-:example_xcom-puller-2023-11-14T07:19:15.633728+00:00-:example_xcom-puller-2023-11-14T07:19:15.633728+00:00-:[2023-11-14 10:19:33,087] {taskinstance.py:1159} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: example_xcom.puller manual__2023-11-14T07:19:15.633728+00:00 [queued]>
scheduler  | example_xcom-puller-2023-11-14T07:19:15.633728+00:00-:example_xcom-puller-2023-11-14T07:19:15.633728+00:00-:example_xcom-puller-2023-11-14T07:19:15.633728+00:00-:[2023-11-14 10:19:33,087] {taskinstance.py:1361} INFO - Starting attempt 1 of 1

Operating System

Debian "11 (Bullseye)" or Ubuntu

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

Can be reproduced both locally and using airflow helm chart.

Anything else

I found this issue https://github.com/apache/airflow/issues/10202 , but it looks like it was about airflow 1.

I wouldn't mind submitting a PR for this issue, but I'm not sure how to fix it except for adding a check in https://github.com/apache/airflow/blob/main/airflow/utils/log/task_handler_with_custom_formatter.py#L37 if the context was already set. But this doesn't look like correct approach.

Are you willing to submit PR?

Code of Conduct

Taragolis commented 7 months ago

@Aakcht Sorry for the late response. Is it still a case now, maybe you found a solution? Asked because that I do not know that we have this task handler, and only one reference into the documentation about configuration prefix, and that is all: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#task-log-prefix-template 😢

As I could see this handler initially added as part of Airflow Contrib (deprecated in Airflow 2.0)

And after migration to the Airflow 2.0 it moved into the core: https://github.com/apache/airflow/pull/9395

Unfortunetly after that I can't find any significant changes in it so this might be a signal that it is broken for a long time or even abandoned

Aakcht commented 7 months ago

Hi, @Taragolis ! Just retested it on Airflow 2.8.1 version - the issue is still there. Haven't gotten around to trying to figure out what the problem is yet , so no solution for now 🙁. I guess the idea from my original comment about checking if a context is already set in TaskHandlerWithCustomFormatter.set_context method would work, but this approach doesn't look correct .

Taragolis commented 7 months ago

The most I can do now is mark it as Good First Issue, and some one (maybe you?) could invest their time a investigate is possible to fix that, and if so it would be nice also create some documentation about this handler.

TiDeane commented 6 months ago

Hello, can I be assigned to this?

I've replicated the bug in version 2.8.2, and @Aakcht's suggestion of checking if the context was already set in the TaskHandlerWithCustomFormatter.set_context method really did fix it, but I'd like to take some time to try and find out why it's being called multiple times in the first place and fix it that way.

brki commented 6 months ago

@TiDeane If you're working on this, perhaps you can make a related doc fix too? I noticed that the example given for https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#task-log-prefix-template does not work.

Given example is:

{ti.dag_id}-{ti.task_id}-{execution_date}-{try_number}

That doesn't work (it results in no prefix being added). Corrected, it would be:

{{ti.dag_id}}-{{ti.task_id}}-{{execution_date}}-{{ti.try_number}}
TiDeane commented 6 months ago

@brki I am, and thank you for the suggestion! I'll update the docs along with the bugfix.

A little progress update for anyone interested: from the testing I've done, it seems like the bug is caused because TaskHandlerWithCustomFormatter's set_context is called multiple times from logging_mixin’s set_context. Also, I'm not sure if it's intentional, but some TaskHandlerWithCustomFormatter's have their Formatter as a TimezoneAware object and each time they still add another prefix. The number of prefixes increases with each call, but caps at 3 (subsequent calls don't add more prefixes).

It might just be that the simplest and most efficient solution would be to simply check if the context has already been set at the beginning of TaskHandlerWithCustomFormatter's set_context, like was initially suggested. Any alternatives I can think of would either require changing the handlers' structure or slow down execution for every other handler.

By adding a simple if self.prefix_jinja_template is not None: return at the beginning of the TaskHandlerWithCustomFormatter's set_context method the bug is completely fixed, and something similar is done by the ElasticSearch es_task_handler as well: https://github.com/apache/airflow/blob/main/airflow/providers/elasticsearch/log/es_task_handler.py#L443.