apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
37.41k stars 14.36k forks source link

Incorrect Rendering of Error for DBT Output in Airflow Logs #43465

Open GabrielEisenbergOlympus opened 1 month ago

GabrielEisenbergOlympus commented 1 month ago

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.10.1

What happened?

A dbt run concluded successfully and the logs rendered as if there was an error: image

This likely happened because of the word "ERROR" in the logs.

What you think should happen instead?

The log should not be rendered as if there was no error

How to reproduce

Run this code on an airflow instance (in my case I used astro to reproduce the error seen in MWAA):

from airflow.decorators import dag, task
from datetime import datetime
import logging

logger = logging.getLogger(__name__)

# Define the basic parameters of the DAG, like schedule and start_date
@dag(
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
)
def error_render():
    @task(task_id="render_error")
    def log_error():
        logger.info("Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1")

    log_error()

error_render()

I obtained the following output:

image

Operating System

Amazon Linux 2023

Versions of Apache Airflow Providers

No response

Deployment

Amazon (AWS) MWAA

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 1 month ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

potiuk commented 1 month ago

seems like coloring is taken from the ERROR specified in the message, nice good first issue for someone to take a look at.

mithindev commented 4 weeks ago

Hi @potiuk, I’d like to work on it.

mithindev commented 4 weeks ago

Error Reproduction

I successfully reproduced the error:

Error Screenshot

I am currently working on a solution.

mithindev commented 3 weeks ago

Hello @nathadfield,

I wanted to get a bit more context regarding the expected outcome for this task. Based on my current investigation, it appears that the coloring is applied by matching specific keywords like “error,” “warn,” or “exception.” However, this approach seems limited, as it doesn’t account for other relevant keywords, such as “Failed,” which might also warrant a red color.

Could you clarify if the goal is to apply color based on log states rather than just matching specific words? Should I proceed with an approach that aligns coloring with log states or levels instead?

jscheffl commented 3 weeks ago

This is a side effect of a "feature" being introduced in Airflow 2.10.0 All lines are colored based on configurable keywords. Keywords can be configured via https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#color-log-error-keywords and https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#color-log-warning-keywords

So this is not an issue, except if you dislike the feature and want to revert it. Otherwise I'd propose to adjust te keywords if it botheres you. (I don't like it for most 80% of cases but acknowledge for some it is useful).

Therefore... it is not a bug.

For reference it was listed in the release notes in https://airflow.apache.org/docs/apache-airflow/stable/release_notes.html#new-features - The PR which added the feature was https://github.com/apache/airflow/pull/39006

GabrielEisenbergOlympus commented 2 weeks ago

Hi @jscheffl, perhaps there could be a little more nuance as to how errors etc are highlighted. This could get rather complex, but I'll suggest it anyway.

You could generate error highlighting based on the underlying technologies being used in a task. In that way, you could have custom highlighting that gets defined at a task level.

In this case, dbt outputs the outcome in a consistent manner as shown above. So perhaps the highlighting for the task could be tuned based on custom regex. In this case, the output line could be ignored in highlighting. Is this overkill for something simple? Probably.

Do with this what you will 😆

jscheffl commented 1 week ago

Hi @jscheffl, perhaps there could be a little more nuance as to how errors etc are highlighted. This could get rather complex, but I'll suggest it anyway.

Yes, this is what I feared. Also I was not a big fan of the feature first-place.

So what I wanted to say is: It will be complex. Or at least medium-complex. As logs are rendered on the client side and can also be a couple of megabytes which are re-loaded time-over time it need to be implemented in a way not crashing browser by overload. And it need to be a bit more clever than the current mechanism. Or the current mechanism needs to be reverted... which can also be that you just change your configuration to disable actually if it bothers you.

Note that the UI is in complete rework, it is planned to re-implement it fully with Airflow 3. So an improvement will most likely only be relevant for Airflow 2.10/2.11 line.

Looking forward for your proposal!