astronomer / astronomer-cosmos

Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
https://astronomer.github.io/astronomer-cosmos/
Apache License 2.0
596 stars, 152 forks

Add back newlines into log format #687

Closed: joppevos closed this issue 7 months ago

joppevos commented 10 months ago

The stderr output from dbt is displayed on a single line in the Airflow logs, which makes it hard to follow what is going on. We should add the newlines back so the output is displayed just as dbt shows it on the CLI.

This is part of the log that is output on a single line:
airflow.exceptions.AirflowException: ('dbt command failed. The command returned a non-zero exit code 1. Details: ', '\x1b[0m07:55:11 Running with dbt=1.5.0', '\x1b[0m07:55:14 Unable to do partial parsing because saved manifest not found. Starting full parse.', '\x1b[0m07:55:18 Found 3 models, 25 tests, 0 snapshots, 0 analyses, 596 macros, 0 operations, 0 seed files, 2 sources, 0 exposures, 0 metrics, 0 groups', '\x1b[0m07:55:18', "\x1b[0m07:55:18 Concurrency: 3 threads (target='prod')", '\x1b[0m07:55:18',
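The parentheses and quoted, comma-separated pieces in this log look like Python's rendering of an exception created with several positional arguments: `str()` of such an exception is the repr of its `args` tuple, so the escape codes show up as literal `\x1b` text and no line breaks survive. The sketch below assumes the message is built that way (not confirmed in this thread) and uses a plain `Exception` as a stand-in for `AirflowException`; the two dbt lines are copied from the log above.

```python
# Stand-in dbt lines copied from the log above (each starts with the ANSI reset code)
lines = [
    "\x1b[0m07:55:11 Running with dbt=1.5.0",
    "\x1b[0m07:55:18 Concurrency: 3 threads (target='prod')",
]
prefix = "dbt command failed. The command returned a non-zero exit code 1. Details: "

# Several positional arguments: str() shows the repr of the args tuple, all on one line
flat = Exception(prefix, *lines)
print(str(flat))

# One pre-joined string: each dbt line keeps its own line in the message
joined = Exception(prefix + "\n" + "\n".join(lines))
print(str(joined))
```

If this assumption holds, joining the lines with `"\n"` before raising is enough to get the CLI-like layout back, without touching the logging setup.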

It looks like we could use the \x1b codes or the commas to recover the newline structure.
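The `\x1b` idea can be sketched as follows, assuming the dbt output is available as one run-together string in which every log line begins with the reset code `\x1b[0m` followed by an `HH:MM:SS` timestamp, as in the sample above. `restore_newlines` is a hypothetical helper, not part of Cosmos.

```python
import re

# A new dbt log record starts where the reset code is followed by an HH:MM:SS timestamp
LINE_START = re.compile(r"(?=\x1b\[0m\d{2}:\d{2}:\d{2})")
# Any ANSI SGR sequence (colors, reset), e.g. \x1b[33m or \x1b[0m
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def restore_newlines(blob: str) -> str:
    """Hypothetical helper: rebuild one line per dbt log record from a run-together string."""
    records = LINE_START.split(blob)
    # Strip color codes from each record and drop the empty fragment before the first record
    cleaned = [ANSI_SGR.sub("", record).rstrip() for record in records]
    return "\n".join(line for line in cleaned if line)


# Sample records copied from the log above
blob = (
    "\x1b[0m07:55:11 Running with dbt=1.5.0"
    "\x1b[0m07:55:14 Unable to do partial parsing because saved manifest not found. Starting full parse."
    "\x1b[0m07:55:18"
    "\x1b[0m07:55:18 Concurrency: 3 threads (target='prod')"
)
print(restore_newlines(blob))
```

Splitting only where the reset code is followed by a timestamp, rather than on every `\x1b[0m`, is meant to avoid breaking a line in case it also carries color codes in the middle.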

[Screenshot of Airflow]

I am happy to give it a shot.

dosubot[bot] commented 10 months ago

🤖

Hello @joppevos,

I'm Dosu, a bot designed to assist with issues and questions related to the Cosmos repository. I can help you troubleshoot bugs, answer queries, and guide you on becoming a contributor. While we wait for a human maintainer, I'm here to provide immediate support.

I see you've posted an issue, and I'm currently reviewing it. I'll provide you with a thorough response soon. Hang tight!

dosubot[bot] commented 10 months ago

🤖

Hello @joppevos!

I'm Dosu, a bot here to assist you with your issues, answer your questions, and guide you on your journey to becoming a contributor. While we wait for a human maintainer, please feel free to ask anything you need help with. I'm here to make your experience smoother!

Thank you for bringing this to our attention. The issue you're experiencing is due to the way the logs are currently formatted in the Cosmos codebase. The logs are formatted using Python's built-in logging module and Airflow's CustomTTYColoredFormatter, which doesn't preserve the newline characters from the original dbt output.
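One way to narrow down where the newlines are lost: Python's standard `logging.Formatter` passes newlines embedded in a message through unchanged, as the standard-library-only check below shows. It does not exercise Airflow's `CustomTTYColoredFormatter`, and the logger name is a hypothetical placeholder; if the dbt lines still arrive on one line, the flattening likely happens before the message reaches the logger.

```python
import io
import logging

# Route a throwaway logger into an in-memory stream so the formatted text can be inspected
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

logger = logging.getLogger("newline-check")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.INFO)

# A message that already contains a newline, e.g. two dbt lines joined with "\n"
logger.info("Running with dbt=1.5.0\nConcurrency: 3 threads (target='prod')")

print(stream.getvalue())
```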

To address this, you could modify the get_logger function in the cosmos/log.py file to include newline characters in the log format. However, this might not be the best solution as it could potentially affect other parts of the logging system.

A better approach might be to modify the parse_output and extract_log_issues functions in the cosmos/dbt/parser/output.py file to preserve the newline characters from the dbt output. These functions are responsible for parsing the dbt output and extracting relevant information. By modifying these functions, you can ensure that the newline characters are preserved only in the dbt output and not in other parts of the logs.

Here's a rough idea of how you could modify the parse_output function:

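import logging
from typing import Tuple

from cosmos.hooks.subprocess import FullOutputSubprocessResult


def parse_output(result: FullOutputSubprocessResult, keyword: str) -> Tuple[int, str]:
    # Restore line breaks: in the sample log, each dbt line starts with the ANSI reset code \x1b[0m
    output = result.output.replace("\x1b[0m", "\n")
    try:
        num = int(output.split(f"{keyword}=")[1].split()[0])
    except (IndexError, ValueError):  # keyword missing, or not followed by a number
        num = 0
        logging.error(
            f"Could not parse number of {keyword}s. Check your dbt/airflow version or if --quiet is not being used"
        )
    # Return the parsed count together with the newline-restored output
    return num, output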

And the extract_log_issues function:

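import re
from typing import List, Tuple


def clean_line(line: str) -> str:
    # Assumed helper (not defined above): drop ANSI color codes and surrounding whitespace
    return re.sub(r"\x1b\[[0-9;]*m", "", line).strip()


def extract_log_issues(log_list: List[str]) -> Tuple[List[str], List[str]]:
    test_names = []
    test_results = []
    pattern1 = re.compile(r"\d{2}:\d{2}:\d{2}\s+Warning in test ([\w_]+).*")
    pattern2 = re.compile(r"\d{2}:\d{2}:\d{2}\s+(.*)")

    # Walk backwards from the end of the log, stopping at the "Finished running" summary line
    for line_index in range(len(log_list) - 1, -1, -1):
        cleaned_line = clean_line(log_list[line_index])

        if "Finished running" in cleaned_line:
            break

        # The line after a "Warning in test" line holds that test's result; skip if there is none
        if "Warning in test" in cleaned_line and line_index + 1 < len(log_list):
            test_name = pattern1.sub(r"\1", cleaned_line)
            test_result = pattern2.sub(r"\1", clean_line(log_list[line_index + 1]))

            test_names.append(test_name)
            # clean_line has already removed the \x1b[0m codes, and each entry is a single line
            test_results.append(test_result)

    return test_names, test_results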

Please note that this is just a rough idea and the actual implementation might be different. You would need to test these changes thoroughly to ensure they don't introduce any new issues.

I hope this helps! If you have any further questions, feel free to ask.

tatiana commented 10 months ago

Hi @joppevos, thanks for raising this. We'd appreciate your contribution!

dosubot[bot] commented 7 months ago

Hi, @joppevos,

I'm helping the Cosmos team manage their backlog and am marking this issue as stale. From what I understand, you raised the issue that the stderr output from dbt is hard to read in the Airflow logs because it is displayed on a single line. Dosu-bot provided a detailed response suggesting potential modifications to the cosmos/log.py and cosmos/dbt/parser/output.py files to preserve the newline characters from the dbt output. Tatiana also acknowledged the issue and expressed appreciation for potential contributions.

Could you please confirm if this issue is still relevant to the latest version of the Cosmos repository? If it is, please let the Cosmos team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you!