elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
https://www.elementary-data.com/
Apache License 2.0

Unable to generate the observability report #1571

Closed: rohilla-anuj closed this issue 2 weeks ago

rohilla-anuj commented 1 month ago

Describe the bug
We are using Cloud Composer to orchestrate our dbt Core model runs, and we are now trying to use the same platform to generate the observability report every 3 hours. We were able to achieve this in the development environment using the command:

edr report --project-dir /home/airflow/gcs/dags/dbt --profiles-dir /home/airflow/gcs/dags/dbt --profile-target dev --days-back 180 --file-path /home/airflow/gcs/data/index.html --project-name dev-project --env dev --config-dir /home/airflow/gcs/data/.edr

When we run the same command against the production environment, we get the error below. With --days-back 5 the command runs as expected and generates the observability report; the command fails for any --days-back value greater than 5.

edr report --project-dir /home/airflow/gcs/dags/dbt --profiles-dir /home/airflow/gcs/dags/dbt --profile-target prod --days-back 180 --file-path /home/airflow/gcs/data/index.html --project-name prod-project --env prod --config-dir /home/airflow/gcs/data/.edr
[2024-06-20 06:54:38.718209+00:00] {subprocess.py:93} INFO - 2024-06-20 06:54:38 — INFO — edr (0.14.1) and Elementary's dbt package (0.14.1) are compatible.
[2024-06-20 06:57:21.993894+00:00] {subprocess.py:93} INFO - 2024-06-20 06:57:21 — INFO — Elementary's database and schema: '"elastic-edm-prod.dea__dbt_monitoring"'
[2024-06-20 06:57:21.994007+00:00] {subprocess.py:93} INFO - 2024-06-20 06:57:21 — INFO — Running dbt --log-format json run-operation elementary.log_macro_results --args {"macro_name": "elementary_cli.get_test_results", "macro_args": {"days_back": 180, "invocations_per_test": 720, "disable_passed_test_metrics": false}} --project-dir /home/airflow/gcsfuse/actual_mount_path/data/venv/lib/python3.11/site-packages/elementary/monitor/dbt_project --profiles-dir /home/airflow/gcs/data/dbt --target prod
[2024-06-20 07:02:24.205424+00:00] {subprocess.py:93} INFO - 2024-06-20 07:02:24 — INFO — Running dbt --log-format json run-operation elementary.log_macro_results --args {"macro_name": "elementary_cli.get_source_freshness_results", "macro_args": {"days_back": 180, "invocations_per_test": 720}} --project-dir /home/airflow/gcsfuse/actual_mount_path/data/venv/lib/python3.11/site-packages/elementary/monitor/dbt_project --profiles-dir /home/airflow/gcs/data/dbt --target prod
[2024-06-20 07:10:22.378690+00:00] {subprocess.py:93} INFO - 2024-06-20 07:10:21 — ERROR — Could not generate the report - Error: Failed to run dbt command.
[2024-06-20 07:10:22.378830+00:00] {subprocess.py:93} INFO - Please reach out to our community for help with this issue.
[2024-06-20 07:10:22.378977+00:00] {subprocess.py:93} INFO - Traceback (most recent call last):
[2024-06-20 07:10:22.379136+00:00] {subprocess.py:93} INFO -   File "/home/airflow/gcs/data/venv/lib/python3.11/site-packages/elementary/clients/dbt/dbt_runner.py", line 88, in _run_command
[2024-06-20 07:10:22.379177+00:00] {subprocess.py:93} INFO -     result = subprocess.run(
[2024-06-20 07:10:22.379361+00:00] {subprocess.py:93} INFO -              ^^^^^^^^^^^^^^^
[2024-06-20 07:10:22.379470+00:00] {subprocess.py:93} INFO -   File "/opt/python3.11/lib/python3.11/subprocess.py", line 571, in run
[2024-06-20 07:10:22.379575+00:00] {subprocess.py:93} INFO -     raise CalledProcessError(retcode, process.args,
[2024-06-20 07:10:22.379662+00:00] {subprocess.py:93} INFO - subprocess.CalledProcessError: Command '['dbt', '--log-format', 'json', 'run-operation', 'elementary.log_macro_results', '--args', '{"macro_name": "elementary_cli.get_source_freshness_results", "macro_args": {"days_back": 180, "invocations_per_test": 720}}', '--project-dir', '/home/airflow/gcsfuse/actual_mount_path/data/venv/lib/python3.11/site-packages/elementary/monitor/dbt_project', '--profiles-dir', '/home/airflow/gcs/data/dbt', '--target', 'prod']' died with <Signals.SIGKILL: 9>.
[2024-06-20 07:10:22.379742+00:00] {subprocess.py:93} INFO - 
[2024-06-20 07:10:22.379880+00:00] {subprocess.py:93} INFO - During handling of the above exception, another exception occurred:
[2024-06-20 07:10:22.379953+00:00] {subprocess.py:93} INFO - 
[2024-06-20 07:10:22.380046+00:00] {subprocess.py:93} INFO - Traceback (most recent call last):
[2024-06-20 07:10:22.380117+00:00] {subprocess.py:93} INFO -   File "/home/airflow/gcs/data/venv/lib/python3.11/site-packages/elementary/monitor/api/report/report.py", line 56, in get_report_data
[2024-06-20 07:10:22.380198+00:00] {subprocess.py:93} INFO -     source_freshnesses_api = SourceFreshnessesAPI(
[2024-06-20 07:10:22.380283+00:00] {subprocess.py:93} INFO -                              ^^^^^^^^^^^^^^^^^^^^^
[2024-06-20 07:10:22.380459+00:00] {subprocess.py:93} INFO -   File "/home/airflow/gcs/data/venv/lib/python3.11/site-packages/elementary/monitor/api/source_freshnesses/source_freshnesses.py", line 36, in __init__
[2024-06-20 07:10:22.380465+00:00] {subprocess.py:93} INFO -     self._get_source_freshness_results_db_rows(
[2024-06-20 07:10:22.380528+00:00] {subprocess.py:93} INFO -   File "/home/airflow/gcs/data/venv/lib/python3.11/site-packages/elementary/monitor/api/source_freshnesses/source_freshnesses.py", line 47, in _get_source_freshness_results_db_rows
[2024-06-20 07:10:22.380598+00:00] {subprocess.py:93} INFO -     return self.tests_fetcher.get_source_freshness_results_db_rows(
[2024-06-20 07:10:22.380669+00:00] {subprocess.py:93} INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-06-20 07:10:22.380758+00:00] {subprocess.py:93} INFO -   File "/home/airflow/gcs/data/venv/lib/python3.11/site-packages/elementary/monitor/fetchers/source_freshnesses/source_freshnesses.py", line 23, in get_source_freshness_results_db_rows
[2024-06-20 07:10:22.380849+00:00] {subprocess.py:93} INFO -     run_operation_response = self.dbt_runner.run_operation(
[2024-06-20 07:10:22.380944+00:00] {subprocess.py:93} INFO -                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-06-20 07:10:22.380998+00:00] {subprocess.py:93} INFO -   File "/home/airflow/gcs/data/venv/lib/python3.11/site-packages/elementary/clients/dbt/dbt_runner.py", line 160, in run_operation
[2024-06-20 07:10:22.381079+00:00] {subprocess.py:93} INFO -     success, command_output = self._run_command(
[2024-06-20 07:10:22.381146+00:00] {subprocess.py:93} INFO -                               ^^^^^^^^^^^^^^^^^^
[2024-06-20 07:10:22.381225+00:00] {subprocess.py:93} INFO -   File "/home/airflow/gcs/data/venv/lib/python3.11/site-packages/elementary/clients/dbt/dbt_runner.py", line 99, in _run_command
[2024-06-20 07:10:22.381306+00:00] {subprocess.py:93} INFO -     raise DbtCommandError(err, command_args, logs=logs)
[2024-06-20 07:10:22.381383+00:00] {subprocess.py:93} INFO - elementary.exceptions.exceptions.DbtCommandError: Failed to run dbt command.
[2024-06-20 07:10:38.158923+00:00] {subprocess.py:97} INFO - Command exited with return code 1
[2024-06-20 07:10:38.206218+00:00] {taskinstance.py:1939} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/python3.11/lib/python3.11/site-packages/airflow/operators/bash.py", line 210, in execute
    raise AirflowException(
airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 1.
[2024-06-20 07:10:38.210912+00:00] {taskinstance.py:1401} INFO - Marking task as UP_FOR_RETRY. dag_id=test_command, task_id=Elementary_Report_Generation, execution_date=20240620T062756, start_date=20240620T062758, end_date=20240620T071038
[2024-06-20 07:10:38.240078+00:00] {standard_task_runner.py:104} ERROR - Failed to execute job 81812 for task Elementary_Report_Generation (Bash command failed. The command returned a non-zero exit code 1.; 64768)
[2024-06-20 07:10:38.251565+00:00] {local_task_job_runner.py:228} INFO - Task exited with return code 1
[2024-06-20 07:10:38.303331+00:00] {taskinstance.py:2781} INFO - 0 downstream tasks scheduled from follow-on schedule check

We have validated that the worker nodes have enough CPU and memory to run the DAGs.
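Worth noting for anyone debugging this: `died with <Signals.SIGKILL: 9>` in the traceback means the dbt child process was killed externally, and on a memory-constrained worker that is most often the kernel OOM killer, since a larger `--days-back` window pulls far more rows into memory. A minimal shell sketch (nothing Elementary-specific) of how a SIGKILL-ed child surfaces to its parent:

```shell
# A process killed by SIGKILL exits with status 128 + 9 = 137; seeing 137
# (or "died with <Signals.SIGKILL: 9>" from Python's subprocess module)
# usually points at the kernel OOM killer rather than a dbt error.
sh -c 'kill -9 $$'
echo "exit status: $?"   # exit status: 137
```

If the worker's kernel log (`dmesg`) shows an `Out of memory: Killed process ...` entry around the failure time, the likely fixes are more memory for the worker or a smaller `--days-back` window.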

To Reproduce
Steps to reproduce the behavior:

edr report --project-dir /home/airflow/gcs/dags/dbt --profiles-dir /home/airflow/gcs/dags/dbt --profile-target prod --days-back 180 --file-path /home/airflow/gcs/data/index.html --project-name prod-project --env prod --config-dir /home/airflow/gcs/data/.edr

Expected behavior
The observability report is generated with 180 days' worth of data.

Screenshots
Worker resources during the execution of the observability report generation DAG:

[Screenshot 2024-07-02 at 3:17:38 PM]

Environment (please complete the following information):

aleenprd commented 3 weeks ago

Try with the Elementary dbt package 0.11.1. I had a similar issue with Snowflake.
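For reference, pinning the dbt package means editing packages.yml and re-running `dbt deps`; a sketch, assuming the package is published on the dbt Hub as elementary-data/elementary:

```yaml
# packages.yml — hypothetical pin to the suggested version
packages:
  - package: elementary-data/elementary
    version: 0.11.1
```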

rohilla-anuj commented 2 weeks ago

Thanks for the suggestion @aleenprd. I kept the Elementary dbt package at the same version (0.14.1), since I can't downgrade it while other DAGs are using it. However, downgrading the Elementary CLI (edr) to version 0.11.0 worked for me.
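For anyone landing here: the CLI and the dbt package are versioned independently, so the workaround above amounts to something like the following (a sketch, assuming the CLI is installed from PyPI under the package name elementary-data):

```shell
# Hypothetical sketch: downgrade only the edr CLI, leaving the
# elementary dbt package referenced in packages.yml at 0.14.1.
pip install "elementary-data==0.11.0"
edr --version   # confirm the downgraded CLI version
```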