elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
https://www.elementary-data.com/
Apache License 2.0

Cannot generate observability report #1495

Open naomicarrillo5 opened 5 months ago

naomicarrillo5 commented 5 months ago

Describe the bug

I am having trouble generating the observability report with the command edr report. I have confirmed that the Elementary profile is configured correctly (both edr --help and edr monitor run successfully) and that the Elementary CLI is installed (including the install step needed specifically for Snowflake).

Below is the error message obtained:

2024-04-17 22:31:53 — INFO — Running with edr=0.14.1
2024-04-17 22:32:27 — INFO — edr (0.14.1) and Elementary's dbt package (0.14.1) are compatible.
2024-04-17 22:32:30 — INFO — Elementary's database and schema: '"analytics.dbt_production_elementary"'
2024-04-17 22:32:30 — INFO — Running dbt --log-format json run-operation elementary.log_macro_results --args {"macro_name": "elementary_cli.get_test_results", "macro_args": {"days_back": 7, "invocations_per_test": 720, "disable_passed_test_metrics": false}} --project-dir /home/vscode/.local/lib/python3.9/site-packages/elementary/monitor/dbt_project
2024-04-17 22:34:37 — ERROR — Could not generate the report - Error: Failed to run dbt command.
Please reach out to our community for help with this issue.
Traceback (most recent call last):
  File "/home/vscode/.local/lib/python3.9/site-packages/elementary/clients/dbt/dbt_runner.py", line 88, in _run_command
    result = subprocess.run(
  File "/usr/local/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['dbt', '--log-format', 'json', 'run-operation', 'elementary.log_macro_results', '--args', '{"macro_name": "elementary_cli.get_test_results", "macro_args": {"days_back": 7, "invocations_per_test": 720, "disable_passed_test_metrics": false}}', '--project-dir', '/home/vscode/.local/lib/python3.9/site-packages/elementary/monitor/dbt_project']' died with <Signals.SIGTERM: 15>.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vscode/.local/lib/python3.9/site-packages/elementary/monitor/api/report/report.py", line 50, in get_report_data
    tests_api = TestsAPI(
  File "/home/vscode/.local/lib/python3.9/site-packages/elementary/monitor/api/tests/tests.py", line 39, in __init__
    self.test_results_db_rows = self._get_test_results_db_rows(
  File "/home/vscode/.local/lib/python3.9/site-packages/elementary/monitor/api/tests/tests.py", line 51, in _get_test_results_db_rows
    return self.tests_fetcher.get_all_test_results_db_rows(
  File "/home/vscode/.local/lib/python3.9/site-packages/elementary/monitor/fetchers/tests/tests.py", line 22, in get_all_test_results_db_rows
    run_operation_response = self.dbt_runner.run_operation(
  File "/home/vscode/.local/lib/python3.9/site-packages/elementary/clients/dbt/dbt_runner.py", line 160, in run_operation
    success, command_output = self._run_command(
  File "/home/vscode/.local/lib/python3.9/site-packages/elementary/clients/dbt/dbt_runner.py", line 99, in _run_command
    raise DbtCommandError(err, command_args, logs=logs)
elementary.exceptions.exceptions.DbtCommandError: Failed to run dbt command.
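The key line in the first traceback is `died with <Signals.SIGTERM: 15>`: the dbt subprocess did not exit with an error of its own, it was terminated by a signal sent from outside the process. The sketch below is not Elementary's code; it only illustrates, with hypothetical helper names (`describe_exit`, `run_and_describe`), how Python's `subprocess` reports such a case on POSIX systems, where a negative return code `-N` means "killed by signal N".

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as return code -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


def run_and_describe(args: list) -> str:
    """Run a command without raising on failure and describe how it ended."""
    completed = subprocess.run(args, check=False)
    return describe_exit(completed.returncode)


if __name__ == "__main__":
    # Stand-in child process that sends SIGTERM to itself (an assumption for
    # the demo; in the issue above the signal came from outside the process).
    child = [
        sys.executable,
        "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGTERM)",
    ]
    print(run_and_describe(child))
```

A signal-terminated child points at the environment (for example a container or host limit) rather than at the dbt command's own logic, which is why the dbt logs may show no error at all.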

I have also tried reinstalling dbt and updating both dbt and the relevant plugin, dbt-snowflake. I also made sure that the Elementary tests were run before I created the report and that there are results in the table ELEMENTARY_TEST_RESULTS. As another note, I am running this in a GitHub Codespace.

To Reproduce

Run edr report

Expected behavior

Successfully run edr report

Environment (please complete the following information):

sphinks commented 5 months ago

Same here. I have the issue with version 0.14.1 as well. As a workaround, I'm recreating the elementary schema:

dbt run -t prod --select elementary --vars '{"elementary_full_refresh": "true"}'

But a couple of days later it stops working and fails with the same error.

sphinks commented 4 months ago

Any hints about the issue? What could be wrong? It keeps failing every few days.

haritamar commented 4 months ago

Hi @naomicarrillo5 @sphinks , Thanks for opening this issue and apologies for the delay.

The issue above looks like it could be a resource issue (likely memory). @sphinks, are you also getting the "died with SIGTERM" error? To better understand whether this is an issue in Elementary, it would be helpful if you could share details on the setup on which you are running the Elementary CLI:

Thanks, Itamar

sphinks commented 4 months ago

@haritamar

  1. It is happening in production (I have no way to reproduce the error locally).
  2. 4 GB of memory is used.
  3. Consumption goes up to 100%, but that also happened with the previous version of Elementary, and it was working. But I get your point: I will try increasing memory to 8 GB and check whether that helps.

Any idea why the new version consumes more memory?
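To compare memory use between versions or setups, it can help to measure the peak memory of the report run directly instead of inferring it from container metrics. The following is a minimal sketch, assuming a Unix-like system; the helper name `peak_child_rss_mib` is hypothetical and not part of Elementary. It relies on the standard-library `resource` module, which reports the largest resident set size among child processes that have been waited for.

```python
import resource
import subprocess
import sys


def peak_child_rss_mib(args):
    """Run a command and return (return code, peak child RSS in MiB).

    RUSAGE_CHILDREN covers all children this process has waited for, so the
    value is the maximum across them, not only the command just run.
    """
    completed = subprocess.run(args, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return completed.returncode, usage.ru_maxrss / divisor


if __name__ == "__main__":
    # Stand-in workload for the demo: a child that allocates about 50 MB.
    code, peak = peak_child_rss_mib(
        [sys.executable, "-c", "data = bytearray(50_000_000)"]
    )
    print(f"return code {code}, peak RSS about {peak:.0f} MiB")
```

Calling `peak_child_rss_mib(["edr", "report"])` in the environment where the failure occurs would give a rough figure to compare against the 4 GB limit, provided the run is not killed before it finishes.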

haritamar commented 4 months ago

@sphinks - I'm actually not aware of a specific issue with newer versions that would cause this - did you notice it only happened to you in 0.14.1? From which version did you upgrade?

sphinks commented 4 months ago

@haritamar I noticed it on 0.14.1, after upgrading from 0.13.0. Since the traceback references the tests API and get_all_test_results_db_rows, I wonder whether the table layout changed in the new version. Should we drop and recreate the tables from scratch?

haritamar commented 4 months ago

Thanks @sphinks . I don't believe this has anything to do with the table schemas - it's likely that the new version consumes more memory than before (which is interesting, we'll keep track of more evidence of this and see if we can reduce the memory footprint)

Did increasing memory work for you though?

david-beallor commented 3 months ago

We have been experiencing this same error at Compass Digital since yesterday. @sphinks, did you manage to find a solution? We are using Elementary 0.15.1 and dbt 1.6.0.

sphinks commented 3 months ago

@david-beallor @haritamar Raising the memory limit helped me. I'm not sure whether the issue is the amount of data we keep in the Elementary log table or a memory leak in the new version of the Elementary CLI.

Anyway, as an additional safeguard I have set up a regular job that full-refreshes the elementary schema. It is not strictly necessary (the new memory limit was able to process 1 month of data; I did not test further back), but we don't use data older than the default 7 days, so regular cleanup helps keep report generation speed and the insert lag for new records at the same level. I wish I could clean out old records instead of full-refreshing the tables. There doesn't seem to be a quick way to do that, so I'm going with full-refresh for now.
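For readers who want to experiment with pruning old rows instead of a full refresh, here is a rough sketch of building such a statement. Everything in it is an assumption rather than an Elementary feature: the helper `build_cleanup_sql` is hypothetical, the timestamp column name `detected_at` should be checked against the actual table, and the `DATEADD` syntax targets Snowflake. Deleting rows from Elementary's tables by hand may have side effects, so this is best tried on a non-production copy first.

```python
import re

# Accept only dotted identifiers such as db.schema.table to avoid SQL injection.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def build_cleanup_sql(relation: str, timestamp_column: str, days_to_keep: int) -> str:
    """Build a Snowflake DELETE that removes rows older than days_to_keep days."""
    if not _IDENTIFIER.match(relation) or not _IDENTIFIER.match(timestamp_column):
        raise ValueError("relation and timestamp_column must be plain identifiers")
    if days_to_keep <= 0:
        raise ValueError("days_to_keep must be positive")
    return (
        f"DELETE FROM {relation} "
        f"WHERE {timestamp_column} < DATEADD(day, -{days_to_keep}, CURRENT_TIMESTAMP())"
    )


if __name__ == "__main__":
    # Database and schema taken from the log earlier in this thread; the
    # column name detected_at is an assumption.
    print(
        build_cleanup_sql(
            "analytics.dbt_production_elementary.elementary_test_results",
            "detected_at",
            7,
        )
    )
```

The function only produces the SQL text; running it (for example through a scheduled job) is left to whatever tooling already executes the full-refresh.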