elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
https://www.elementary-data.com/
Apache License 2.0

Uploading run results fails if model is skipped (BigQuery) #1745

Open lari opened 1 week ago

lari commented 1 week ago

Describe the bug

When a dbt run skips a BigQuery materialized view model, the Elementary package fails with the error: Value has type STRING which cannot be inserted into column rows_affected, which has type INT64.

Looking at the SQL job in the BigQuery console, I have identified the issue: Elementary tries to insert the value '-1' (with quotes) into the rows_affected column, which has type INT64.

I have further identified that the value '-1' is not recognized as a number by the {%- if value is number -%} check in the insert_rows macro: https://github.com/elementary-data/dbt-data-reliability/blob/0.16.1/macros/utils/table_operations/insert_rows.sql#L191

Changing the line to {%- if value is number or value == '-1' -%} would fix the issue.
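Jinja's number test checks the underlying Python type, so the string "-1" coming from dbt's adapter response fails it and gets rendered in quotes. A minimal Python sketch of the more general alternative, coercing numeric-looking strings before rendering (render_insert_value is a hypothetical helper, not the package's actual macro code):

```python
def render_insert_value(value):
    """Render a value for a SQL INSERT, coercing integer-looking
    strings (e.g. "-1" from BigQuery's skip response) to numbers."""
    # Real numbers are rendered unquoted (bool is excluded on purpose,
    # since it is a subclass of int in Python)
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    if isinstance(value, str):
        try:
            return str(int(value))  # "-1" -> rendered unquoted as -1
        except ValueError:
            pass
    # Everything else is quoted, with single quotes escaped
    return "'{}'".format(str(value).replace("'", "''"))

print(render_insert_value("-1"))   # -1 (fits an INT64 column)
print(render_insert_value("abc"))  # 'abc'
```

Coercing every integer-looking string is broader than matching '-1' specifically, which may or may not be desirable for columns that are genuinely string-typed; that trade-off is part of the design question below.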

However, there's a question of how to store "skipped" run results at all?

To Reproduce

Steps to reproduce the behavior:

  1. Create a materialized view model with the on_configuration_change = 'continue' config
  2. Run dbt run to create the materialized view
  3. Make a schema change in the materialized view
  4. Run dbt run again

Expected behavior

The dbt project and Elementary package should run without errors.

Screenshots

Error message:

on-run-end failed, error:
Value has type STRING which cannot be inserted into column rows_affected, which has type INT64 at 


Additional context

Here's the run_results.json from the run. As you can see, rows_affected is set to the string "-1".

{
    "metadata": {
        "dbt_schema_version": "https://schemas.getdbt.com/dbt/run-results/v6.json",
        "dbt_version": "1.8.7",
        "generated_at": "2024-11-13T06:58:45.324884Z",
        "invocation_id": "...",
        "env": {}
    },
    "results": [
        {
            "status": "success",
            "timing": [
                {
                    "name": "compile",
                    "started_at": "2024-11-13T06:58:39.217536Z",
                    "completed_at": "2024-11-13T06:58:39.245089Z"
                },
                {
                    "name": "execute",
                    "started_at": "2024-11-13T06:58:39.245704Z",
                    "completed_at": "2024-11-13T06:58:39.630172Z"
                }
            ],
            "thread_id": "Thread-1 (worker)",
            "execution_time": 0.41369032859802246,
            "adapter_response": {
                "_message": "skip `project`.`dataset`.`table`",
                "code": "skip",
                "rows_affected": "-1"
            },
            "message": "skip `project`.`dataset`.`table`",
            "failures": null,
            "unique_id": "model.model_name",
            "compiled": true,
            "compiled_code": "...",
            "relation_name": "`project`.`dataset`.`table`"
        }
    ],
    "elapsed_time": 8.028469562530518,
    "args": {
        ...
    }
}
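The type mismatch can be seen without running dbt at all, by parsing an abbreviated excerpt of the artifact above (the snippet below mirrors the relevant fields only):

```python
import json

# Abbreviated excerpt mirroring the run_results.json above
run_results = json.loads("""
{
  "results": [
    {
      "status": "success",
      "adapter_response": {"code": "skip", "rows_affected": "-1"}
    }
  ]
}
""")

for result in run_results["results"]:
    rows = result["adapter_response"].get("rows_affected")
    # For the skipped materialized view, dbt emits the string "-1",
    # not the integer -1, which later fails the INT64 insert
    print(type(rows).__name__, rows)  # str -1
```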

Would you be willing to contribute a fix for this issue?

Possibly yes, but there are design decisions needed on how to handle run results for skipped models.