elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
https://www.elementary-data.com/
Apache License 2.0

`UNRESOLVED_COLUMN` exception when upgrading to elementary 0.15.1 #1538

Open hamzamazhar opened 6 months ago

hamzamazhar commented 6 months ago

Describe the bug: When I upgrade from elementary 0.13.0 to 0.15.0 and run the elementary models, the model model_run_results fails with this exception:

Runtime Error in model model_run_results (models/edr/run_results/model_run_results.sql)
  [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `models`.`unique_id` cannot be resolved. Did you mean one of the following? [`run_results`.`unique_id`, `models`.`metadata_hash`, `run_results`.`query_id`, `run_results`.`name`, `run_results`.`failures`]
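
The suggestion list in this error is worth reading closely. A minimal sketch, assuming only the message text quoted above (the function name `suggested_columns` is made up for illustration), groups the suggested columns by relation alias:

```python
import re
from collections import defaultdict

# Error text copied from the report above.
ERROR = (
    "[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name "
    "`models`.`unique_id` cannot be resolved. Did you mean one of the following? "
    "[`run_results`.`unique_id`, `models`.`metadata_hash`, `run_results`.`query_id`, "
    "`run_results`.`name`, `run_results`.`failures`]"
)


def suggested_columns(message):
    """Group the suggested `relation`.`column` pairs by relation alias."""
    # Only parse the list that follows "Did you mean", not the failing name itself.
    _, _, tail = message.partition("Did you mean")
    grouped = defaultdict(list)
    for relation, column in re.findall(r"`([^`]+)`\.`([^`]+)`", tail):
        grouped[relation].append(column)
    return dict(grouped)


columns_by_relation = suggested_columns(ERROR)
print(columns_by_relation)
```

For this message, the only suggestion under the `models` alias is `metadata_hash`. The list may be truncated, so this is a hint about what the engine can see on that relation, not proof of its full schema.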

To Reproduce Steps to reproduce the behavior:

  1. On a new dbt project, use elementary==0.13.0 as a package and run elementary models with dbt run -s elementary
  2. Then, upgrade the elementary package to 0.15.0 and install it.
  3. Run the elementary models again with dbt run -s elementary and this exception occurs.

Expected behavior Elementary models should run.

Environment (please complete the following information):


haritamar commented 6 months ago

Hi @hamzamazhar ! Thanks for posting this issue. Unfortunately I'm not able to reproduce this, so I'm guessing this issue is a bit more specific to your environment.

I see that you are using Databricks. Are you using it with or without Unity Catalog? Does the dbt_models table exist in the elementary schema when this error occurs?

Any additional information about your setup can help to understand this.

hamzamazhar commented 4 months ago

@haritamar Apologies on my side. I upgraded from version 0.13.0 to 0.15.0. In the issue, I wrote 0.15.0 -> 0.15.1, which is incorrect. I have fixed the description of the issue.

I faced the same problem when I pushed the upgrade from version 0.13 to 0.15 to my production environment.

However, the interesting thing to note is that I ran this in an Airflow task with 3 retries specified on it.

My hunch is that some columns were missing in the elementary models on the first run. However, that first run created those columns, so by the time Airflow retried the task, the columns were already there and the task passed.
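
A toy sketch of that hunch, not elementary's or Airflow's actual behavior: `FakeWarehouse` and `run_with_retries` are hypothetical names, and the assumption that a failed run still leaves the new column behind is exactly the unverified part.

```python
class FakeWarehouse:
    """Hypothetical stand-in for a table whose schema changes during a run."""

    def __init__(self):
        # Assumed pre-upgrade state: the new column does not exist yet.
        self.columns = {"metadata_hash"}

    def run_models(self):
        # Assumption: the run needs `unique_id`, and the column is added as a
        # side effect that persists even when this attempt fails.
        missing = "unique_id" not in self.columns
        self.columns.add("unique_id")
        if missing:
            raise RuntimeError("UNRESOLVED_COLUMN: unique_id")
        return "ok"


def run_with_retries(task, retries):
    """Call task up to retries + 1 times and record each attempt's outcome."""
    attempts = []
    for _ in range(retries + 1):
        try:
            attempts.append(task())
            break  # stop at the first success
        except RuntimeError as exc:
            attempts.append(f"failed: {exc}")
    return attempts


attempts = run_with_retries(FakeWarehouse().run_models, retries=3)
print(attempts)
```

Under these assumptions the first attempt fails and the second succeeds, so a task with retries would report success and hide the initial error.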

I hope this is helpful.

hamzamazhar commented 3 months ago

@haritamar Update on this issue: We have hit this error three times in the last week in production. It happens when we try to run the elementary report, which fails with:

[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `dbt_tests`.`unique_id` cannot be resolved.

Running dbt run -s elementary fixes the issue, but it comes back, seemingly at random.

The interesting thing is that I noticed that elementary creates a tmp table with the same path as dbt_tests with the following SQL command:

create or replace table `hive_metastore`.`elementary`.`dbt_tests__tmp_20240812011223887172`
using delta
location 's3://***/elementary/dbt_tests'
  as
    SELECT
            metadata_hash
    FROM `hive_metastore`.`elementary`.`dbt_tests`
    WHERE 1 = 0

I think it creates a table with just one column, metadata_hash, at the same storage location as dbt_tests, and so overwrites the dbt_tests table. Afterwards, the elementary report fails because it can't find any of the other columns in the dbt_tests table.
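
The suspected mechanism can be sketched with a toy model. This is an illustration under assumptions, not how Delta or the Hive metastore is implemented: `FakeMetastore`, the bucket path, and the original column list are all made up, and the only thing taken from the SQL above is that the tmp table shares its location with dbt_tests.

```python
class FakeMetastore:
    """Toy model: table names point at storage locations; data lives at the location."""

    def __init__(self):
        self.table_location = {}  # table name -> storage path
        self.columns_at = {}      # storage path -> columns stored there

    def create_or_replace(self, table, location, columns):
        self.table_location[table] = location
        # Replaces whatever was stored at this location before.
        self.columns_at[location] = list(columns)

    def columns(self, table):
        return self.columns_at[self.table_location[table]]


store = FakeMetastore()
path = "s3://example-bucket/elementary/dbt_tests"  # hypothetical path

# Assumed original schema of dbt_tests (column names are illustrative).
store.create_or_replace("dbt_tests", path, ["unique_id", "name", "metadata_hash"])

# The tmp table is created at the SAME location with a single column.
store.create_or_replace("dbt_tests__tmp", path, ["metadata_hash"])

print(store.columns("dbt_tests"))
```

In this model, reading dbt_tests after the tmp table is created returns only `metadata_hash`, which would match the `dbt_tests`.`unique_id` error. If the tmp table had its own location, dbt_tests would be unaffected.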

Kindly let us know how we can fix this issue on our side.

Thanks 🙏