elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
https://www.elementary-data.com/
Apache License 2.0
1.87k stars, 158 forks

Failure in column_anomalies test when column_timestamp isn't provided #1554

Open angeml opened 2 months ago

angeml commented 2 months ago

Describe the bug

This line of code causes the column_anomalies test to fail on Databricks when a timestamp_column is not provided.

Caused by: org.apache.spark.sql.catalyst.ExtendedAnalysisException: [UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column, variable, or function parameter with name `last_session_start_ts` cannot be resolved.  SQLSTATE: 42703; line 30 pos 19
    at org.apache.spark.sql.catalyst.ExtendedAnalysisException.copyPlan(ExtendedAnalysisException.scala:91)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:688)

I've worked around this locally by adding a timestamp_column, but others might not have that option.

To Reproduce

  1. Create a column_anomalies test for a model that doesn't have a timestamp_column
  2. Run the test on Databricks
  3. Observe that the extra trailing comma (,) at the end of start_bucket_in_data causes an error on Databricks
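
The trailing comma in step 3 is a common hazard when a SQL select list is assembled from optional pieces. As a rough illustration only (this is not Elementary's macro code, and `render_select`, `monitored_table`, and `end_bucket_in_data` are hypothetical names chosen for the sketch), joining the parts with a separator avoids leaving a dangling comma when the optional column is absent:

```python
from typing import Optional, Sequence


def render_select(columns: Sequence[str], optional_column: Optional[str] = None) -> str:
    """Render a SELECT statement without a trailing comma.

    Hypothetical helper for illustration; the table name is an assumption.
    """
    parts = list(columns)
    # Append the optional column only when it is actually provided.
    if optional_column is not None:
        parts.append(optional_column)
    # The separator goes *between* items, so the last item never ends with a comma.
    select_list = ",\n    ".join(parts)
    return "select\n    " + select_list + "\nfrom monitored_table"


# Without the optional column, the select list ends cleanly.
print(render_select(["start_bucket_in_data"]))
# With it, the comma appears only between the two columns.
print(render_select(["start_bucket_in_data"], "end_bucket_in_data"))
```

In a Jinja template, the same idea is usually expressed by guarding the comma with a condition or by joining a list, rather than hard-coding a comma after the last fixed column.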

Expected behavior

This test should not produce an error.

Screenshots

[Two screenshots, taken 2024-06-13 at 12:46:17 PM and 12:47:01 PM; images not included]

Additional context

Slack - https://elementary-community.slack.com/archives/C02CTC89LAX/p1716306300184349

Would you be willing to contribute a fix for this issue?

For sure 👍 But I think it just needs a comma removal 😄

angeml commented 2 months ago

Just realized that I likely should have created this issue in the https://github.com/elementary-data/dbt-data-reliability repo

haritamar commented 2 months ago

Hi @angeml! Thanks for opening this issue, and sorry for the delayed response. Yes, you are absolutely right: it seems this flow was broken, and we have a PR that fixes it, which should be merged in the near future.

Larissa-Rocha commented 2 months ago

Hi guys! I've been experiencing the same issue with the column_anomalies test running on Trino, so I'm looking forward to this fix.