elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
https://www.elementary-data.com/
Apache License 2.0
1.82k stars, 152 forks

Getting Redshift syntax errors on volume and column anomaly tests configured without timestamp parameter #1540

Open Negashion opened 1 month ago

Negashion commented 1 month ago

Describe the bug

In package 0.15.2 I am getting some syntax errors when running tests on Redshift:

To Reproduce

Steps to reproduce the behavior:

  1. Create volume or column anomaly tests without the timestamp parameter and with boolean fields as dimensions, on Redshift with package version 0.15.2.

Environment (please complete the following information):

haritamar commented 1 month ago

Hi @Negashion ! I can confirm this is a bug. Basically this isn't allowed in Redshift:

select cast(TRUE as varchar)

So we likely need to modify the implementation of the edr_cast_as_string macro for Redshift (and maybe Postgres?) to handle boolean values correctly. This may require passing the column's type to the macro so that it can produce different SQL for different data types.
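One way the change described above could look is sketched below. This is only an illustration, not Elementary's actual implementation: the `is_boolean` argument is a hypothetical way of passing the column type in, and the sketch assumes the macro is routed through `adapter.dispatch` so that Redshift gets its own variant.

```jinja
{# Hypothetical sketch: `is_boolean` is an assumed extra argument, not part of the package. #}
{%- macro edr_cast_as_string(column, is_boolean=false) -%}
    {{ return(adapter.dispatch('edr_cast_as_string', 'elementary')(column, is_boolean)) }}
{%- endmacro -%}

{# Fallback for adapters where the plain cast is used for every type. #}
{%- macro default__edr_cast_as_string(column, is_boolean=false) -%}
    cast({{ column }} as {{ elementary.edr_type_string() }})
{%- endmacro -%}

{# Redshift variant: map booleans to text explicitly instead of casting them. #}
{%- macro redshift__edr_cast_as_string(column, is_boolean=false) -%}
    {%- if is_boolean -%}
        cast(case when {{ column }} then 'true' when not {{ column }} then 'false' end as {{ elementary.edr_type_string() }})
    {%- else -%}
        cast({{ column }} as {{ elementary.edr_type_string() }})
    {%- endif -%}
{%- endmacro -%}
```

A caller that knows the column is boolean would pass `is_boolean=true`; how the type information reaches the call site is left open here. Null values stay null because the `case` expression has no `else` branch.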

If you'd like to contribute a fix for this, we'd be happy to review it and provide guidance.

Negashion commented 1 month ago

Hi haritamar,

Unfortunately I can't spare time on it at the moment; I am in a bit of a crunch, sorry. I just wanted to put the issue on your radar, and I would love to give it a proper try once I have more time. In the meantime, I could use some feedback on the workaround I created. I reimplemented the macro in my project's dbt/macros folder with an ad-hoc fix based on our column naming convention, since we prefix all our boolean fields with "has" or "is":

{%- macro edr_cast_as_string(column) -%}
    {# Columns named is_* / has_* are treated as booleans by project convention. #}
    {%- if column.startswith('is_') or column.startswith('has_') -%}
        cast(decode({{ column }}, true, 'true', false, 'false') as {{ elementary.edr_type_string() }})
    {%- else -%}
        cast({{ column }} as {{ elementary.edr_type_string() }})
    {%- endif -%}
{%- endmacro -%}

I can't get calls made with the namespace prefix, i.e. "elementary.edr_cast_as_string(column)", to dispatch to the new macro I declared in my project. It works well if I drop the prefix in the call. I added a dispatch config to my dbt_project.yml to tackle this, but it is not redirecting to my macro. Any idea why that is the case?

name: "project_x"

dispatch:
  - macro_namespace: elementary
    search_order: ['project_x', 'elementary']

Regards, Nega

haritamar commented 3 weeks ago

Hi @Negashion, sorry for the delay. Yes, I think the cause may be what you mentioned: we call the macro explicitly with the elementary. prefix, so the dispatch configuration is ignored and dbt takes the implementation from the "elementary" package. So the workaround may require patching the Elementary code and removing the prefix.
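For context, dbt consults the `dispatch` search order only when a macro is resolved through `adapter.dispatch`; a direct call such as `elementary.edr_cast_as_string(column)` resolves to the package's own macro. The sketch below assumes, hypothetically, that the package macro were changed to dispatch in the `elementary` namespace. In that case a project listed first in `search_order` could supply an adapter-prefixed override along these lines (the file path is also an assumption):

```jinja
{# project_x/macros/edr_cast_as_string.sql (hypothetical file path) #}
{# Only picked up if the package macro resolves via
   adapter.dispatch('edr_cast_as_string', 'elementary'), which is an assumption here. #}
{%- macro redshift__edr_cast_as_string(column) -%}
    {%- if column.startswith('is_') or column.startswith('has_') -%}
        cast(decode({{ column }}, true, 'true', false, 'false') as {{ elementary.edr_type_string() }})
    {%- else -%}
        cast({{ column }} as {{ elementary.edr_type_string() }})
    {%- endif -%}
{%- endmacro -%}
```

Note the `redshift__` prefix: dispatch looks up adapter-prefixed (or `default__`) macro names, so an override named plainly `edr_cast_as_string` would not be found through this mechanism.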