data-mie / dbt-profiler

Macros for generating dbt model data profiles
Apache License 2.0
81 stars 33 forks source link

Redshift `avg` returning `1` or `0` #81

Closed stevenconnorg closed 1 year ago

stevenconnorg commented 1 year ago

Hey there,

I seem to be having issues getting correct avg values returned in Redshift.

When we try to get the avg of a binary metric, Redshift returns either 1 or 0. Ideally, these metrics should be stored as a boolean value; however, I think we'd still have an issue since the measure_avg casts to 1 or 0 for boolean values, instead of 1.0 or 0.0.

Related stackoverflow thread: Redshift Avg not returning decimals

Is there some way that we can something like this so it works in Redshift?:


{%- if dbt_profiler.is_numeric_dtype(data_type) and not dbt_profiler.is_struct_dtype(data_type) -%}
    avg({{ adapter.quote(column_name) }} :: float)
{%- elif dbt_profiler.is_logical_dtype(data_type) -%}
    avg(case when {{ adapter.quote(column_name) }} then 1.0 else 0.0 end)
{%- else -%}
    cast(null as {{ dbt.type_numeric() }})
{%- endif -%}

Maybe create a different version of this macro to use only with the dbt-redshift adapter? I'm not sure how to do that, but would help in contributing with some guidance!

stumelius commented 1 year ago

@stevenconnorg Hi! Thanks for reporting this. There is indeed a way to create a custom version of measure_avg for Redshift only. Macros that work differently in different databases are called cross-database macros. Under the hood, dbt dispatches macro calls to the database adapter (e.g., redshift) you are using; if there's an adapter specific version of the macro available then that is used but if not, a default implementation for the macro is used. Here's an example: https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch#a-simple-example

So, what we need here is a redshift__measure_avg macro that customizes the avg logic for Redshift. I'd very much like if you could contribute this yourself and I'll help you out as needed. What do you think? :)

stevenconnorg commented 1 year ago

@stumelius -- thanks for the help! I've linked a PR that will add this functionality. Let me know if you need anything else from me atm!

stumelius commented 1 year ago

@stevenconnorg FYI https://github.com/data-mie/dbt-profiler/releases/tag/0.8.1