EqualExperts / dbt-unit-testing

This dbt package contains macros to support unit testing that can be (re)used across dbt projects.
MIT License

dbt-unit-testing plus Python models #165

Open maksimkrupeninepam opened 1 year ago

maksimkrupeninepam commented 1 year ago

We ran into an issue when the dbt-unit-testing package and Python models are used in the same project. The cause is the ref macro override we added for the package:

{% macro ref() %}
   {{ return(dbt_unit_testing.ref(*varargs, **kwargs)) }}
{% endmacro %}

I managed to fix that by modifying the code in dbt_packages/dbt_unit_testing/macros/overrides.sql:

{{ return (builtins.ref(project_or_package, model_name, **kwargs)) }} ->
{{ return (builtins.ref(model_name, **kwargs)) }}

But I'm not sure how this change may affect unit tests. Is there a better way to fix the issue? We want to avoid changing ref in all models to something like: select * from {{ dbt_unit_testing.ref('stg_customers') }}

Here is an example of a Python model:

def model(dbt, session):

    dbt.config(
        materialized = "table"
    )

    # get data from staging table
    sf_table = dbt.ref("my_snowflake_table")

    # describe the data
    data_profiling = sf_table.describe()

    return data_profiling

Thanks!

maksimkrupeninepam commented 1 year ago

Another issue with Python models was related to model versions. But we fixed that by using the latest version of the package (v0.3.2) as suggested in issue #160.

Thanks for the fix.

jacksond80 commented 11 months ago

We have faced the same issue and resolved it in a slightly different way. We use Docker and rebuild the image on each release, so editing the dbt-unit-testing package file directly wasn't a viable solution for us. We needed a fix in code that we own and can therefore keep under version control.

We use the method suggested in the readme of this repo to override the ref and source macros:

{% macro ref() %}
   {{ return(dbt_unit_testing.ref(*varargs, **kwargs)) }}
{% endmacro %}

{% macro source() %}
   {{ return(dbt_unit_testing.source(*varargs, **kwargs)) }}
{% endmacro %}

We updated this to look like this:

{% macro ref() %}
   {% set python = kwargs["python"] | default (False) %}
   {% if python %}
      {{ return(dbt_unit_testing.ref(None, *varargs, **kwargs)) }}
   {% else %}
      {{ return(dbt_unit_testing.ref(*varargs, **kwargs)) }}
   {% endif %}
{% endmacro %}

{% macro source() %}
   {{ return(dbt_unit_testing.source(*varargs, **kwargs)) }}
{% endmacro %}

With this, nothing needs to change in your SQL models, but in your Python models the signature of dbt.ref calls should now be: dbt.ref(model_name, python=True)

We haven't come across a limitation yet, and it works for us. As far as we can tell, you can't set the project_or_package_name variable from Python models anyway, so setting it to None when ref() is called from within a Python model should be fine.
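
To make the argument forwarding concrete, here is a minimal plain-Python sketch of the same dispatch. It is only an analogy of the Jinja override above: package_ref is a hypothetical stand-in for dbt_unit_testing.ref that simply echoes what it was called with, not the package's real implementation.

```python
def package_ref(*args, **kwargs):
    """Hypothetical stand-in for dbt_unit_testing.ref: echoes its arguments."""
    return args, kwargs


def ref(*varargs, **kwargs):
    """Plain-Python analogue of the overridden ref macro."""
    # Jinja equivalent: {% set python = kwargs["python"] | default (False) %}
    python = kwargs.get("python", False)
    if python:
        # Call from a Python model: put None in the project_or_package slot.
        return package_ref(None, *varargs, **kwargs)
    # Call from a SQL model: forward the arguments unchanged.
    return package_ref(*varargs, **kwargs)


sql_call = ref("stg_customers")
python_call = ref("stg_customers", python=True)

print(sql_call)     # (('stg_customers',), {})
print(python_call)  # ((None, 'stg_customers'), {'python': True})
```

In this sketch, as in the macro, the python keyword is not removed from kwargs, so it is forwarded along with every other keyword argument.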

jsatyam7 commented 7 months ago

Hey @jacksond80, I have tried your solution but it is giving me this error:

08:14:25 Encountered an error:
Compilation Error
call() got an unexpected keyword argument 'python'

in macro ref (macros/unit_testing_macro.sql) called by

I have changed the macro like this:

{% macro ref() %}
   {% set python = kwargs["python"] | default (False) %}
   {% if python %}
      {{ return(dbt_unit_testing.ref(None, *varargs, **kwargs)) }}
   {% else %}
      {{ return(dbt_unit_testing.ref(*varargs, **kwargs)) }}
   {% endif %}
{% endmacro %}

{% macro source() %}
   {{ return(dbt_unit_testing.source(*varargs, **kwargs)) }}
{% endmacro %}

and I'm calling ref in my Python model like this:

customers_df = dbt.ref("customers", python=True).to_pandas()

Please let me know if I'm doing anything wrong here, thanks.

jacksond80 commented 7 months ago

@jsatyam7 I don't see anything obviously wrong. The error suggests that your attempted override of the ref() macro is not getting picked up. You can verify this by adding print statements that log to the console, so you can tell which macro gets called.

Is your macros path set correctly? (Do you have other macros that work ok in the macros folder?)