fivetran / dbt_fivetran_utils

Helper utils for our packages
29 stars 19 forks source link

add and incorporate macro for turning jsons (and only jsons) into strings #129

Open fivetran-jamie opened 10 months ago

fivetran-jamie commented 10 months ago

What change does this PR introduce?

- Adds new macro, `get_json_columns_in_relation` to return a list of columns that are of type JSON (on BigQuery only, as we haven't rolled out JSON support in other Fivetran destinations). - Includes call to `get_json_columns_in_relation` in `fill_staging_columns`, so that each JSON field gets converted to a json-formatted STRING. This is necessary because only some connectors have JSONs currently and downstream json functions do not work seamlessly with the actual JSON-type columns. - `fill_staging_columns` was chosen as this is called in every (most?) source package model. therefore, we don't have to mass update a bunch of packages. however, we want this to be right and not break anything! **If this PR introduces a new macro, how did you test the new macro?**

get_json_columns_in_relation

I first tested get_json_columns_in_relation by directly calling it in a local test model.

the model i ran it against was just a select * from Joe's seed file tiktok_ads_source_integration_tests_2.tiktok_adgroup_history_data, which contains a JSON field called AGE_GROUPS.

image

When running, the test model, i was able to get the macro to return and output ['AGE_GROUPS']

fill_staging_columns

I tested the incorporation of get_json_columns_in_relation into fill_staging_columns by running the tiktok ads source package (specifically from the integration_tests folder) locally.

With tiktok_adgroup_history_data as the source, the model where trouble would pop up was stg_tiktok_ads__ad_group_history, specifically on the line where we try to coalesce age_groups and an actual-string field, age.

image

With my additions, the fields CTE in stg_tiktok_ads__ad_group_history compiles to the following

fields as (

    select

    action_days

        as 

    action_days

, 

    adgroup_id

        as 

    adgroup_id

# ................ cutting stuff out that's not relevant .......................... #

, 
    cast(null as string) as 

    age

 , 
    TO_JSON_STRING( 

    age_groups

 )
        as 

    age_groups

, 

    languages

        as 

    languages

, cast('' as string) as source_relation

    from base
), 

And so the coalesce of age_groups and age in the final CTE does not cause an error and the model runs successfully.

and the data looks OK in bigquery

image

If this PR introduces a modification to an existing macro, which packages is the macro currently present in and what steps were taken to test compatibility across packages?

Honestly we should test this with other packages as well -- ones with JSONs and not -- given the far-reaching nature of this macro change. perhaps we could have some customers try this out

Did you update the README to reflect the macro addition/modifications?

fivetran-jamie commented 10 months ago

keeping this in our back pocket for now.

if we want to move forward with this PR we should add a variable that is a boolean, telling us whether or not to convert back to strings or maintain the json type