Velir / dbt-ga4

dbt Package for modeling raw data exported by Google Analytics 4. BigQuery support, only.
MIT License
324 stars 135 forks source link

Fix invalid generated column names in conversion events #327

Closed jerome-laurent-pro closed 5 months ago

jerome-laurent-pro commented 5 months ago

Description & motivation

I encountered an issue with non standard event names that I wanted to use as conversion events. Stuff like 50%_completion or condition-click. I can't change their names in GA4, so I opened this PR to propose a way to create clean column names when they are used to generate conversion events related models. Currently, using the conversion_events variable with these events will result in errors as columns named like count_condition-click will be generated.

I found this issue that was related (https://github.com/Velir/dbt-ga4/issues/211)

Checklist

adamribaudo-velir commented 5 months ago

Great! Lmk if you need help finalizing the tests. Eventually we'll refactor to use dbt's newly released unit test functionality.

adamribaudo-velir commented 5 months ago

@jerome-laurent-pro check out the latest commit. You just needed to add the macro to unit tests that call the macro.

The tests still fail, but for a valid reason: What value is a user expected to enter into the conversion_events dbt variable? Either they need to enter the re-formatted event name or you'll need to update this line of code in stg_ga4__page_conversions and stg_ga4__session_conversions_daily:

{% for ce in var('conversion_events',[]) %}
    , countif(event_name = '{{ce}}') as {{ga4.valid_column_name(ce)}}_count
    {% endfor %}

Notice that the countif statement uses the original event name.

jerome-laurent-pro commented 5 months ago

You just needed to add the macro to unit tests that call the macro.

Thank you!

Either they need to enter the re-formatted event name or you'll need to update this line of code

The idea was to use the official event_name from GA4 as it'll be named like this in the raw data. Users shouldn't have to guess the future safe name while entering the variable in the project's yaml. So countif(event_name = '{{ce}}') will still count the events, but the new column will have a safe name (preventing dbt run from failing due to an incorrectly named column in the generated SQL)

The modification you did in the test was creating an error, because the safe column name for my-page-view is my_page_view_count. The aggregated expectation in the test was still named page_view_count, hence the failure in the test. Unrecognized name: my_page_view_count; Did you mean page_view_count? at [7:21] Using page-view works just fine.

In the other test, it seems that the project_config_update was erasing the static_incremental_days variable set in conftest.py.

https://github.com/Velir/dbt-ga4/blob/14b9d6f1e86a09a34fd41d37c37696c0040cb356/unit_tests/test_stg_ga4__session_conversions_daily.py#L37

Adding it again fix the error. It seems like conftest isn't really working as I expected, otherwise I assume we wouldn't have had to do a project_config_update in each test.


@adamribaudo-velir, let me know if you think the way I implemented the functionality doesn't make sense or if you want me to add anything else.

adamribaudo-velir commented 5 months ago

Ah! Thanks, I think I misunderstood the change a bit. And sorry for inadvertently breaking the test while I fixed it 😓

Let me run through an example on my end, but I expect to merge this by end of week.

adamribaudo-velir commented 5 months ago

Confirmed this works as expected

image

image