Velir / dbt-ga4

dbt Package for modeling raw data exported by Google Analytics 4. BigQuery support, only.
MIT License

Uniqueness test failing in fct_ga4__user_ids #191

Closed TannerHopkins closed 1 year ago

TannerHopkins commented 1 year ago

We're seeing the unique_fct_ga4__user_ids_user_id_or_client_key test fail. It's only happening for one client_key, but that client_key has 3 different stream_id values.

Perhaps this was an unintended effect of swapping out user_pseudo_id for client_key in this PR? Maybe there could only be one user_pseudo_id per stream_id before, but now there can be multiple?

Regardless, I did find it interesting that fct_ga4__user_ids groups by both user_id_or_client_key and stream_id, but the test checks uniqueness on user_id_or_client_key by itself:

select
    user_id_or_client_key,
    stream_id,
    max(is_user_id) as is_user_id,
    min(first_seen_timestamp) as first_seen_timestamp,
    ....
from user_id_mapped
group by 1, 2

I'm not sure if the fix would be to update the uniqueness test to take stream_id into account as well, or possibly to stop grouping by stream_id and instead take the min/max, similar to what is being done with max(is_user_id) as is_user_id above.

dgitis commented 1 year ago

Is it possible that you haven't rebuilt your data with the --full-refresh flag on since you fixed the issue in #181?

Alternatively, the install instructions need updating. The latest version is 3.2.1, but this is what we have in our docs right now:

packages:
  - package: Velir/ga4
    version: [">=3.0.0", "<3.2.0"]

I'll make a PR for this issue. Please let us know if this is not the source of your problem.
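
For illustration, a packages.yml entry that admits 3.2.1 might look like the sketch below. The bounds are an assumption chosen only to include the 3.2.x line; they are not taken from the PR or the package docs.

```yaml
packages:
  - package: Velir/ga4
    # Hypothetical range: allows 3.2.x (including 3.2.1), excludes 3.3.0 and later
    version: [">=3.2.0", "<3.3.0"]
```

After changing the range, running dbt deps again should pull the newer release.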

adamribaudo-velir commented 1 year ago

@TannerHopkins are you using user_ids in your project? If a user_id spanned multiple streams then the uniqueness check running on user_id_or_client_key would fail.

The uniqueness test should be on user_id_or_client_key || stream_id. I'll update that now. Try checking out this branch: https://github.com/Velir/dbt-ga4/pull/195 and let me know the results.

TannerHopkins commented 1 year ago

Sorry for the delay here - so I switched the package revision to point to that branch and it still doesn't work 🤔 I ran a dbt clean then a dbt deps & I see the new test:

image

However, the test fails and when I look at the compiled code it's seemingly still the old test:

image

What does your compiled code look like @adamribaudo-velir ?

adamribaudo-velir commented 1 year ago

I think this is my bad. I need to move the test under the models key, not columns: https://docs.getdbt.com/reference/resource-properties/tests#testing-an-expression
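
A minimal sketch of what a model-level expression test might look like, following the pattern in the linked dbt docs. The file layout and the exact expression are assumptions, not the contents of the PR, and the concatenation assumes both columns are strings in BigQuery.

```yaml
version: 2

models:
  - name: fct_ga4__user_ids
    # Model-level test: the expression is passed as column_name,
    # so it is not attached to any single entry under `columns:`.
    tests:
      - unique:
          column_name: "user_id_or_client_key || stream_id"
```

Declared this way, the compiled test should check uniqueness on the concatenated expression rather than on user_id_or_client_key alone.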