Velir / dbt-ga4

dbt Package for modeling raw data exported by Google Analytics 4. BigQuery support, only.
MIT License
312 stars 134 forks source link

Base GA4 events deduplication failing on event_params #258

Closed FlorianASchroeder closed 1 year ago

FlorianASchroeder commented 1 year ago

The de-duplication of events fails due to event_params if the elements within the array are unsorted.

Artificial example:

WITH events AS (
  SELECT 
    ARRAY<STRUCT<key STRING, value STRING>>[
      STRUCT("page_path", "/"),
      STRUCT("ignore_referrer", "true"),
      STRUCT("session_engaged", "1")
    ] AS event_params
  UNION ALL
  SELECT 
    ARRAY<STRUCT<key STRING, value STRING>>[
      STRUCT("ignore_referrer", "true"),
      STRUCT("page_path", "/"),  -- swapped order here
      STRUCT("session_engaged", "1")
    ]
  UNION ALL
  SELECT 
    ARRAY<STRUCT<key STRING, value STRING>>[
      STRUCT("page_path", "/"),
      STRUCT("ignore_referrer", "true"),
      STRUCT("session_engaged", "0")
    ]
)

SELECT *
FROM events
qualify ROW_NUMBER() OVER (PARTITION BY to_json_string(events.event_params))  = 1

Instead, an explicit ordering resolves this issue:

qualify ROW_NUMBER() OVER (PARTITION BY to_json_string(ARRAY(SELECT x FROM UNNEST(event_params) AS x ORDER BY key)))  = 1