Velir / dbt-ga4

dbt Package for modeling raw data exported by Google Analytics 4. BigQuery support, only.
MIT License

Streaming daily refactor + Multiple project config refactor #238

Closed jice-lavocat closed 1 year ago

jice-lavocat commented 1 year ago

Description & motivation

This PR merges PR188 and PR231.

PR188 - Refactor daily & streaming support

Previously, this package supported daily + streaming (aka intraday) exports in a relatively naïve way: it incrementally processed 'daily' data and then unioned the 'streaming' data as a view in the stg_ga4__events model. This can cause issues when tables are rotated from 'intraday' to 'daily', as noted in #80, and in other cases noted in #182.

This update ensures that daily & streaming data are handled using the same method: data is loaded incrementally into the base model and deduped before being loaded into stg_ga4__events.
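To make the "load, then dedupe" step concrete, here is a minimal Python sketch rather than the package's actual SQL. The row fields (`source`, `event_date`, `user_pseudo_id`, `event_name`, `event_timestamp`), the choice of dedupe key, and the rule that a daily row wins over an intraday row are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical event row; the field names are illustrative, not the package's schema.
Event = Dict[str, object]


def dedupe_events(rows: Iterable[Event]) -> List[Event]:
    """Keep one row per event key, preferring a daily row over an intraday one."""
    best: Dict[Tuple[object, ...], Event] = {}
    for row in rows:
        # Assumed dedupe key: these four fields together identify one event.
        key = (
            row["event_date"],
            row["user_pseudo_id"],
            row["event_name"],
            row["event_timestamp"],
        )
        current = best.get(key)
        if current is None:
            best[key] = row
        elif current["source"] == "intraday" and row["source"] == "daily":
            # Assumed rule: the daily export supersedes the intraday copy.
            best[key] = row
    return list(best.values())
```

Because both exports pass through the same function, an event that appears in the intraday table and later in the daily table ends up as a single row.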

As a byproduct, this PR removes support for 'dynamically' incremental processing based on the last date found in a table using _max_partition. Instead, the static_incremental_days variable must be set; it ensures that recent data is reprocessed as the intraday table changes throughout the day.
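As a rough illustration of what a static reprocessing window means, the Python sketch below lists the date suffixes that would be reprocessed on a given run. It assumes the window covers the current day plus the preceding `static_incremental_days` days; the helper name `partitions_to_reprocess` is hypothetical and the package's own logic may differ.

```python
from datetime import date, timedelta
from typing import List


def partitions_to_reprocess(today: date, static_incremental_days: int) -> List[str]:
    """Return YYYYMMDD suffixes for today and the preceding N days, newest first.

    Assumption: the window is today plus `static_incremental_days` earlier days.
    """
    if static_incremental_days < 0:
        raise ValueError("static_incremental_days must be non-negative")
    return [
        (today - timedelta(days=offset)).strftime("%Y%m%d")
        for offset in range(static_incremental_days + 1)
    ]
```

Unlike a window derived from the last date found in the table, this list does not depend on what has already been loaded, so today's partition is always rebuilt while intraday data is still arriving.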

PR231 - Implementing multi-project architecture (issue #231)

Updates the dbt_project.yml format to specify the GCP project and GA4 property_id.

Use case: sources are spread across distinct GCP projects. Related to #231.

Note: This new config format could also allow more specific per-property configuration in the future. For example, property 11111111 could be on the daily export while property 22222222 is on the streaming export.
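One way to picture such a per-property configuration is the Python sketch below. It is not the package's config format: the `PropertySource` class, its `frequency` field, and the project names are hypothetical. The only parts taken from GA4's BigQuery export conventions are the `analytics_<property_id>` dataset name and the `events_` / `events_intraday_` table prefixes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySource:
    """Hypothetical per-property source entry (not the package's real schema)."""

    project: str            # GCP project that holds the export dataset
    property_id: int        # GA4 property ID
    frequency: str = "daily"  # hypothetical option: "daily" or "streaming"

    def table_pattern(self) -> str:
        # GA4 exports land in a dataset named analytics_<property_id>.
        prefix = "events_intraday_" if self.frequency == "streaming" else "events_"
        return f"{self.project}.analytics_{self.property_id}.{prefix}*"


# Two properties in two different (made-up) GCP projects.
sources = [
    PropertySource(project="project-a", property_id=11111111),
    PropertySource(project="project-b", property_id=22222222, frequency="streaming"),
]
```

Keeping the project and the export frequency on each entry, rather than as single global settings, is what would let properties in different projects be handled differently.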

Checklist

jice-lavocat commented 1 year ago

@adamribaudo-velir I wasn't sure whether it was appropriate to create this new PR. Please close it if it isn't.