Velir / dbt-ga4

dbt Package for modeling raw data exported by Google Analytics 4. BigQuery support, only.
MIT License
324 stars 135 forks source link

User Export #317

Open dgitis opened 7 months ago

dgitis commented 7 months ago

Description & motivation

Work-in-progress resolving #285.

The goal of this PR is to support the pseudonymous_user_id and user_id tables.

My initial thought is that we should keep both the package's current user tables and new ones derived from the new user export options in GA4. The reason for this is to support both old and new installations.

README modifications are not done, but I expect we will want to add a new section on disabling user models that explains the differences between our various models and how to use them.

What defaults should we have for +enable on these tables?

This PR supports audiences fields from the export. While working with a test site that has implemented audiences, the data doesn't seem to be very useful unless its ID field joins with Google Ads audience exports. Despite this, I think we should leave this in because it needs to be enabled by a variable.

The user_properties fields were done without sample data. Naming and logic will likely need to be updated.

Multi-site is not currently supported.

We also might want to move the logic in base_ga4___* models to stg_ga4__* models.

I also need to review package naming conventions to ensure data, like geo data, is consistent with elsewhere in the package.

To-do:

Here are the docs for the source tables.

Checklist

dgitis commented 3 months ago

@adamribaudo-velir, while I've verified that these work locally, I've only verified with a minimal amount of data. Multi-site changes were verified by setting the same property ID several times under property_ids. Obviously this is not ideal.

It would be best to verify this using a proper multi-site installation. I don't have any current clients, but I can ask some past clients.

I did not write any tests. dbt recommends testing certain things. The only logic worth testing , the multi-site macro, is not testable by the new framework.

I defaulted these new tables to +enable: false.

I also ran base_ga4__pseudonymous_users in to stg_ga4__client_ids. I renamed at the staging model because that's where the client IDs are generated.

adamribaudo-velir commented 3 months ago

I'm on board with the approach, but unable to run it due to this error. Couldn't find an obvious issue. I can run the main branch fine with the same project configuration.

20:08:18  Completed with 1 error and 0 warnings:
20:08:18
20:08:18    Compilation Error in model base_ga4__events (models\staging\base\base_ga4__events.sql)
  can only concatenate str (not "int") to str

  > in macro combine_property_data (macros\combine_property_data.sql)
  > called by macro run_hooks (macros\materializations\hooks.sql)
  > called by macro materialization_incremental_bigquery (macros\materializations\incremental.sql)
  > called by model base_ga4__events (models\staging\base\base_ga4__events.sql)
dgitis commented 3 months ago

I fixed the duplicate code issues which seems to have fixed the error, not sure why though, and also updated a test that worked on my very small test site but wouldn't work on a larger site.

adamribaudo-velir commented 2 weeks ago

I get an error when building the new staging models:

(python312env) C:\GitHub\velir-ga4-test-project>dbt run -m stg_ga4__client_keys
13:20:47  Running with dbt=1.8.7
13:20:48  Registered adapter: bigquery=1.8.3
13:20:48  Found 42 models, 29 data tests, 1 seed, 3 sources, 623 macros
13:20:48
13:20:50  Concurrency: 4 threads (target='dev')
13:20:50
13:20:50  1 of 1 START sql view model dbt_dev_aribaudo.stg_ga4__client_keys .............. [RUN]
13:20:51  BigQuery adapter: https://console.cloud.google.com/bigquery?project=velir-website-analytics&j=bq:US:660c1784-627b-4503-b4e5-a08e57355482&page=queryresults
13:20:51  1 of 1 ERROR creating sql view model dbt_dev_aribaudo.stg_ga4__client_keys ..... [ERROR in 1.17s]
13:20:51
13:20:51  Finished running 1 view model in 0 hours 0 minutes and 2.93 seconds (2.93s).
13:20:51
13:20:51  Completed with 1 error and 0 warnings:
13:20:51
13:20:51    Database Error in model stg_ga4__client_keys (models\staging\stg_ga4__client_keys.sql)
  Syntax error: Expected ")" but got identifier "user_property_name" at [11:82]
  compiled code at target\run\ga4\models\staging\stg_ga4__client_keys.sql
13:20:51
13:20:51  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

I have 1 user property defined:

      user_properties:
        - user_property_name: "random_number"
          value_type: "int_value"

I'd be ok approving a PR with JUST the base model data for user export so that package users at least have access to that data. We could sort out the most advanced stuff (pulling in properties, audiences) later.