Velir / dbt-ga4

dbt Package for modeling raw data exported by Google Analytics 4. BigQuery support, only.
MIT License
324 stars 135 forks source link

How to recreate GA4 reports using this schema? #337

Open umairkarel opened 3 months ago

umairkarel commented 3 months ago

I am currently working on recreating some reports using the modeled schema provided. However, I am encountering some challenges in formulating the SQL queries needed to generate these reports accurately.

Could anyone please share guidance or examples on how to recreate the GA4 reports using SQL queries based on the modeled schema?

dgitis commented 3 months ago

You're going to need to be much more specific if you want help here.

Regardless, don't expect numbers to exactly match.

There are a number of places where we have deliberately decided to diverge from GA4.

Google uses Hyperlog++ to estimate numbers. The dbt-GA4 package counts the numbers.

Google calculates source, medium, and campaign individually resulting in source/medium combinations that shouldn't have a campaign being assigned the next available campaign. The dbt-GA4 package assigns all attribution parameters to the first event with a valid source.

I've written a free course on how you are supposed to use the dbt-GA4 package. I think it might help you get started but the goal of the package is to produce better numbers than GA4 so be prepared to see differences.

umairkarel commented 3 months ago

Thanks a lot, @dgitis!

I am trying to recreate the GA4 "User Acquisition" report. Could you please help me with the SQL query for this report?

image

Additionally, it would be fantastic if you could assist with the e-commerce purchase report as well.

image

Thank you for your assistance!

dgitis commented 3 months ago

For the first report, you'll want to blend or join (the terminology depends on the reporting tool) the fct_ga4__sessions_daily and the dim_ga4__sessions_daily tables on the session_partition_key.

You'll want either session_default_channel_grouping or last_non_direct_default_channel_grouping as your primary dimension along with the session_partition_date for your date range.

From fct_ga4__sessions_daily you'll want the following.

Treat the above queryies as pseudo code. The actual query will depend on your reporting tool.

dgitis commented 3 months ago

The second report requires customization and it depends on how you set your warehouse up. Take the course I linked, set things up. Try to figure it out yourself, and then share your code.

It will be much easier to help you when you have a base setup.

Hint: You'll want to create a events fact table with all events in it. This table will mostly be used for funnel reporting so you can filter some events that aren't relevant to funnels, like scroll and user_engagement. Obviously, you'll want device data in this table and then when reporting you'll use a lot of countif(event_name = 'add_to_cart').

Treat this as pseudo code. The actual query will depend on your reporting tool.

umairkarel commented 3 months ago

Thanks @dgitis!!!

umairkarel commented 2 months ago

Hey @dgitis,

I wanted to get the report for the page path and the total revenue associated with it. My team decided to add user_pseudo_id to both the fact_ga4__pages and fact_ga4__items_ecommerce tables, and then join them on this field to calculate the total revenue. However, this approach doesn’t seem correct to me can you verify that and If possible, could you please help us identify the fields we could use for these two tables in a way that allows us to accurately calculate the total revenue for each page path after join?

Thanks in advance!

dgitis commented 2 months ago

You should be using the client_key which is based on user_pseudo_id but is more robust.

The pages model isn't meant to be joined. It's optimized for looking at data by page URL adding an ID to this table defeats the purpose of having a pre-aggregated table.

Instead, you should create a fct_ga4__event_page_view model with one row per page_view event and join that to fct_ga4__sessions on client_key.