Open umairkarel opened 3 months ago
You're going to need to be much more specific if you want help here.
Regardless, don't expect numbers to exactly match.
There are a number of places where we have deliberately decided to diverge from GA4.
Google uses Hyperlog++ to estimate numbers. The dbt-GA4 package counts the numbers.
Google calculates source, medium, and campaign individually resulting in source/medium combinations that shouldn't have a campaign being assigned the next available campaign. The dbt-GA4 package assigns all attribution parameters to the first event with a valid source.
I've written a free course on how you are supposed to use the dbt-GA4 package. I think it might help you get started but the goal of the package is to produce better numbers than GA4 so be prepared to see differences.
Thanks a lot, @dgitis!
I am trying to recreate the GA4 "User Acquisition" report. Could you please help me with the SQL query for this report?
Additionally, it would be fantastic if you could assist with the e-commerce purchase report as well.
Thank you for your assistance!
For the first report, you'll want to blend or join (the terminology depends on the reporting tool) the fct_ga4__sessions_daily
and the dim_ga4__sessions_daily
tables on the session_partition_key
.
You'll want either session_default_channel_grouping
or last_non_direct_default_channel_grouping
as your primary dimension along with the session_partition_date
for your date range.
From fct_ga4__sessions_daily
you'll want the following.
sumif(session_number = 1)
sum(session_partition_max_session_engaged)
sum(session_partition_max_session_engaged) / count(*)
sum(session_partition_max_session_engaged) / count(distinct client_key)
sum(session_partition_sum_engagement_time_msec) / count(*)
session_partition_count_page_views
is an acceptable alternativepurchase_count
, for each conversion eventsum(session_partition_sum_event_value_in_usd)
; otherwise this requires customizationTreat the above queryies as pseudo code. The actual query will depend on your reporting tool.
The second report requires customization and it depends on how you set your warehouse up. Take the course I linked, set things up. Try to figure it out yourself, and then share your code.
It will be much easier to help you when you have a base setup.
Hint: You'll want to create a events fact table with all events in it. This table will mostly be used for funnel reporting so you can filter some events that aren't relevant to funnels, like scroll
and user_engagement
. Obviously, you'll want device data in this table and then when reporting you'll use a lot of countif(event_name = 'add_to_cart')
.
Treat this as pseudo code. The actual query will depend on your reporting tool.
Thanks @dgitis!!!
Hey @dgitis,
I wanted to get the report for the page path and the total revenue associated with it. My team decided to add user_pseudo_id to both the fact_ga4__pages and fact_ga4__items_ecommerce tables, and then join them on this field to calculate the total revenue. However, this approach doesn’t seem correct to me can you verify that and If possible, could you please help us identify the fields we could use for these two tables in a way that allows us to accurately calculate the total revenue for each page path after join?
Thanks in advance!
You should be using the client_key
which is based on user_pseudo_id
but is more robust.
The pages model isn't meant to be joined. It's optimized for looking at data by page URL adding an ID to this table defeats the purpose of having a pre-aggregated table.
Instead, you should create a fct_ga4__event_page_view
model with one row per page_view
event and join that to fct_ga4__sessions
on client_key
.
I am currently working on recreating some reports using the modeled schema provided. However, I am encountering some challenges in formulating the SQL queries needed to generate these reports accurately.
Could anyone please share guidance or examples on how to recreate the GA4 reports using SQL queries based on the modeled schema?