Open tessa-beijloos opened 2 months ago
Yes, the events from each of those properties will have a unique stream_id.
Thanks! But does that mean we still have to combine the datasets from both properties first? I am using the BigQuery connection, and if I create the base table, the query costs will be a lot higher if I first combine the tables and then select different stream IDs to separate them.
It's up to you if you'd like 3 separate projects for 3 properties or 1 larger project with 3 properties. You can either process and analyze them independently or jointly. In either case, all data is partitioned on date, so you can limit costs by including date filters.
Hopefully that helps.
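For illustration, a query along these lines would read a single property's events while limiting the scan to a date range. This is only a sketch under assumptions: the project and dataset names are placeholders, the partition column name (event_date_dt) and the stream ID value are assumed rather than taken from this thread, so check them against your own base table.

```sql
-- Sketch only: project, dataset, partition column name, and stream ID are assumptions.
select
    event_name,
    count(*) as event_count
from `my-project.my_dataset.base_ga4__events`  -- placeholder table reference
where event_date_dt between date '2024-01-01' and date '2024-01-31'  -- date filter prunes partitions
  and stream_id = '1234567890'  -- hypothetical stream ID for one property
group by event_name
order by event_count desc
```

The date predicate is what keeps the scanned bytes down on a date-partitioned table. The stream_id predicate separates the properties, but it only reduces scanned bytes further if the table is also clustered on that column.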
I suspect it would also help save cost if the base table created in dbt_packages/ga4/models/staging/base/base_ga4__events.sql were clustered on stream_id. What do you think, @adamribaudo-velir?
@DVDH-000 It's currently clustered on event_name. Whether clustering on event_name or stream_id is more performant likely depends on whether the user plans to analyze across streams or within a single stream, which we can't predict.
I'm pretty certain that config setting can be overridden in the project yaml. And if anyone wants to run some empirical tests to demonstrate which is more performant under various scenarios, by all means :)
We can select multiple properties like this:
It then combines the datasets into one (with combined_property_data()). I need a way to analyze them separately in the same project. Is there a workaround for this? Thanks in advance!