google-marketing-solutions / ga4_dataform

Apache License 2.0
96 stars 33 forks source link

bing/cpc missing from session tables #5

Closed gldsv closed 4 hours ago

gldsv commented 1 month ago

Hi, when comparing a simple source acquisition report using this query below and comparing to a GA4 UI Explore on the same perimeter, I notice the source/medium bing/cpc is missing from the ga4_dataform_output data.

SELECT
  last_non_direct_traffic_source.source,
  last_non_direct_traffic_source.medium,
  COUNT(*) AS nb_sessions
FROM `useful-foundry-132723.ga4_dataform_output.session` 
WHERE session_start_date BETWEEN "2024-04-01" AND "2024-04-30"
GROUP BY 1,2
ORDER BY 3 DESC

The query outputs this: image

The GA4 UI Explore reports this: image

piariachi commented 1 month ago

Hi @gldsv thanks for raising this, from your data, it seems that bing/cpc might have been allocated to Google/cpc. Note: While it might be on the roadmap for GA4 BQ exports, cpc traffic sources are still not tracked by default in the exports, so this solution attempts a workaround function to overcome this (you can find it in helpers.js >updatePaidSearchTrafficSource) Question: could you check if the rows that should be corresponding to bing/cpc have values in the 'collected_traffic_source.gclid' column? I also just pushed a possible fix, please give it a try.

gldsv commented 1 month ago

Hi, thanks for your reply!

I confirm the rows which should be bing/cpc but allocated to google/cpc have a gclid value.

I tried the fix, but it seems to generate a new issue, now almost all traffic becomes google/organic (same query than before)

image

I used to solve the issue for events with gclid wrongly labelled using the following SQL (because it seems that this problem only affects google/cpc traffic, as bing/cpc do have the proper label)

IF(collected_traffic_source.gclid IS NOT NULL AND (collected_traffic_source.manual_medium IS NULL OR collected_traffic_source.manual_medium != "cpc"), "google", collected_traffic_source.manual_source) AS manual_source,
IF(collected_traffic_source.gclid IS NOT NULL AND (collected_traffic_source.manual_medium IS NULL OR collected_traffic_source.manual_medium != "cpc"), "cpc", collected_traffic_source.manual_medium) AS manual_medium,

Tell me if you need any additional info!

Have a nice day!

piariachi commented 1 month ago

Hey, correct bing/cpc do have the proper label, my earlier logic had a typo, could you try one more time please and share with me the new results? If that doesn't sort it , I will revert the last 2 commits and re-think the function logic. This would take a few days. Thanks

matt-brood commented 1 month ago

If it helps, I found that gclids appear for more than just Google CPC clicks (keeping in mind, this may be an edge case and specific to our setup). For example, if you're using SA360 to manage Bing and you've set up the integration with GA4, then Bing CPC clicks may also have a gclid. So the code below will re-categorise Bing CPC as Google CPC in that scenario.

IF(collected_traffic_source.gclid IS NOT NULL AND (collected_traffic_source.manual_medium IS NULL OR collected_traffic_source.manual_medium != "cpc"), "google", collected_traffic_source.manual_source) AS manual_source,
IF(collected_traffic_source.gclid IS NOT NULL AND (collected_traffic_source.manual_medium IS NULL OR collected_traffic_source.manual_medium != "cpc"), "cpc", collected_traffic_source.manual_medium) AS manual_medium,

But I also found (keeping in mind again, this could be related to our specific setup) that gclids were appearing for other non-Google/Bing sources and the gclid fallback was too heavy-handed. So now I've completely removed the gclid fallback from our model and gone down the route of using the gclids to join missing source/medium/campaign data only from Google Ads and SA360 datasets.

piariachi commented 1 month ago

Makes absolute sense Matt! I can confirm that this happens when using SA360 to manage bing. Thanks for sharing the logic here.

On Tue, May 28, 2024, 7:54 PM Matt Bennett @.***> wrote:

If it helps, I found that gclids appear for more than just Google CPC clicks (keeping in mind, this may be an edge case and specific to our setup). For example, if you're using SA360 to manage Bing and you've set up the integration with GA4, then Bing CPC clicks may also have a gclid. So the code below will re-categorise Bing CPC as Google CPC in that scenario.

IF(collected_traffic_source.gclid IS NOT NULL AND (collected_traffic_source.manual_medium IS NULL OR collected_traffic_source.manual_medium != "cpc"), "google", collected_traffic_source.manual_source) AS manual_source, IF(collected_traffic_source.gclid IS NOT NULL AND (collected_traffic_source.manual_medium IS NULL OR collected_traffic_source.manual_medium != "cpc"), "cpc", collected_traffic_source.manual_medium) AS manual_medium,

But I also found (keeping in mind again, this could be related to our specific setup) that gclids were appearing for other non-Google/Bing sources and the gclid fallback was too heavy-handed. So now I've completely removed the gclid fallback from our model and gone down the route of using the gclids to join missing source/medium/campaign data only from Google Ads and SA360 datasets.

— Reply to this email directly, view it on GitHub https://github.com/google-marketing-solutions/ga4_dataform/issues/5#issuecomment-2135818324, or unsubscribe https://github.com/notifications/unsubscribe-auth/AVVTIGYMSPOLEKMER3L4NVLZETAGHAVCNFSM6AAAAABIGADFQCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMZVHAYTQMZSGQ . You are receiving this because you commented.Message ID: @.***>