fivetran / dbt_microsoft_ads

Fivetran data transformations for Microsoft (Bing) Ads built using dbt.
https://fivetran.github.io/dbt_microsoft_ads/
Apache License 2.0
3 stars 6 forks source link

[Bug] Null records in the Ad Group Report Model #18

Closed paritushPXGContractor closed 2 years ago

paritushPXGContractor commented 2 years ago

Is there an existing issue for this?

Describe the issue

We are encountering null values in "ad_group_id" column after "microsoft_adsad_group_report" runs. After investigating we found out it uses 2 tables for "ad_group_id" column and its a left join between "stg_microsoft_adsad_group_daily_report" & "stg_microsoft_adsad_group_history". Both of these tables didn't have any null values in them. Also, in the source left table "stg_microsoft_adsad_group_daily_report", we have more than 300,000 records, and after the left join, we are left with around 150,000 records with 1700 records of nulls in the final table microsoft_ads__ad_group_report.

Relevant error log or model output

12:33:47  9 of 118 START test not_null_ad_reporting__ad_group_report_ad_group_id ......... [RUN]
12:33:49  9 of 118 FAIL 419 not_null_ad_reporting__ad_group_report_ad_group_id ........... [FAIL 419 in 1.54s]

Expected behavior

There shouldn't be any nulls and the rows count ideally shouldn't change from the left table in a left join.

dbt Project configurations

name: 'microsoft_ads'
version: '0.5.0'
config-version: 2
require-dbt-version: [">=1.0.0", "<2.0.0"]

vars:
  microsoft_ads:
    account_history: "{{ ref('stg_microsoft_ads__account_history') }}"
    account_performance_daily_report: "{{ ref('stg_microsoft_ads__account_daily_report') }}"
    campaign_history: "{{ ref('stg_microsoft_ads__campaign_history') }}"
    campaign_performance_daily_report: "{{ ref('stg_microsoft_ads__campaign_daily_report') }}"
    ad_group_history: "{{ ref('stg_microsoft_ads__ad_group_history') }}"
    ad_group_performance_daily_report: "{{ ref('stg_microsoft_ads__ad_group_daily_report') }}"
    ad_history: "{{ ref('stg_microsoft_ads__ad_history') }}"
    ad_performance_daily_report: "{{ ref('stg_microsoft_ads__ad_daily_report') }}"
    keyword_history: "{{ ref('stg_microsoft_ads__keyword_history') }}"
    keyword_performance_daily_report: "{{ ref('stg_microsoft_ads__keyword_daily_report') }}"
    search_performance_daily_report: "{{ ref('stg_microsoft_ads__search_daily_report') }}"

  microsoft_ads_auto_tagging_enabled: false

  microsoft_ads__account_report_passthrough_metrics: []
  microsoft_ads__campaign_report_passthrough_metrics: []
  microsoft_ads__ad_group_report_passthrough_metrics: []
  microsoft_ads__ad_report_passthrough_metrics: []
  microsoft_ads__keyword_report_passthrough_metrics: []
  microsoft_ads__search_report_passthrough_metrics: []

models:
  microsoft_ads:
    +schema: microsoft_ads
    +materialized: table

Package versions

packages:

What database are you using dbt with?

redshift

dbt Version

DBT version 1.2

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-joemarkiewicz commented 2 years ago

Hi @paritushPXGContractor thanks for opening this bug report.

After looking through your report I would agree that there should not be any null ad_group_id records in your ad group report model. Upon looking further into this issue I am quite perplexed as to where these null records are coming from. I imagine it must be that there are some ad_group_ids that are included in the daily report that are not included in the ad group history table. 🤔

When running this package locally, if you navigate to your dbt_packages/microsoft_ads/models/microsoft_ads__ad_group_report.sql file and edit the joined cte to pull the ad_group_id from the report instead of the ad group history model, you should be able to see which ad_group_ids are causing the issues.

Note: You will not want to re-run dbt deps after editing this file in your dbt_packages folder. Or else you will delete your changes

joined as (

    select
        date_day,
        accounts.account_name,
        accounts.account_id,
        campaigns.campaign_name,
        campaigns.campaign_id,
        ad_groups.ad_group_name,
        report.ad_group_id, -- This was changed from ad_groups.ad_group_id
        report.device_os,
        report.device_type,
        report.network,
        report.currency_code,
        sum(clicks) as clicks,
        sum(impressions) as impressions,
        sum(spend) as spend

        {{ fivetran_utils.persist_pass_through_columns(pass_through_variable='microsoft_ads__ad_group_passthrough_metrics', transform = 'sum') }}
    from report
    left join accounts
        on report.account_id = accounts.account_id
    left join campaigns
        on report.campaign_id = campaigns.campaign_id
    left join ad_groups
        on report.ad_group_id = ad_groups.ad_group_id
    {{ dbt_utils.group_by(11)}}
)

I imagine when making this change you will see the test should pass, but the ad_group_name for a number of records will still be null. You should be able to narrow in from there on why these ad groups may not be populated in the history table.

Additionally, the row count of the daily report source table and our output microsoft_ads__ad_group_report model is due to the level of aggregation. We are aggregating on a number of fields that the source table doesn't; therefore, the row count of our end model will be less than the source.