fivetran / dbt_apple_search_ads

Fivetran's Apple Search Ads dbt package
https://fivetran.github.io/dbt_apple_search_ads/#!/overview
Apache License 2.0
0 stars 2 forks source link

[Bug] <dbt test fail for new ad_reporting package version: [">=1.3.0", "<1.4.0"] dbt_utils_unique_combination_of_columns apple_search_ads__search_term_report> #16

Closed yuna-tang closed 1 year ago

yuna-tang commented 1 year ago

Is there an existing issue for this?

Describe the issue

Issue

We upgraded the dbt package:fivetran/ad_reporting from version: [">=0.7.0", "<0.8.0"] to version: [">=1.3.0", "<1.4.0"]. There is a new dbt_utils_unique_combination_of_columns_apple_search_ads__search_term_report_search_term_text__date_day functiontest enforced in the new version against apple_search_ads__search_term_report. This test detects the duplicated records for the same search_term_textand date_day combination. If the tests fail, the dbt build will fail.

I have run the test SQL script against the older fivetran/ad_reporting version: [">=0.7.0", "<0.8.0"]. The old version appears to have duplicated search_term_textand date_day combinations for the same day, but the test is not enforced.

select
    search_term_text, date, count(*) as duplicate
from  <your_test_db>."apple_search_ads"."search_term_report."
group by search_term_text, date
having count(*) > 1
order by date desc;

When we run the test SQL script against the updated package Fivetran/ad_reporting version: [">=1.3.0", "<1.4.0"], there are duplicatedsearch_term_text anddate_day` combinations for the same day, and the test of dbt_utils_unique_combination_of_columns for apple_search_ads__search_term_report is enforced under new package.

with validation_errors as (

    select
        search_term_text, date_day, count(*) as duplicate
    from "<your_test_db>"."apple_search_ads"."apple_search_ads__search_term_report."
    group by search_term_text, date_day
    having count(*) > 1

)

select *
from validation_errors order by date_day desc;

Steps to reproduce

  1. install the old Fivetran/ad_reporting version: [">=0.7.0", "<0.8.0"] using dbt deps
  2. run a query against
    select
    search_term_text, date, count(*) as duplicate
    from  <your_test_db>."apple_search_ads"."search_term_report"
    group by search_term_text, date
    having count(*) > 1
    order by date desc;
  3. install the new `Fivetran/ad_reporting version: [">=1.3.0", "<1.4.0"] using dbt deps
  4. run a query against

    
    with validation_errors as (
    
    select
        search_term_text, date_day, count(*) as duplicate
    from "<your_test_db>"."apple_search_ads"."apple_search_ads__search_term_report"
    group by search_term_text, date_day
    having count(*) > 1

)

select * from validation_errors order by date_day desc;


### Relevant error log or model output

```shell
For `Fivetran/ad_reporting version: [">=0.7.0", "<0.8.0"]` 
We have duplicated combinations for search_term_text and date_day that appear for the same day.
i.e. search_term_text='null', date_day='2023-06-26', this combination appears 33 times for the same date.

For `Fivetran/ad_reporting version: [">=1.3.0", "<1.4.0"]` 
We have duplicated combinations for search_term_text and date_day that appear for the same day.
i.e. search_term_text='safetyculture', date_day='2023-06-12', this combination appears twice for the same date.

Expected behavior

I expect only one line for each search_term_text & date_day record for apple_search_ads_search_term_report. Currently, it is showing with different ad_group_id and keyword_id in table apple_search_ads_search_term_report. Maybe we should sue the keyword_text & date_day for the unique column combination instead of search_term_text.

dbt Project configurations

To prevent the dbt build fail, we need to config the following in dbt_project.yml. Below is the only configuration relates to apply_search_ads

vars:

  apple_search_ads__using_search_terms: False # by default this is assumed to be True

Package versions

-- older package packages:

--new package packages:

What database are you using dbt with?

redshift

dbt Version

Forfivetran/ad_reporting version: [">=0.7.0", "<0.8.0"], the dbt version is 1.2 Forfivetran/ad_reporting version:[">=1.3.0", "<1.4.0"].`, the dbt version 1.3

Additional Context

image image

Are you willing to open a PR to help address this issue?

fivetran-joemarkiewicz commented 1 year ago

Hi @yuna-tang thanks for opening this issue and sorry to see you are experiencing the failure currently due to the upgrade. If you wish, for the time being you can always disable tests directly in your dbt_project.yml while our team works on a fix.

Regarding the issue and solution I believe you helped isolate what is happening. We are testing the uniqueness on only the search_term and the date_day. However, it would make sense that the same search_term could exist across multiple ad_group_ids. This seems to be the case for you and likely others. I would prefer we keep the search term report separate from the keyword report, and keep the structure of the end model to be consistent with others. Additionally, keeping the report at the account/campaign/ad_group/search_term/day grain ensures users can slice and dice how they see fit.

For the solution we will update the test to include the ad_group_id in the unique combination of columns. This will more accurately reflect the data in the source and should ensure success for users who have the same search term across ad groups. My team will pick this up in our current sprint and we will share updates as we work through this fix. Thanks again!

fivetran-joemarkiewicz commented 1 year ago

Actually, when looking into our tests for the Ad Reporting Search Term Report I see we are in fact testing on the keyword_id uniqueness.

https://github.com/fivetran/dbt_ad_reporting/blob/cb0738c6c5b9b7bf2c8598638816d3401ae06fc0/models/ad_reporting_models.yml#L183-L195

I would propose our team updates the uniqueness test to match the ad reporting end model as closely as possible.

yuna-tang commented 1 year ago

Hi @fivetran-joemarkiewicz, Thank you for the updates! We are currently disabling this test during our upgrade testing as a workaround. I am looking forward to the fix! 😄 I wanted to contribute to the PR, but the office hours could be more Sydney timezone friendly. 😅

fivetran-joemarkiewicz commented 1 year ago

Great! Also, you are more than welcome to open a PR outside of the office hours if you would like! Especially with an update to the test that could be done in the single file. If you want to make the edit and open the PR my team can take it from there 😄

yuna-tang commented 1 year ago

@fivetran-joemarkiewicz I have raised a PR and tested the combination against our data warehouse. There are no duplicates with the new combination. I have tried following the templates; please let me know if I need to update anything else. 😄

fivetran-reneeli commented 1 year ago

PR has been merged to main so will be closing this out!