fivetran / dbt_ad_reporting

Fivetran's ad reporting dbt package. Combine your Facebook, Google, Pinterest, LinkedIn, Twitter, Snapchat, Microsoft, TikTok, Reddit, Amazon, and Apple Search advertising metrics using this package.
https://fivetran.github.io/dbt_ad_reporting/#!/overview
Apache License 2.0
142 stars 56 forks source link

[Bug] failed keyword_id tests because of missing filter/inner join #63

Closed clay-walker closed 1 year ago

clay-walker commented 1 year ago

Is there an existing issue for this?

Describe the issue

The microsoft_ads__keyword_report model uses the daily keyword report as its base and left joins to the keyword history table on keyword_id. There is a subsequent "not null" test for keyword ID, which can fail (since keyword history is left joined).

It is possible for keyword_id values to exist in the daily report and not in the keyword history table. This happens when a keyword is deleted (keyword_status = 'Deleted' in the daily table).

For Example:

image

I've investigated this issue and have come up with two solutions:

  1. Add a filter in microsoft_ads__keyword_report (this requires a change to get_keyword_daily_report_columns.sql and stg_microsoft_ads__keyword_daily_report_tmp.sql): where report.keyword_status != 'Deleted'

  2. Alternatively, change microsoft_ads__keyword_report to inner join with keyword history. This is the simplest, and imo preferred solution. As you can see, this inner join will only exclude deleted records:

image

Relevant error log or model output

test failures in multiple keyword models

Expected behavior

tests pass without failure

dbt Project configurations

vars: prod_database: analytics prod_schema: business operdb_schema_pattern: 'operdb%' raw_data_db: GONG "dbt_date:time_zone": "America/Los_Angeles" timezone_constant: "America/Los_Angeles"

ad_reportingapple_search_ads_enabled: False # by default this is assumed to be True ad_reportingpinterest_ads_enabled: False # by default this is assumed to be True ad_reportingtwitter_ads_enabled: False # by default this is assumed to be True ad_reporting__facebook_ads_enabled: False # by default this is assumed to be True ad_reportingsnapchat_ads_enabled: False # by default this is assumed to be True ad_reporting__tiktok_ads_enabled: False # by default this is assumed to be True

linkedin_ads_schema: linkedin_ads linkedin_ads_database: source_raw

google_ads_schema: google_ads google_ads_database: source_raw

microsoft_ads_schema: bingads microsoft_ads_database: source_raw

Package versions

packages:

What database are you using dbt with?

snowflake

dbt Version

1.0.1

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-sheringuyen commented 1 year ago

Hello @clay-walker, thank you so much for creating this issue (as well as #64)!

We've started working on a PR that should mitigate these issues here, however, this PR has a dependency on the dbt_utils 1.0.0 release. So in the meantime, we've actually created a separate branch off of main so that you can test the model logic. Would you mind testing this branch of microsoft_ads out and see if you are still seeing the same issues?

All you will need to do is change your dbt_ad_reporting/packages.yml to reference the branch above like so:

packages:
  - package: fivetran/apple_search_ads
    version: [">=0.1.0", "<0.2.0"]

  - package: fivetran/snapchat_ads
    version: [">=0.4.0", "<0.5.0"]

  - package: fivetran/facebook_ads
    version: [">=0.5.0", "<0.6.0"]

  - package: fivetran/google_ads
    version: [">=0.8.0", "<0.9.0"]

  - package: fivetran/pinterest
    version: [">=0.6.0", "<0.7.0"]

  - package: fivetran/linkedin
    version: [">=0.5.0", "<0.6.0"]

  # - package: fivetran/microsoft_ads
  #   version: [">=0.5.0", "<0.6.0"]
  - git: https://github.com/fivetran/dbt_microsoft_ads.git
    revision: test-bugfix/search-uniqueness-hard-deletes
    warn-unpinned: false

  - package: fivetran/tiktok_ads
    version: [">=0.2.0", "<0.3.0"]

  - package: fivetran/twitter_ads
    version: [">=0.5.0", "<0.6.0"]
clay-walker commented 1 year ago

@fivetran-sheringuyen Absolutely, thanks for the response! I'll let you know the results.

clay-walker commented 1 year ago

Hi @fivetran-sheringuyen I'm still getting a null keyword ID error in the microsoft ads keyword report (in addition to others):

image

fivetran-jamie commented 1 year ago

hey @clay-walker it looks like your project isn't picking up the new changes. i've made a testing branch of _adreporting that references the the new microsoft ads code.

so, in your root project's packages.yml, could you replace any ad_reporting lines with:

packages:
  - git: https://github.com/fivetran/dbt_ad_reporting.git
    revision: test-bugfix/search-uniqueness-hard-deletes
    warn-unpinned: false

and then dbt clean, deps, run, and test? thanks!

fivetran-jamie commented 1 year ago

@clay-walker do you think you'll have a chance to test using this branch ^ ? thanks šŸ¤ 

nszoni commented 1 year ago

hi @fivetran-jamie @clay-walker! Any updates on this? Do you need me testing it perhaps?

clay-walker commented 1 year ago

Hi @fivetran-jamie @nszoni

Sorry both, busy week.

I'm getting test errors for keyword_text:

FAIL 1503 not_null_ad_reporting__keyword_report_keyword_text.......... [FAIL 1503 in 3.96s]

FAIL 410 dbt_utils_unique_combination_of_columns_ad_reportingkeyword_report_platformdate_daykeyword_textkeyword_match_typead_group_id__campaign_idaccount_id [FAIL 410 in 3.00s]

Incidentally, and this may be something to raise with @fivetran-sheringuyen elsewhere, but...

We're using dbt-snowflake CLI version 1.0.0 (not cloud unfortunately), which does not support metrics.

I noticed you added metrics to the ad_reporting package. We are trying to get our security team to approve the upgrade to 1.3 in our CI/CD process, but it might be a struggle. Just wondering if you would consider this package to have breaking changes for < dbt 1.3, or is there a way to configure with a variable, aside from just removing the metrics yml code?

fivetran-jamie commented 1 year ago

hey @clay-walker no worries! i'll look into the keyword_text issue later today and will circle back with any findings/followup questions

regarding metrics, are you getting an error around the metrics yml file being in there? i think previous versions of dbt should ignore any metrics code

nszoni commented 1 year ago

Hi @fivetran-jamie @clay-walker! Just checking if you have any updated on this. We would be super grateful if you could solve this so we can start using the new version of the package:) Thanks!

fivetran-joemarkiewicz commented 1 year ago

I @nszoni thanks for jumping into the thread! The team and I actually believe we found a path forward to address this issue!

I will be planning to work to integrate the change early next week and aim for an early next week patch release.

clay-walker commented 1 year ago

hey @clay-walker no worries! i'll look into the keyword_text issue later today and will circle back with any findings/followup questions

regarding metrics, are you getting an error around the metrics yml file being in there? i think previous versions of dbt should ignore any metrics code

Hey @fivetran-jamie prior versions do not ignore the metrics yml config, unfortunately.

nszoni commented 1 year ago

Thanks @fivetran-jamie! Looking forward to hearing from you!

fivetran-joemarkiewicz commented 1 year ago

Regarding the Keyword Errors

I have been able to isolate the errors that both of you (@nszoni and @clay-walker) are following this thread regarding. Essentially, an update was needed to both the Twitter and Microsoft packages to account for nuances with the keyword reports we initially did not account for in the v1.0.0 release of the Ad Reporting package. I have tested the following branch of the ad_reporting package locally and saw errors to be resolved.

Would you both be open to testing the branch out on your end? If it works for both of you, we can move forward with releasing a patch update the the Ad packages to resolve this error.

packages:
  - git: https://github.com/fivetran/dbt_ad_reporting.git
    revision: bugfix/twitter-microsoft-keywords-search
    warn-unpinned: false

Note: you will need to run a dbt clean and a dbt deps to pull in the changes from this branch above. Additionally, please be aware that this branch has dbt-utils v0.8.6 as a dependency. If you are on dbt-utils v0.9.0 or greater, it will result in a dependency error.

Regarding the Metrics Error

@clay-walker I am curious that you are seeing an error on your end with the metrics config included in a previous dbt version. I would have assumed dbt would ignore this file if it was not yet compatible. This may be something we could get @callum-mcdata eyes on as to if this is an expected behavior with pre metrics compatible versions of dbt-core. šŸ¤”

clay-walker commented 1 year ago

@fivetran-joemarkiewicz

Thanks for the update. I used your latest revision, and I'm still getting a null keyword error. I traced it back to the source data, and it appears there are keyword IDs in the keyword_performance_daily_report which do not exist in the keyword_history table.

image

fivetran-joemarkiewicz commented 1 year ago

Hey @clay-walker I am curious that you are still getting the error using the branch I provided above. We adjusted the model to pull the keyword_id from the keyword_performance_daily_report table which should result in no null records. Unless are there null keyword_id records in the source keyword_performance_daily_report?

callum-mcdata commented 1 year ago

@clay-walker šŸ‘‹ hey there! Not to muddy the waters here but could you provide the error message you are receiving that is letting you know dbt-core 1.0.0 doesn't allow for you to have the metrics defined? I haven't been able to recreate this error on my local - adding arbitrary top level node configs doesn't trip anything and it just leads to them being ignored šŸ¤”

clay-walker commented 1 year ago

@fivetran-joemarkiewicz

I replaced the root packages entry for ad reporting: image

Then ran dbt clean, dbt deps, dbt run, dbt test.

This is what I'm seeing in microsoft_ads__keyword_report.sql:

image

clay-walker commented 1 year ago

@callum-mcdata here ya go! Compilation error. If I erase the contents of the file with metrics defined, everything runs smoothly. And apologies, I'm on 1.0.1, not 1.0.0. :)

image

fivetran-joemarkiewicz commented 1 year ago

@clay-walker thanks for sharing and looks like the proper version of the package is being used in your recent runs!

Still curious why this would be resulting in null keyword_ids since we are left joining on the history model. I would expect the raw report table to not contain null records. If you query the following in your BigQuery project, do you see any nulls?

select *
from source_raw.bingads.keyword_performance_daily_report
where keyword_id is null
callum-mcdata commented 1 year ago

@clay-walker Ohhh interesting, it's not the metric config that's throwing the error as much as it is the derived metrics built on other metrics trying to reference the metric function and dbt not finding it šŸ¤” . BTW if you're on v1.0.1 then metrics technically exist in your version - the experimental launch was with v1 so there shouldn't be any issue there!

However the ability to ref a metric was added in v1.2.0 in order to support derived metrics. Unfortunately I don't think this is something that we can resolve in v1.0.1 šŸ˜¢ @fivetran-joemarkiewicz

clay-walker commented 1 year ago

@fivetran-joemarkiewicz the problem is the keyword history table, see my query screenshot above. Keyword IDs appear in the daily report, but do not exist in the keyword history table. Since you're pulling the keyword name (later aliased to text) from the keyword history table, it's throwing a null keyword text error in tests.

image

But to answer your question, there are no null keyword IDs in the keyword_performance_daily_report.

clay-walker commented 1 year ago

Thanks @callum-mcdata, makes sense. Honestly we need to upgrade, so I'll pursue that with our security and devops teams. šŸ„²

fivetran-joemarkiewicz commented 1 year ago

@clay-walker my apologies I was focusing on the keyword_id and not paying enough attention to the keyword_text. In this case, I actually believe we will want to remove the not_null test on the keyword_text field within the ad reporting model.

In this case, we still want to report on the keyword as it did accrue spend. I know you mentioned in the opening of this issue that an inner join would suffice work to remove the nulls. However, are you concerned with missing out on the spend attached to these keywords? Additionally, are you concerned in this case with the inclusion of null records as the history of the keywords have been deleted? If so, then we can remove the test and this should resolve the issue!

@callum-mcdata thanks for looking into this as I didn't know about this version restriction with the derived metrics. This will hopefully be something we can remedy in the next breaking change with increasing the minimum dbt-core version requirement.

clay-walker commented 1 year ago

@fivetran-joemarkiewicz In retrospect, my initial suggestion to use an inner join makes no sense. I'm good with removing the test. Thank you!

fivetran-joemarkiewicz commented 1 year ago

@clay-walker sounds good! I just updated the branch to test on the keyword_id for not nulls as opposed to the text. When you have a moment, let me know if the update to the branch resolves your errors. šŸ˜„

nszoni commented 1 year ago

Hey @fivetran-jamie and @clay-walker!

I have tested the branch but still have 1 failing test on the unique combination of columns.

packages:
  - git: https://github.com/fivetran/dbt_ad_reporting.git
    revision: bugfix/twitter-microsoft-keywords-search
    warn-unpinned: false
  - name: ad_reporting__keyword_report
    description: Each record represents daily metrics by keyword, ad group, campaign and account.
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - platform
            - date_day
            - keyword_text
            - keyword_match_type
            - ad_group_id
            - campaign_id
            - account_id
09:48:53  Failure in test dbt_utils_unique_combination_of_columns_ad_reporting__keyword_report_platform__date_day__keyword_text__keyword_match_type__ad_group_id__campaign_id__account_id (models/ad_reporting_models.yml)
09:48:53    Got 70 results, configured to fail if != 0
09:48:53  
09:48:53    compiled SQL at target/compiled/ad_reporting/models/ad_reporting_models.yml/dbt_utils_unique_combination_o_f1b58d2a0be6cb3ab97cee07f3247bfb.sql

3438

fivetran-joemarkiewicz commented 1 year ago

HI @nszoni thanks for testing out the working branch and providing the feedback that you still encountered a uniqueness error.

After looking at the error you are seeing, it makes sense why you are still seeing this. With the updates I applied to the package branch, I should have also added the keyword_id to the uniqueness test for the final ad reporting model since we added the keyword_id to the Twitter package.

I just pushed a fix for this. Would you be able to try again and let me know if you are still experiencing the test failure.

nszoni commented 1 year ago

Hey @fivetran-joemarkiewicz, just tested the branch, and it works fine now! Thanks for fixing it!:)

fivetran-joemarkiewicz commented 1 year ago

That's great to hear @nszoni! I will give my team the go ahead to review the open PRs above and we can move forward with rolling these updates out into the packages!

fivetran-joemarkiewicz commented 1 year ago

Hi @nszoni and @clay-walker the latest release of the ad_reporting package (v1.0.4) has the updates from the branch included within it! This should address the issues you were both facing. Feel free to upgrade and let me know if you run into any issues.

For the time being I will close this ticket, but feel free to reopen it or create a new issue if you experience any additional issues. I really appreciate both of your help in resolving this issue šŸ˜„