fivetran / dbt_snapchat_ads

Fivetran data models for Snapchat Ads built using dbt.
https://fivetran.github.io/dbt_snapchat_ads
Apache License 2.0
1 stars 6 forks source link

Null records introduced #16

Closed caubuchon closed 6 months ago

caubuchon commented 1 year ago

I was running into an error with the dbt_snapchat_ads utility. For some reason, an extra 6 records for only six days out of 3 years of data was being created with a NULL in the ad_account_id. There are no NULLs in the AD_ACCOUNT_ID fields in the account nor campaign CTE. While i am not sure why the NULLS were introduced. These NULLS were tripping up the not-null tests. My quick fix for this issue was to append "WHERE AD_ACCOUNT_ID IS NOT NULL" at the bottom of snapchat_ads_account_report.sql I'd love to learn why the extra records are happening, but hopefully I help address an issue before others struggle with it. Thanks -Chris Records before image Code change

image

Records after image

fivetran-joemarkiewicz commented 1 year ago

Hi @caubuchon thanks for opening this issue. My only thought is that these records from the account_report (generated from the ad_report_hourly source table) have ads that are not associated with a particular account_id. Or more likely, the account_id that these ads did belong to was an older version.

In our code we select the account_ids from the ad_account_history staging model where the records are the most recent.

https://github.com/fivetran/dbt_snapchat_ads/blob/02656ffb2a17e9f339dafd0d80790fdf4ac6371d/models/snapchat_ads__account_report.sql#L8-L12

Because of this, I wonder if we are filtering out a previous account ID that is not being captured in the report. I would be cautious to filter these null records out in the model as I imagine the spend, impression, and swipes that are being displayed did in fact happen. They just were applied to an account_id that is now out of date or has a more current version.

Would you be able to confirm if this seems to be the case for you?

fivetran-jamie commented 1 year ago

so it looks like this is another case of the Ads Platform hard-deleting entities from history tables but NOT from the reporting tables. Specifically, we found cases where an ad_id was not found in ad_history but was still in the ad_hourly_report source table. Thus, everything we join with the ad_hourly data here is NULL.

@caubuchon and I discussed this a bit - as there can still be non-zero impressions/spend values for the deleted-ads, we probably want to still leave their records from ad_hourly_report in there. What we do want to change, however, is the not_null tests we place on the ad_account_id.

i propose leaving the test but changing the severity to a warning, as this shouldn't derail a whole production pipeline, but it is still worth calling out. we can include instructions for how to disable these if the warnings get annoying...

i don't think we need to apply these changes to any other end models as the account_report is the only model in which we have a not_null test on a field that is joined in (since the other end models don't need to roll up ad_hourly_report, they have campaign_hourly_report and ad_squad_hourly_report and test the IDs from those models for nullness) cc @fivetran-joemarkiewicz

fivetran-joemarkiewicz commented 1 year ago

Thanks for the thorough explanation and detailed outline of the proposed solution as discussed with @caubuchon. I agree with both of you that it would be best to not filter these fields out, but adjusting the severity of the test to a warn seems to be the appropriate path forward. I will plan to include this in our next sprint (starting in two weeks) so we can add the adjustment.

However, if @caubuchon or anyone else comes across this and wants to contibute to the codebase, you are more than welcome to open a PR and my team can prioritize reviewing and folding into a patch update. The code that likely would need to be addressed is the following where we would just add a severity config to warn instead of fail

https://github.com/fivetran/dbt_snapchat_ads/blob/02656ffb2a17e9f339dafd0d80790fdf4ac6371d/models/snapchat.yml#L6-L10

For the time being, you may use the following config in your root dbt_project.yml to disable the test:

tests:
  snapchat_ads:
    not_null_snapchat_ads__account_report_ad_account_id:
      +enabled: false
bthomson22 commented 11 months ago

Wanted to chime in here and note this also affects the account_id not_null test in ad_reporting__account_report. I ran into this issue in Snapchat & Ad Reporting today. This can be depressed using a similar config as above:

tests:
  ad_reporting:
    not_null_ad_reporting__account_report_account_id:
      +enabled: false

Will open up a PR to resolve.

fivetran-jamie commented 6 months ago

hi @bthomson22 @caubuchon apologies for the delay, but we're finally able to prioritize pushing out this fix this sprint. Brandon i'll go ahead and review/merge your PR and apply the analogous updates to Ad Reporting

fivetran-jamie commented 6 months ago

this fix is live in v0.6.2 of snapchat_ads! (and v1.7.1 of ad_reporting for everyone using that as well)