feat(): add `ad_reporting` passthrough metrics variable

aleix-cd commented 1 year ago

Are you a current Fivetran customer?

Yes, from a company called Capdesk.

What change(s) does this PR introduce?

Introducing the variable ad_reporting__passthrough_metrics, which will serve as a "common denominator" for those metrics that are present in all of our Ads platforms.

In our case, for example, we like to track conversions both in Linkedin, Google, Microsoft and Facebook. Given that ad_reporting was not providing this column, we had to specifically build some ad_reporting models to just add this column.

With this PR, we should be able to do so by just giving the ad_reporting__passthrough_metrics the names of the metrics we want to add into our final models.

On the other hand, this would require a place in the README.md, in order to let package users know that this is possible (and the caveats as well e.g. that this metric needs to be present, and called the same, in all the Ads platforms we are merging). Let me know if I can provide more help in this regard!

Did you update the CHANGELOG?

[X] Yes

Does this PR introduce a breaking change?

[ ] Yes (please provide breaking change details below.)
[X] No (please provide an explanation as to how the change is non-breaking below.) Shouldn't break anything unless users introduce a metric name that is not found in all platforms' models.

Did you update the dbt_project.yml files with the version upgrade (please leverage standard semantic versioning)? (In both your main project and integration_tests)

[X] Yes

Is this PR in response to a previously created Bug or Feature Request

[ ] Yes, Issue/Feature [link bug/feature number here]
[X] No

How did you test the PR changes?

[ ] Buildkite
[X] Local (please provide additional testing details below) Tested my local branch of the package against a dbt project, everything builds correctly.

Select which warehouse(s) were used to test the PR

[ ] BigQuery
[ ] Redshift
[X] Snowflake
[ ] Postgres
[ ] Databricks
[ ] Other (provide details below)

Provide an emoji that best describes your current mood

:dancer: **Feedback** We are so excited you decided to contribute to the Fivetran community dbt package! We continue to work to improve the packages and would greatly appreciate your [feedback](https://www.surveymonkey.com/r/DQ7K7WW) on our existing dbt packages or what you'd like to see next.

fivetran-joemarkiewicz commented 1 year ago

Hi @aleix-cd thanks so much for taking the time to open this PR!

After reviewing your write up and taking a look at your suggested updates, I agree that this is something that I feel would be incredibly valuable to add to the package! There are a few suggested changes I can foresee, but altogether I would like to collaborate with you to integrate this into a future release of our package!

I am going to add this review to my up coming sprint and will share more feedback and suggestions then. Thanks again for thinking through this update and working to contribute a valuable addition to the package!

aleix-cd commented 1 year ago

Since we have individual hierarchy types (account, campaign, ad group, ads) in the platform specific models, I think it would make the most sense to have the same hierarchy type variables for ad reporting. This would give the user control over each of the hierarchies and ensure that the metrics they are trying to passthrough are consistent (and maybe even renamed) across platforms to ensure the unioning works as intended. What are your thoughts?

So instead of just ad_reporting__passthrough_metrics we could have ad_reporting__campaign_passthrough_metrics. Or would you feel that this is duplicative? I just worry there may be some instances where metrics may not be consistent across hierarchies and I would want to isolate these pass through abilities. Thoughts?

Additionally, I have a few suggestions in the review as well. Let me know if you would prefer I elaborate on any of the suggestions.

IMHO it would definitely feel repetitive, but agree that the flexibility it offers should be worth the extra lines. I will push the changes soon, let me know what you think :)

aleix-cd commented 1 year ago

@fivetran-joemarkiewicz At this point, just need to test which is the best way to adapt the get_query() macro to account for the different variable names :) Will give it a thought later today!

aleix-cd commented 1 year ago

@fivetran-joemarkiewicz OK, I got this up and running! Few comments:

metric.name did not work in my end, reverted back to metric. Not sure how you would use it, but strings don't have a name method afaik. Let me know if I am missing something!
Added a few more conditions into the get_query in order to add the relevant passthrough metics at each step. I am no longer adding these in the consistent_fields step, let me know if that looks good!
I am using an else at the end of the get_query macro to cast all passthrough_metrics as floats, so expecting that all metrics can be casted as such {{ dbt.type_float() }}

fivetran-joemarkiewicz commented 1 year ago

Thanks @aleix-cd I will take a look later today!

Regarding your comment:

metric.name did not work in my end, reverted back to metric. Not sure how you would use it, but strings don't have a name method afaik. Let me know if I am missing something!

You are correct and I apologize for not mentioning this in my initial statement. All our other passthrough metrics for Ad Reporting use the dictionary structure as opposed to the list of strings. Therefore, I feel we should follow this method as well to be consistent across the ad platforms. For example it would be

vars:
  ad_reporting__campaign_passthrough_metrics: 
    - name: conversions
    - name: total_shares

As opposed to

vars:
  ad_reporting__campaign_passthrough_metrics: ["conversions","total_shares"]

That being said, I know we have made some enhancements in other packages that support both forms. I will take a look to see how this was done as we may be able to repurpose here as well.

aleix-cd commented 1 year ago

Thanks @aleix-cd I will take a look later today!

Regarding your comment:

metric.name did not work in my end, reverted back to metric. Not sure how you would use it, but strings don't have a name method afaik. Let me know if I am missing something!

You are correct and I apologize for not mentioning this in my initial statement. All our other passthrough metrics for Ad Reporting use the dictionary structure as opposed to the list of strings. Therefore, I feel we should follow this method as well to be consistent across the ad platforms. For example it would be
vars:
  ad_reporting__campaign_passthrough_metrics: 
    - name: conversions
    - name: total_shares
As opposed to
vars:
  ad_reporting__campaign_passthrough_metrics: ["conversions","total_shares"]
That being said, I know we have made some enhancements in other packages that support both forms. I will take a look to see how this was done as we may be able to repurpose here as well.

Ah, my bad! I completely missed that the ad_reporting packages (e.g. google_ads) are using the following fivetran_utils macro to invoque the passthrough metrics: {{ fivetran_utils.persist_pass_through_columns(pass_through_variable='var', transform = 'sum') }}

I could use this instead of the for loop I'm declaring in each model. What do you think?

fivetran-joemarkiewicz commented 1 year ago

@aleix-cd yeah if we could get it to leverage an already existing macro then that would be ideal! Let me know if it doesn't work on your end. Maybe we can look to adjust the macro be more flexible.

aleix-cd commented 1 year ago

@fivetran-joemarkiewicz Alright Joe, this should be ready to review now! I now treat all the variables as Dicts, and use the fivetran_utils.persist_pass_through_columns() macro on all reports. Tested on my end and it worked with the following config:

vars:
  ad_reporting__url_passthrough_metrics: 
    - name: conversions

Let me know!

fivetran-joemarkiewicz commented 1 year ago

@aleix-cd fantastic! Our next sprint starts Thursday. I will review and work to integrate this into a new release in the coming sprint! I will follow up if I have any other comments or suggestions 😄

aleix-cd commented 1 year ago

transform_sql

Spotted the error: I was defining campaign_passthrough_metric_dict as

{% set campaign_passthrough_metric_array_of_dicts = var('ad_reporting__campaign_passthrough_metrics')[0] %}

So only selecting the first element of the array -- in this case, conversions.

I am now iterating over that list of dictionaries:

{% set campaign_passthrough_metric_array_of_dicts = var('ad_reporting__campaign_passthrough_metrics') %}
    {%- for campaign_group_passthrough_metric_dict in campaign_passthrough_metric_array_of_dicts -%}
        {%- for campaign_passthrough_metric_value in campaign_passthrough_metric_dict.values() -%}
            {%- do final_fields_superset.update({campaign_passthrough_metric_value: campaign_passthrough_metric_value}) -%}
        {%- endfor -%}
    {%- endfor -%}

And this should be solved! All passthrough_metrics should now be populating the intermediate models :)

aleix-cd commented 1 year ago

Just applied the requested fixes, @fivetran-joemarkiewicz! Thanks a lot for the tips, appreciate these a lot!

One additional question I wanted to ask was if you had any strong preference for keeping the url_report and ad_report passthrough metrics distinct? I noticed in your code you have a variable for ad_reporting__ad_passthrough_metrics and one for ad_reporting__url_passthrough_metrics; however, in our upstream packages we use the same passthrough variable for both reports since they are built from the same base table. I am considering doing the same here to be consistent across the packages.

That was my bad, I did not realise that the upstream models for ad and url reports were sharing the same variable. I've added the required changes, so now both ad and url reports should be sharing the same passthrough metrics. 👍

aleix-cd commented 1 year ago

We are not using this in prod, so feel free to make any required changes! And thank you for the thorough review, it has been quite a good learning experience! 💯

fivetran-joemarkiewicz commented 1 year ago

@aleix-cd just wanted to provide a quick update that there seem to be a few unrelated BigQuery JSON issues that are occurring in the upstream Ad packages. These just popped up following an update from BigQuery and we are currently triaging these issues.

Ultimately, we will be applying fixes to our upstream packages in the coming sprint. These changes will be breaking changes for the upstream packages. Therefore, we are planning to batch this PR with the next round of breaking changes so we don't cut breaking changes back to back.

I appreciate your patience!

fivetran-joemarkiewicz commented 1 year ago

The upstream changes are in motion and we can move forward with merging this into the release branch! My plan is to release this early next week!

fivetran / dbt_ad_reporting

feat(): add `ad_reporting` passthrough metrics variable #84