fivetran / dbt_ad_reporting

Fivetran's ad reporting dbt package. Combine your Facebook, Google, Pinterest, LinkedIn, Twitter, Snapchat, Microsoft, TikTok, Reddit, Amazon, and Apple Search advertising metrics using this package.
https://fivetran.github.io/dbt_ad_reporting/#!/overview
Apache License 2.0
143 stars 55 forks source link

update metrics spec #100

Closed Jstein77 closed 1 year ago

Jstein77 commented 1 year ago

Please provide your name and company Jordan Stein, dbt Labs

Link the issue/feature request which this PR is meant to address

https://github.com/fivetran/dbt_ad_reporting/issues/101

Detail what changes this PR introduces and how this addresses the issue/feature request linked above. This PR updates the metrics defined in the project to the new spec supported in dbt-core 1.6. It also adds semantic models, which are a new resource in 1.6

How did you validate the changes introduced within this PR? Confirmed that the semantic manifest successfully parsed, and mf valiate-configs succeeds up to warehouse validation. I would like help testing the actual metric values on sample data. Which warehouse did you use to develop these changes? Snowflake.

Did you update the CHANGELOG?

Did you update the dbt_project.yml files with the version upgrade (please leverage standard semantic versioning)? (In both your main project and integration_tests)

I'm not sure if this change would be considered a patch. Please let me know how to update the version.

Provide an emoji that best describes your current mood

:dancer:
Jstein77 commented 1 year ago

@fivetran-joemarkiewicz would you be able to help review this PR?

fivetran-joemarkiewicz commented 1 year ago

Hi @Jstein77 just wanted to bump my previous question. Once this is addressed we should be able to move forward with taking steps to fold your changes into the next release.

You call out that this new metric config is compatible with dbt-core 1.6.0. However, the dbt version requirements of the package were not adjusted. Currently the requirements are >= 1.3.0, < 2.0.0. Do you feel this range is still appropriate with the new config?

Jstein77 commented 1 year ago

Hi @fivetran-joemarkiewicz sorry for the delay! We were sprinting to release the Semantic Layer beta. Addressing your two excellent questions:

  1. These configs will only work with dbt-core>=1.6. I've updated the version requirements to reflect that.
  2. I updated the readme with links to the quick start guide, as well as our documentation on how to query these metrics.

I ran into one issue when testing that I wanted to get your help with. Currently, MetricFlow requires users to create a model in their dbt project called metricflow_time_spine.sql that we use to construct cumulative metrics. Originally, I included the time spine model in this package so I could test parsing the project and generating a semantic manifest. However, we expect users to already have a metricflow_time_spine model defined in their project if they are using the semantic layer, and we'll hit a parsing error if there are two models called metricflow_time_spine.

My suggested approach was to remove the metricflow_time_spine.sql from this package, and have users configure it in their own project. However this will cause a dbt parse of this project to fail with the following error:

Parsing Error
  The semantic layer requires a 'metricflow_time_spine' model in the project, but none was found. Guidance on creating this model can be found on our docs site (https://docs.getdbt.com/docs/build/metricflow-time-spine) 

I'm not sure if that impacts your tests, or if this project needs to be able to parse successfully. How would you recommend we proceed?

fivetran-joemarkiewicz commented 1 year ago

Hi @Jstein77 thanks for making the README updates as well as the clarification around the dbt-core version requirements.

In regards to your question about the metricflow_time_spine.sql model and the challenges present with both the inclusion and exclusion from the package. I really do not like the idea of users experiencing a parsing error when using the package. Based on that, I would request we include this model to avoid this issue.

Further, to avoid the issue of users possibly having their own model configured I would recommend we add a config block at the top of the model that is tied to a new variable (possibly ad_reporting__using_metric_flow). This can be enabled by default, but we can add details in the README section to show users how to disable this model if they already have their own in their root project. This could then be highlighted in the CHANGELOG so users who upgrade to this next breaking version will be able to see this and ideally disable it right off the bat if they needed.

What are your thoughts to the above?

fivetran-joemarkiewicz commented 1 year ago

@Jstein77 additionally when reviewing I had an additional few questions I was hoping you could help me understand:

Jstein77 commented 1 year ago

@fivetran-joemarkiewicz I like that approach! I think most users already have a metricflow_time_spine model configured in their project, but they can then set the env variable to disabled. Note that we're planning to create this model automatically for the user so eventually this will no longer be an issue. I'll get a commit pushed with these changes.

What is the difference between the metrics yaml file and the semantic_models yaml file? My understanding is that metrics are defined in the metrics.yaml file and the semantic_models.yaml file then is able to reference these metrics. Let me know if I am misunderstanding the differences between the two.

That's correct, semantic models encode information about measures, dimensions and entites from your dbt models in a way that metricflow understands, and you build metrics from measures defined in semantic models.

I noticed you used yaml as opposed to yml. What is the benefit of using this file type. Or is there any difference? .yaml is just a vestige of how we defined configs at Transform. I think we're recommending folks use the .yml extensions so i will update. Functionally there is no difference.

(edit adding another question) I noticed that in the dbt docs generate command, there is a new semantic_manifest.json file. Would you be able to explain this file and if it is needed when hosting the dbt docs?

This isn't needed to host the docs site. We have docs on the semantic manifest here https://docs.getdbt.com/docs/dbt-cloud-apis/sl-manifest i can also add this to the readme.

fivetran-joemarkiewicz commented 1 year ago

This all sounds great, thanks! One final question, would it be possible to add the metricflow_time_spine model into either the metrics or semantic_models folder? Just to avoid any confusion with the direct output models of the ad reporting package and those related to the metric flow features?

Jstein77 commented 1 year ago

@fivetran-joemarkiewicz updated!

@fivetran-joemarkiewicz One question, I'm using the dbt_date package to generate the time spine model. I noticed it's already included in the package, but i was getting an error when running dbt run --models metricflow_time_spine.

Screenshot 2023-08-08 at 2 58 33 PM

I added the "dbt_date:time_zone": "America/Los_Angeles" to the project vars, but thought it was a bit strange that we don't already have this set. Is there something i'm missing, or do we just need to pass in the timezone var to use this package?

Testing setting metricflow_time_spine: False

Screenshot 2023-08-08 at 2 45 46 PM
fivetran-joemarkiewicz commented 1 year ago

Fantastic, thanks so much @Jstein77! Your changes so far look great, there are a few minor updates I would like to apply (particularly around the version of the package, docs generation, and some documentation updates). Would you be comfortable if I made those changes directly to your branch? If not, I can merge your PR into a staging branch where I will make these changes before merging to main.

Additionally in response to your question:

I added the "dbt_date:time_zone": "America/Los_Angeles" to the project vars, but thought it was a bit strange that we don't already have this set. Is there something i'm missing, or do we just need to pass in the timezone var to use this package?

If you are comfortable with me making updates directly to your branch, I would like to bump you for a review from a Semantic layer point of view before we merge this into our next update. Would that work for you?

Jstein77 commented 1 year ago

@fivetran-joemarkiewicz feel free to make changes directly to the branch!

Jstein77 commented 1 year ago

@fivetran-joemarkiewicz these changes look good to me! I'm good to merge this one on my end.