dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-3204] [implementation] Automate creation of metricflow_time_spine if the project defines semantic objects #8825

Open graciegoheen opened 10 months ago

graciegoheen commented 10 months ago

Housekeeping

Short description

From https://github.com/dbt-labs/dbt-core/issues/8319

Currently, if the user defines semantic objects in their project, but not a model named metricflow_time_spine, we raise an error.

We should simply create the model automatically, if it is not found in the project, using the recommended definition.

Users should still be able to create the model themselves, with a custom implementation, if they so choose.

Acceptance criteria

Impact to Other Teams

semantic layer

Will backports be required?

no

Context

So there are two main concerns I believe:

1. Squaring partial parsing with generating a metricflow_time_spine model when one isn't specified
2. Auto-generating the metricflow_time_spine correctly for any given adapter

For issue (1) there are four possible states:

a. metricflow_time_spine was specified by the user and that's still the case
b. metricflow_time_spine was specified by the user and now isn't (and thus should be generated)
c. metricflow_time_spine wasn't specified by the user (and thus generated) and that's still the case
d. metricflow_time_spine wasn't specified by the user (and thus generated) but now it is specified by the user

I think the solution is this: at the end of parsing, if there are semantic layer nodes and no metricflow_time_spine model, we add one and mark it as auto-generated. At the start of parsing, if there is a saved manifest, we drop the metricflow_time_spine node if it is marked as having been auto-generated. This workflow produces the following outcomes in the corresponding cases:

a. the metricflow_time_spine is handled by the user specification
b. the user-specified metricflow_time_spine gets dropped during partial parsing, and then re-added via the auto-generation
c. the auto-generated metricflow_time_spine gets dropped, and then re-added at the end
d. the auto-generated metricflow_time_spine gets dropped, and then the user-specified metricflow_time_spine gets added
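The drop-then-regenerate workflow above can be sketched in plain Python. This is only a toy model, not dbt-core's parser: `Node`, `Manifest`, the `auto_generated` flag, and the `parse` function are all hypothetical names made up for illustration, and "partial parsing" is reduced to a boolean saying whether the project files currently define the model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

TIME_SPINE = "metricflow_time_spine"


@dataclass
class Node:
    name: str
    auto_generated: bool = False  # hypothetical marker, not a real dbt-core field


@dataclass
class Manifest:
    models: Dict[str, Node] = field(default_factory=dict)
    semantic_nodes: List[str] = field(default_factory=list)


def drop_auto_generated_spine(saved: Manifest) -> None:
    """Start of parsing: discard a time spine that was generated last run."""
    node = saved.models.get(TIME_SPINE)
    if node is not None and node.auto_generated:
        del saved.models[TIME_SPINE]


def ensure_time_spine(manifest: Manifest) -> None:
    """End of parsing: generate a time spine if one is needed but missing."""
    if manifest.semantic_nodes and TIME_SPINE not in manifest.models:
        manifest.models[TIME_SPINE] = Node(TIME_SPINE, auto_generated=True)


def parse(saved: Optional[Manifest], user_defines_spine: bool,
          semantic_nodes: List[str]) -> Manifest:
    manifest = saved if saved is not None else Manifest()
    drop_auto_generated_spine(manifest)

    # Stand-in for (partial) parsing of the project files.
    manifest.semantic_nodes = list(semantic_nodes)
    if user_defines_spine:
        manifest.models[TIME_SPINE] = Node(TIME_SPINE)
    else:
        # Anything still present here was user-defined and has since been removed.
        manifest.models.pop(TIME_SPINE, None)

    ensure_time_spine(manifest)
    return manifest
```

Running `parse` twice in a row with different values of `user_defines_spine` walks through states (a) to (d): the resulting node is flagged as auto-generated exactly when the user does not supply the model and the project has semantic nodes.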

For issue (2) I don't think we need a cross-database macro for date types, though it would be nice. Instead we could just use the same jinja template we use for the date_spine macro tests, where we make different calls to the macro based on the target data warehouse.

graciegoheen commented 10 months ago

@graciegoheen this idea makes sense to me to support a built-in metricflow_time_spine 🚀

What do you think about calling it to_date()?

Rather than naming it cast_text_to_date, I'd suggest we call it to_date. Even though to_date isn't in the SQL standard, databricks, postgres, redshift, and snowflake all have a to_date function that does what we want. bigquery is an outlier, but that's nothing we can't solve with a little dispatch magic ✨

Prototype of to_date()

Assuming to_date() is a cross-database macro that takes an ISO 8601 (YYYY-MM-DD) date string as input, here's a completely untested prototype for dbt-postgres:

{% macro to_date(date_str) %}
  {{ return(adapter.dispatch('to_date', 'dbt') (date_str)) }}
{% endmacro %}

{% macro default__to_date(date_str) -%}
    to_date({{ dbt.string_literal(date_str) }})
{%- endmacro %}
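To make the "dispatch magic" concrete, here is a rough Python simulation of how a dispatched macro name could resolve: adapter-specific prefixes are tried first, then `default__`. It is a simplification (package search order is ignored), `resolve_macro` and `string_literal` are stand-ins written for this sketch, and the `bigquery__to_date` override built on `parse_date` is a hypothetical example rather than an existing dbt macro.

```python
from typing import Callable, Dict, List

Macro = Callable[[str], str]


def string_literal(value: str) -> str:
    """Stand-in for dbt.string_literal: wrap the value in single quotes."""
    return f"'{value}'"


def resolve_macro(macros: Dict[str, Macro], name: str,
                  adapter_chain: List[str]) -> Macro:
    """Try '<adapter>__<name>' for each adapter in the chain, then 'default__<name>'."""
    for prefix in [*adapter_chain, "default"]:
        candidate = f"{prefix}__{name}"
        if candidate in macros:
            return macros[candidate]
    raise KeyError(f"no implementation of {name!r} for {adapter_chain}")


macros: Dict[str, Macro] = {
    # Mirrors the default__to_date prototype above.
    "default__to_date": lambda s: f"to_date({string_literal(s)})",
    # Hypothetical BigQuery override using its parse_date function.
    "bigquery__to_date": lambda s: f"parse_date('%Y-%m-%d', {string_literal(s)})",
}

# BigQuery picks up its own implementation.
bigquery_sql = resolve_macro(macros, "to_date", ["bigquery"])("2023-09-01")

# Redshift has no override here, so it falls through postgres to the default.
redshift_sql = resolve_macro(macros, "to_date", ["redshift", "postgres"])("2023-09-01")
```

With this setup only the outlier warehouse needs its own macro; every other adapter shares the default implementation.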

Pulling it all together for metricflow_time_spine

The cross-database Jinja template might look like this:

select cast(date_day as date) as date_day
from ({{ dbt.date_spine("day", dbt.to_date("2023-09-01"), dbt.to_date("2023-09-10")) }})
## Appendix

### Validation and error checking

If we want, we could always add some format validation to the default implementation of `to_date()` by using the [`datetime`](https://docs.getdbt.com/reference/dbt-jinja-functions/modules#datetime) module:

```sql
{%- set dt = modules.datetime.datetime.strptime(date_str, "%Y-%m-%d") -%}
...
```

### `type_date` macro

We may (or may not) want to also create a cross-database `type_date` macro (which [doesn't exist](https://github.com/dbt-labs/dbt-core/blob/3d27483658f517946506f65969c47d1979688de9/core/dbt/include/global_project/macros/utils/data_types.sql) today). I haven't seen any database that _doesn't_ call this data type `DATE`, so that makes it either easy-peasy or extraneous depending how you look at it.
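The format validation suggested in the appendix can be tried out in plain Python before wiring it into a macro. One caveat worth noting: `strptime` with `%m` and `%d` also accepts values without zero padding (such as `2023-9-1`), so this sketch adds a round-trip comparison to insist on the strict `YYYY-MM-DD` form. The function name `parse_iso_date` is made up for illustration.

```python
from datetime import date, datetime

ISO_DATE_FORMAT = "%Y-%m-%d"


def parse_iso_date(date_str: str) -> date:
    """Return the date for a strict YYYY-MM-DD string, or raise ValueError."""
    # Raises ValueError for the wrong layout or impossible dates (e.g. month 13).
    parsed = datetime.strptime(date_str, ISO_DATE_FORMAT).date()
    # strptime tolerates missing zero padding, so compare against the canonical form.
    if parsed.isoformat() != date_str:
        raise ValueError(f"expected YYYY-MM-DD, got {date_str!r}")
    return parsed
```

Whether the stricter round-trip check is wanted is a design choice; dropping it would reproduce the behaviour of the bare `strptime` call shown in the appendix.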

Originally posted by @dbeatty10 in https://github.com/dbt-labs/dbt-core/issues/8319#issuecomment-1756647128

adamcunnington-mlg commented 2 months ago

@graciegoheen I guess this is low priority but is there an ETA for when this would happen?