dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-3482] Support a "time spine" aggregation time dimension type #9262

Open QMalcolm opened 11 months ago

QMalcolm commented 11 months ago

Short description

Currently, all aggregations assume that every input row has a point-in-time value attached to it, and that all time values of interest are represented in the input data source.

This assumption is extremely limiting. If someone defines a measure on a dimension data source with a validity window set (i.e., an SCD Type II layout), every row is mapped, either implicitly or explicitly, to a range of points in time rather than a single point in time. Any measure that does a simple linear aggregation against one or the other endpoint of that range will produce incorrect results.
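To make the failure mode concrete, here is a minimal sketch with made-up rows. The column names `date_added` and `date_removed` are borrowed from the example in this issue; the data, the `active_on` helper, and the half-open validity window `[date_added, date_removed)` are assumptions for illustration only.

```python
from collections import Counter
from datetime import date

# Hypothetical SCD Type II rows: each listing is assumed valid on
# the half-open window [date_added, date_removed).
listings = [
    {"id": 1, "date_added": date(2023, 1, 1), "date_removed": date(2023, 1, 4)},
    {"id": 2, "date_added": date(2023, 1, 2), "date_removed": date(2023, 1, 3)},
    {"id": 3, "date_added": date(2023, 1, 2), "date_removed": date(2023, 1, 5)},
]

# Naive aggregation: sum(1) grouped by a single endpoint of the window.
naive = Counter(row["date_added"] for row in listings)


def active_on(day):
    """Count listings whose validity window contains `day`."""
    return sum(1 for r in listings if r["date_added"] <= day < r["date_removed"])


day = date(2023, 1, 3)
print(naive[day])      # 0: no listing was added on this day
print(active_on(day))  # 2: listings 1 and 3 are still active
```

Grouping by `date_added` answers "how many listings were added on this day", which is a different question from "how many listings existed on this day".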

We need to expand what is supported by the agg_time_dimension property on measures. Specifically, an (optional) agg_time_dimension will need to support the following spec:

measures:
    - name: listings
      expr: 1
      agg: sum
      agg_time_dimension:
        dimension: time_spine # New
        start_time: date_added # New
        end_time: date_removed # New
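As a rough illustration of the semantics this spec is asking for, the sketch below evaluates a measure against a daily time spine. It is not how dbt-core or MetricFlow implements it: the `measure` dict shape, the `daily_spine` and `aggregate_on_spine` helpers, the half-open window, and the treatment of a missing end time as "still valid" are all assumptions.

```python
from datetime import date, timedelta

# Mirrors the proposed YAML. This dict shape is an illustration,
# not dbt's parsed node.
measure = {
    "name": "listings",
    "agg": "sum",
    "agg_time_dimension": {
        "dimension": "time_spine",
        "start_time": "date_added",
        "end_time": "date_removed",
    },
}


def daily_spine(start, end):
    """Yield every date in the half-open range [start, end)."""
    day = start
    while day < end:
        yield day
        day += timedelta(days=1)


def aggregate_on_spine(rows, measure, spine):
    """Sum expr (always 1 here) per spine date that falls inside a row's window."""
    cfg = measure["agg_time_dimension"]
    start_col, end_col = cfg["start_time"], cfg["end_time"]
    result = {}
    for day in spine:
        # Assumption: an end value of None means the row is still valid.
        result[day] = sum(
            1
            for row in rows
            if row[start_col] <= day
            and (row[end_col] is None or day < row[end_col])
        )
    return result


rows = [
    {"date_added": date(2023, 1, 1), "date_removed": date(2023, 1, 3)},
    {"date_added": date(2023, 1, 2), "date_removed": None},
]
counts = aggregate_on_spine(
    rows, measure, daily_spine(date(2023, 1, 1), date(2023, 1, 4))
)
for day, value in counts.items():
    print(day, value)
```

Each row contributes to every spine date inside its window, so dates with no matching input row still get a value.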

Acceptance criteria

People can specify the expanded agg_time_dimension attributes, and the result complies with the dbt-semantic-interfaces protocol spec.

Impact to Other Teams

Will backports be required?

Not sure if it is required, but the SL team would prefer a backport to 1.7.

Context

This issue should not be considered ready to work on until the work is done in dbt-semantic-interfaces, because the schema we need to implement isn't finalized until that point. An issue for this work already exists in dbt-semantic-interfaces.

QMalcolm commented 9 months ago

start_time and end_time should be datetime objects on the parsed node. DSI validations will handle checking that the time dimension actually exists. Mashumaro will handle parsing the YAML into the Python types.