Currently, all aggregations assume that every input row has a point-in-time value attached to it, and that all time values of interest are represented in the input data source.
This assumption is extremely limiting. If someone defines a measure on a dimension data source with a validity window set (i.e., an SCD Type II layout), every row - either implicitly or explicitly - is mapped to a range of points in time rather than a single point in time. Any measure that does a simple linear aggregation against one or the other endpoint of that range will produce incorrect results.
To address this, we need to expand what the agg_time_dimension property on measures supports. Specifically, an (optional) agg_time_dimension will need to support the following spec:
measures:
  - name: listings
    expr: 1
    agg: sum
    agg_time_dimension:
      dimension: time_spine  # New
      start_time: date_added  # New
      end_time: date_removed  # New
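To illustrate why aggregating against a single endpoint goes wrong, here is a minimal Python sketch. It is not part of the proposed implementation. The sample rows, the helper names (`time_spine`, `count_by_start`, `count_by_range`), and the half-open window `[date_added, date_removed)` are all assumptions made for the example.

```python
from datetime import date, timedelta

# Hypothetical SCD Type II rows. Assumption: each listing is valid on the
# half-open window [date_added, date_removed); None means still active.
listings = [
    {"id": 1, "date_added": date(2024, 1, 1), "date_removed": date(2024, 1, 4)},
    {"id": 2, "date_added": date(2024, 1, 2), "date_removed": date(2024, 1, 3)},
    {"id": 3, "date_added": date(2024, 1, 3), "date_removed": None},
]


def time_spine(start, end):
    """Yield every day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def count_by_start(rows, spine):
    """Point-in-time aggregation: attribute each row only to its date_added."""
    return {day: sum(1 for r in rows if r["date_added"] == day) for day in spine}


def count_by_range(rows, spine):
    """Range aggregation: count a row on every spine day inside its window."""
    return {
        day: sum(
            1
            for r in rows
            if r["date_added"] <= day
            and (r["date_removed"] is None or day < r["date_removed"])
        )
        for day in spine
    }


spine = list(time_spine(date(2024, 1, 1), date(2024, 1, 5)))
by_start = count_by_start(listings, spine)
by_range = count_by_range(listings, spine)

for day in spine:
    print(day, "by start:", by_start[day], "by range:", by_range[day])
```

In this sketch the endpoint-based count reports zero listings on January 4 and 5, because no row has those dates as its `date_added`, while the range-based count against the spine still sees the listings that are active on those days.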
Acceptance criteria
People can specify the expanded agg_time_dimension attributes, and the result complies with the dbt-semantic-interfaces protocol spec.
Impact to Other Teams
Semantic Layer - The semantic layer can't begin using this until we support it.
Cloud artifacts - This will change the v12 manifest that goes out with 1.8.
Will backports be required?
Not sure if required, but the SL team would prefer a backport to 1.7.
Context
This issue should not be considered ready to work on until the corresponding work is done in dbt-semantic-interfaces, because the schema we need to implement isn't finalized until then. An issue for that work already exists in dbt-semantic-interfaces.
start_time and end_time should be datetime objects on the parsed node. DSI validations will handle checking that the time dimension actually exists. Mashumaro will handle parsing the YAML into the Python types.