dbt-labs / dbt-semantic-interfaces

The shared semantic layer definitions that dbt-core and MetricFlow use.
Apache License 2.0
66 stars 13 forks source link

[Feature] Add `timezone` key to time dimensions `type_params` #253

Open Jstein77 opened 7 months ago

Jstein77 commented 7 months ago

Is this your first time submitting a feature request?

Describe the feature

MetricFlow currently uses timezone-agnostic types for all date/time operations, but forcing people to coerce to UTC and then back-convert on render is cumbersome at best and impossible at worst. We have real-world examples of people running into date boundary issues where they have customer data stored in the customer-local timezone, and they want to compute daily customer-specific metrics with the boundaries set for that customer's local day (rather than UTC or whatever).

For situations like that it's natural to store the date/time information in local time with timezone annotations intact, as this makes common query types more natural and also allows for things like audits against local time values.

Currently , the only work-around is to normalize the time stamp in the warehouse (via the underlying dbt model) ahead of time, or else to use the expr field to do it on the fly when the semantic model is constructed. This is somewhat limiting, as it does not allow for re-use of the same measure against different time zones.

This issue will only become more common when we add hourly granularities, so it's something we'll need to address.

Note - the existing use case as described here requires us to aggregate against local time - if we do so against UTC the daily aggregations will be offset against user local time - so any strategy involving doing everything in UTC proper and then offsetting the result set in its entirety won't work, as different rows will need to be offset independently. Whether we provide support for this kind of approach via MetricFlow or some other mechanism involving dbt date/time plugins that allow MetricFlow to do simpler aggregations is an open question. https://github.com/dbt-labs/metricflow/issues/733

The config changes need to keep the timezone annotation intact are relatively simple:

dimensions:
    - name: date_time
        type: time
        type_params:
            time_granularity: day
            timezone: pst # New

Describe alternatives you've considered

Manually convert timezones in dbt or via the expr parameter.

Who will this benefit?

Users who store time dimensions in local timezone.

Are you interested in contributing this feature?

No response

Anything else?

No response

dbeatty10 commented 6 months ago

As noted in https://github.com/dbt-labs/dbt-core/issues/9263#issuecomment-1954807668:

I'd advocate for using tz database names (like America/Boise) rather than non-standardized abbreviations (like MST).

The tz database is also known as:

One way to get lists of the valid time zones is to use the pytz Python library: