dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-3483] Support Semantic Layer timezone configs #9263

Open QMalcolm opened 11 months ago

QMalcolm commented 11 months ago

Housekeeping

Short description

The Semantic Layer currently uses timezone-agnostic types for all date/time operations, but forcing people to coerce to UTC and then back-convert on render is cumbersome at best and impossible at worst. We have real-world examples of people running into date boundary issues where they have customer data stored in the customer-local timezone, and they want to compute daily customer-specific metrics with the boundaries set for that customer's local day (rather than UTC or whatever).

For situations like that it's natural to store the date/time information in local time with timezone annotations intact, as this makes common query types more natural and also allows for things like audits against local time values.
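To make the date boundary issue concrete, here is a minimal Python sketch. The timestamp and the America/Los_Angeles zone are made up for illustration, and it uses only the standard-library zoneinfo module (Python 3.9+), not anything from dbt or MetricFlow:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical event: 11:30 PM on 2024-01-01 in the customer's local time.
local_event = datetime(2024, 1, 1, 23, 30, tzinfo=ZoneInfo("America/Los_Angeles"))

# The same instant, normalized to UTC.
utc_event = local_event.astimezone(timezone.utc)

# The calendar day differs depending on which timezone defines "the day".
local_day = local_event.date()  # 2024-01-01
utc_day = utc_event.date()      # 2024-01-02

print(local_day, utc_day)
```

A daily metric bucketed on the UTC value would attribute this event to January 2, while the customer would expect it on January 1.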

Currently, the only workaround is to normalize the timestamp in the warehouse ahead of time (via the underlying dbt model), or else to use the expr field to do it on the fly when the semantic model is constructed. This is somewhat limiting, as it does not allow for reuse of the same measure against different time zones.
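As a rough illustration of what reusing one measure against different time zones would mean, the sketch below counts the same events per day under two zones. The order timestamps and the daily_counts helper are hypothetical and only stand in for a simple count measure; this is not how the Semantic Layer computes anything:

```python
from collections import Counter
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical order timestamps, stored in UTC.
orders_utc = [
    datetime(2024, 1, 2, 1, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 7, 30, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 15, 0, tzinfo=timezone.utc),
]


def daily_counts(timestamps, tz_name):
    """Count events per calendar day as seen in the given tz database zone."""
    tz = ZoneInfo(tz_name)
    return Counter(ts.astimezone(tz).date() for ts in timestamps)


# Same data, same "measure", two different day boundaries.
utc_counts = daily_counts(orders_utc, "UTC")
la_counts = daily_counts(orders_utc, "America/Los_Angeles")

print(utc_counts)
print(la_counts)
```

With the workaround of normalizing in the model or in expr, each of these bucketings would need its own dimension definition rather than a single measure queried with a different timezone.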

To better support timezone information, we need to expand the definition of TIME type Dimensions. Specifically, a timezone attribute should be added to the Dimension type_params:

dimensions:
  - name: date_time
    type: time
    type_params:
      time_granularity: day
      timezone: pst # New

Acceptance criteria

People can specify a timezone property on dimensions.

Impact to Other Teams

Will backports be required?

Not sure if required, but the SL team would prefer a backport to 1.7

Context

This issue should not be considered ready to work on until the work is done in dbt-semantic-interfaces, because the schema we need to implement isn't finalized until that point. An issue for this work already exists in dbt-semantic-interfaces.

QMalcolm commented 9 months ago

Should this be a location name instead of a timezone abbreviation? @Jstein77

aranke commented 9 months ago

List of timezones available in Python through the third-party pytz package: pytz.all_timezones_set

dbeatty10 commented 9 months ago

I'd advocate for using tz database names (like America/Boise) rather than non-standardized abbreviations (like MST).
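For what it's worth, validating a configured value against the tz database is cheap in Python. The sketch below uses the standard-library zoneinfo module (Python 3.9+); the is_tz_database_name helper is a hypothetical name for illustration, not an existing function in dbt-core or dbt-semantic-interfaces:

```python
from zoneinfo import available_timezones


def is_tz_database_name(name: str) -> bool:
    """Return True if `name` is a key in the locally available tz database."""
    # available_timezones() returns the set of zone keys found on this system,
    # so the result depends on the installed tz data.
    return name in available_timezones()


# A location-style name from the tz database is accepted.
print(is_tz_database_name("America/Boise"))

# The lowercase abbreviation used in the proposed config is not a tz database key.
print(is_tz_database_name("pst"))
```

One caveat: a few abbreviation-like names (MST, for example) do exist in the tz database as legacy entries, so a membership check alone would not reject every abbreviation.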

All of the following return names from the tz database:

The tz database is also known as: