**Closed**: fivetran-reneeli closed this issue 6 months ago
Thanks for reaching out about this @fivetran-reneeli 🧐
What are the use-case(s) that you are trying to solve for?
Is it doing casts like the following, or something else?

```sql
cast({{ expression }} as {{ dbt.type_timestamp() }})
```
Our docs state that `type_timestamp`:

> may or may not match the behavior of `TIMESTAMP WITHOUT TIMEZONE` from ANSI SQL-92
So even though it is inconvenient that it is not standardized across adapters, this is known and documented behavior rather than a bug.
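To make that concrete, here's a sketch of how the same model line can compile on two adapters. The rendered type names below are illustrative of current behavior as I understand it, not guaranteed:

```sql
-- model source:
--   cast(created as {{ dbt.type_timestamp() }}) as created_at

-- compiled on Snowflake (TIMESTAMP is an alias whose meaning depends on the
-- TIMESTAMP_TYPE_MAPPING session parameter; it defaults to TIMESTAMP_NTZ):
cast(created as TIMESTAMP) as created_at

-- compiled on BigQuery (TIMESTAMP is an instant in UTC, i.e. *aware*;
-- the naive type on BigQuery would be DATETIME):
cast(created as TIMESTAMP) as created_at
```

So identical compiled SQL can still yield different naive/aware semantics per warehouse.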
> I am wondering if there are plans to standardize the `dbt.type_timestamp` macro
We don't currently have plans to standardize this.
Here are some other issues/comments that may be related:
**`type_timestamp_ntz()` macro**

If we were to standardize this, one option would be to introduce a new macro in order to maintain backwards compatibility.
Let's suppose there's a new macro called `type_timestamp_ntz`. Here are the strings it might return on a per-adapter basis to align with naive timestamps, i.e., `TIMESTAMP WITHOUT TIME ZONE` from the SQL standard (or `java.time.LocalDateTime` from Java 8):
| Database | Naive timestamp data type |
|---|---|
| Starburst/Trino | `TIMESTAMP WITHOUT TIME ZONE` |
| Athena | `TIMESTAMP WITHOUT TIME ZONE` |
| Postgres | `TIMESTAMP` |
| Redshift | `TIMESTAMP` |
| AlloyDB | `TIMESTAMP` |
| Dremio | `TIMESTAMP` |
| Oracle | `TIMESTAMP` |
| Snowflake | `TIMESTAMP_NTZ` |
| Spark SQL | `TIMESTAMP_NTZ` |
| Databricks | `TIMESTAMP_NTZ` |
| BigQuery | `DATETIME` |
| Teradata | N/A |
Notes:

1. `TIMESTAMP_NTZ` is not available until Spark 3.4 & Databricks Runtime 13 [1, 2]
2. Teradata has `TIMESTAMP` and `TIMESTAMP WITH TIME ZONE`
data types, but the former is described as "non-ANSI standard. Analytics Database stores a TIMESTAMP value in UTC." [3]

Thanks for the thorough response @dbeatty10! It was helpful to have the related issues. I found this one to be especially relevant, particularly your comment here:
> It would restore the original literal string values returned by `type_timestamp()` that existed in dbt-utils previously (e.g., `TIMESTAMP_NTZ` instead of `TIMESTAMP` for Snowflake, etc.)
For more context, on our side our models leverage dbt.type_timestamp for the following warehouses: Snowflake, Postgres, Databricks, Redshift, and BigQuery.
So as you listed, I believe each one defaults to NTZ except for BigQuery, where `timestamp` is with time zone and `datetime` is without time zone.
The dbt team may have discussed this in the past, based on comments I found like this one. I also dug around the old dbt_utils changelog and saw that the timestamp macro for Redshift and Postgres was explicitly cast at one point.
cc @fivetran-joemarkiewicz
@fivetran-reneeli All those links you included are golden info 🤩
> For more context, on our side our models leverage dbt.type_timestamp for the following warehouses: Snowflake, Postgres, Databricks, Redshift, and BigQuery.
Could you share more about how you are leveraging `dbt.type_timestamp`? e.g., could you share the end goal that you are trying to accomplish by using it? If you have links to source code, that would be handy.
The reason I ask is there are at least a few different uses I can think of:
Sorry for the delay!
Sure thing, so our team develops data model packages for dbt that are warehouse-agnostic (currently the packages work across Postgres, Snowflake, BigQuery, Redshift, and Databricks). Here is an example model with Jira.
```sql
cast(created as {{ dbt.type_timestamp() }}) as created_at,
```
Because we want our models to work with all those mentioned warehouses, ideally the compiled code is the same for all.
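If it helps frame the design, a hypothetical `type_timestamp_ntz()` could follow the same `adapter.dispatch` pattern the existing cross-database macros use. Everything below is a sketch; none of these macros exist in dbt-core today, and the per-adapter return values are taken from the table above:

```sql
{# Hypothetical entry point: dispatch resolves to an adapter-prefixed
   implementation (e.g. snowflake__...) or falls back to default__. #}
{% macro type_timestamp_ntz() %}
    {{ return(adapter.dispatch('type_timestamp_ntz', 'dbt')()) }}
{% endmacro %}

{# Fallback for adapters where plain TIMESTAMP is already naive
   (Postgres, Redshift, Oracle, ...) #}
{% macro default__type_timestamp_ntz() %}
    timestamp
{% endmacro %}

{% macro snowflake__type_timestamp_ntz() %}
    timestamp_ntz
{% endmacro %}

{% macro bigquery__type_timestamp_ntz() %}
    datetime
{% endmacro %}
```

A new macro like this would leave the existing `type_timestamp()` behavior untouched, which is the backwards-compatibility angle mentioned earlier.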
No prob and thanks for that example @fivetran-reneeli !
There are some tricky things to consider here. For example, how would this example behave when the `created` expression is an aware timestamp vs. a naive timestamp?
We took a look at something similar a year or so ago when considering https://github.com/dbt-labs/dbt-core/issues/5969, and we opted not to tackle it at that time due to the complexity of getting it right and the high probability of accidentally getting it wrong and creating churn for users and maintainers.
See https://github.com/dbt-labs/dbt-core/pull/5935 and https://github.com/dbt-labs/dbt-core/pull/7665 for a small sample of edge cases related to aware vs. naive that we've run into.
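For anyone less familiar with the distinction driving those edge cases, here's a quick Python illustration; Python's `datetime` has the same naive/aware split that SQL timestamp types do:

```python
from datetime import datetime, timezone

# A naive timestamp: no time zone attached, like TIMESTAMP WITHOUT TIME ZONE
naive = datetime(2024, 1, 1, 12, 0, 0)

# An aware timestamp: pinned to an offset, like TIMESTAMP WITH TIME ZONE
aware = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

print(naive.tzinfo)       # None
print(aware.utcoffset())  # 0:00:00

# Mixing the two is an error in Python; this is roughly the class of bug
# the linked PRs guard against at the SQL layer.
try:
    naive < aware
except TypeError as err:
    print(err)  # can't compare offset-naive and offset-aware datetimes
```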
So I'm going to upgrade this issue to a Discussion so we can further discuss different design aspirations folks are looking for as well as edge cases people have come across.
I'd propose the main topic of the discussion to be this question:
### Is this a new bug in dbt-core?

### Current Behavior

Not sure if this is a bug, but I am wondering if there are plans to standardize the `dbt.type_timestamp` macro. I believe in most warehouses it defaults to timestamp without time zone, so it would be helpful to confirm that this is the end result across all adapters.

### Expected Behavior

Field is a timestamp without time zone

### Steps To Reproduce

n/a

### Relevant log output

### Environment

### Which database adapter are you using with dbt?

other (mention it in "Additional Context")

### Additional Context

Using multiple adapters