dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.78k stars · 1.62k forks

[Feature] standardize the `dbt.type_timestamp` macro #9729

Closed · fivetran-reneeli closed this issue 6 months ago

fivetran-reneeli commented 7 months ago

Is this a new bug in dbt-core?

Current Behavior

Not sure if this is a bug, but I am wondering if there are plans to standardize the `dbt.type_timestamp` macro. I believe in most warehouses it defaults to timestamp without time zone, so it would be helpful to confirm that this is the end result across all adapters.

Expected Behavior

Field is a timestamp without time zone

Steps To Reproduce

n/a

Relevant log output

n/a

Environment

- OS: mac
- Python: 3.12.2
- dbt: 1.7.9

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

Using multiple adapters

dbeatty10 commented 7 months ago

Thanks for reaching out about this @fivetran-reneeli 🧠

What are the use-case(s) that you are trying to solve for?

Is it doing casts like the following or something else?

```sql
cast({{ expression }} as {{ dbt.type_timestamp() }})
```

Short story

Our docs state that type_timestamp:

may or may not match the behavior of TIMESTAMP WITHOUT TIMEZONE from ANSI SQL-92

So even though it is inconvenient that it is not standardized across adapters, this is known and documented behavior rather than a bug.

I am wondering if there are plans to standardize the dbt.type_timestamp macro

We don't currently have plans to standardize this.

Longer story

Here are some other issues/comments that may be related:

Possibility: a new type_timestamp_ntz() macro

If we were to standardize this, one option would be to introduce a new macro in order to maintain backwards compatibility.

Let's suppose there's a new macro called `type_timestamp_ntz`. Here are the strings it might return on a per-adapter basis to produce naive timestamps, aligning with `TIMESTAMP WITHOUT TIME ZONE` from the SQL standard (or `java.time.LocalDateTime` from Java 8):

| Database | Naive timestamp data type |
| --- | --- |
| Starburst/Trino | `TIMESTAMP WITHOUT TIME ZONE` |
| Athena | `TIMESTAMP WITHOUT TIME ZONE` |
| Postgres | `TIMESTAMP` |
| Redshift | `TIMESTAMP` |
| AlloyDB | `TIMESTAMP` |
| Dremio | `TIMESTAMP` |
| Oracle | `TIMESTAMP` |
| Snowflake | `TIMESTAMP_NTZ` |
| Spark SQL | `TIMESTAMP_NTZ` |
| Databricks | `TIMESTAMP_NTZ` |
| BigQuery | `DATETIME` |
| Teradata | N/A |
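To make the idea concrete, here's a rough sketch of how such a macro could be structured using dbt's `adapter.dispatch` pattern. This is entirely hypothetical: `type_timestamp_ntz` does not exist in dbt-core, and the per-adapter return values are just the ones from the table above.

```sql
{#- Hypothetical cross-database macro: a default implementation plus
    per-adapter overrides, following dbt's dispatch convention. -#}
{% macro type_timestamp_ntz() %}
    {{ return(adapter.dispatch('type_timestamp_ntz', 'dbt')()) }}
{% endmacro %}

{%- macro default__type_timestamp_ntz() -%}
    timestamp
{%- endmacro -%}

{%- macro snowflake__type_timestamp_ntz() -%}
    timestamp_ntz
{%- endmacro -%}

{%- macro bigquery__type_timestamp_ntz() -%}
    datetime
{%- endmacro -%}
```

Adapters not listed would fall through to the `default__` implementation, which is also how the existing `type_*` macros handle unknown adapters.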

Caveats

fivetran-reneeli commented 7 months ago

Thanks for the thorough response @dbeatty10 ! It was helpful to have the related issues; I found this one to be especially relevant, particularly your comment here.

It would restore the original literal string values returned by `type_timestamp()` that existed in dbt-utils previously (e.g., `TIMESTAMP_NTZ` instead of `TIMESTAMP` for Snowflake, etc.)

For more context, on our side our models leverage dbt.type_timestamp for the following warehouses: Snowflake, Postgres, Databricks, Redshift, and BigQuery.

So, as you listed, I believe each one defaults to NTZ except for BigQuery, where `timestamp` is with time zone and `datetime` is without.

The dbt team may have discussed this in the past, based on comments I found like this one. I also dug around the old dbt_utils changelog and saw that the timestamp macros for Redshift and Postgres were explicitly cast at one point.

cc @fivetran-joemarkiewicz

dbeatty10 commented 7 months ago

@fivetran-reneeli All those links you included are golden info 🤩

For more context, on our side our models leverage dbt.type_timestamp for the following warehouses: Snowflake, Postgres, Databricks, Redshift, and BigQuery.

Could you share more about how you are leveraging dbt.type_timestamp? e.g. could you share the end goal that you are trying to accomplish by using it? If you have links to source code, that would be handy.

The reason I ask is there are at least a few different uses I can think of:

fivetran-reneeli commented 7 months ago

Sorry for the delay!

Sure thing: our team develops warehouse-agnostic data model packages for dbt (currently the packages work across Postgres, Snowflake, BigQuery, Redshift, and Databricks). Here is an example model with jira.

```sql
cast(created as {{ dbt.type_timestamp() }}) as created_at,
```

Because we want our models to work with all those mentioned warehouses, ideally the compiled code is the same for all.

dbeatty10 commented 6 months ago

No prob and thanks for that example @fivetran-reneeli !

There are some tricky things to consider here. For example, how would this example behave when the `created` expression is an aware timestamp vs. a naive timestamp?
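To illustrate why aware vs. naive matters, here's a small Python sketch (illustrative only, not dbt code) of two semantics a warehouse could plausibly choose when casting an aware timestamp to a naive one: keep the original wall-clock time, or convert to UTC first. The two choices produce different values from the same input.

```python
from datetime import datetime, timezone, timedelta

# An "aware" timestamp carries an explicit UTC offset; a "naive" one does not.
aware = datetime(2024, 3, 1, 12, 0, tzinfo=timezone(timedelta(hours=-5)))

# Semantics 1: strip the offset and keep the wall-clock time.
wall_clock = aware.replace(tzinfo=None)

# Semantics 2: convert to UTC first, then strip the offset.
utc_then_strip = aware.astimezone(timezone.utc).replace(tzinfo=None)

print(wall_clock)      # 2024-03-01 12:00:00
print(utc_then_strip)  # 2024-03-01 17:00:00
```

A standardized `type_timestamp_ntz()` would only control the target data type; which of these two semantics the cast itself applies is still adapter-specific, which is part of the complexity mentioned above.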

We took a look at something similar a year or so ago when considering https://github.com/dbt-labs/dbt-core/issues/5969, and we opted not to tackle it at that time due to the complexity of getting it right and the high probability of accidentally getting it wrong and creating churn for users and maintainers.

See https://github.com/dbt-labs/dbt-core/pull/5935 and https://github.com/dbt-labs/dbt-core/pull/7665 for a small sample of edge cases related to aware vs. naive that we've run into.

So I'm going to upgrade this issue to a Discussion so we can further discuss different design aspirations folks are looking for as well as edge cases people have come across.

I'd propose the main topic of the discussion to be this question: