dbt-labs / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
https://dbt-athena.github.io
Apache License 2.0

Hive vs Iceberg timestamps in unit tests #653

Open valerio-auricchio opened 6 months ago

valerio-auricchio commented 6 months ago

Is this a new bug in dbt-athena?

Current Behavior

I’m spiking dbt-athena but I’ve bumped into another issue. At the moment, I am building all models as Iceberg tables, so all of the timestamp fields are set as `timestamp(6)`. All good so far, and I can `dbt run` to build all of the models without a problem. However, when I run the unit tests, I get timestamp errors: `Incorrect timestamp precision for timestamp(6); the configured precision is MILLISECONDS; column name: "column_name"`.

The issue is that dbt creates the intermediate tables for testing as Hive tables, but in our case we need Iceberg tables.

When we launch the tests, it runs this SQL query with `table_type='hive'`:

```sql
create table "datacatalog"."schema"."table_dbt_tmp"
with (
  table_type='hive',
  is_external=true,
  external_location='s3://.....',
  format='parquet'
)
....
```

Expected Behavior

We expect to have `table_type='iceberg'` instead of `hive`, like this:

```sql
create table "datacatalog"."schema"."table_dbt_tmp"
with (
  table_type='iceberg',
  is_external=true,
  external_location='s3://.....',
  format='parquet'
)
....
```

Steps To Reproduce

Here are the steps to reproduce our issue: using Athena with tables materialized as Iceberg, run a unit test on a model that is materialized as Iceberg and has a column of type `timestamp(6)`.

Environment

- dbt: 1.8.0
- dbt-athena-community: 1.8.1

Additional Context

No response

Jrmyy commented 6 months ago

👋🏻 Hello

Thanks for opening the issue. Following a discussion on Slack this morning (https://getdbt.slack.com/archives/C013MLFR7BQ/p1716193690918929?thread_ts=1715843843.181679&cid=C013MLFR7BQ), I think the issue is more specific than that: it happens only when storing test failures. When `store_failures` is set to `true`, dbt creates a table, but we don't get to configure its table type.

One idea would be to add an additional parameter, `store_failures_table_config`, as a dictionary, and override dbt's initial macro so that we can support this.
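As a sketch, the proposed parameter might be set per project like this. Note that `store_failures_table_config` is only the hypothetical name suggested in this thread; it is not an implemented dbt or dbt-athena option, and the exact keys it would accept are assumptions:

```yaml
# dbt_project.yml -- hypothetical sketch, not a working configuration.
# "store_failures_table_config" is the parameter proposed in this
# thread; the nested keys mirror dbt-athena's existing model configs.
tests:
  my_project:
    +store_failures: true
    +store_failures_table_config:
      table_type: iceberg
      format: parquet
```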

nicor88 commented 6 months ago

@Jrmyy is there a macro called by dbt-core that we can override? In that case we can override that instead.

Jrmyy commented 6 months ago

@nicor88 there is no macro unfortunately; it is defined directly in the test materialization: https://github.com/dbt-labs/dbt-adapters/blob/main/dbt/include/global_project/macros/materializations/tests/test.sql#L5
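Since dbt resolves materializations per adapter, one possible route is an Athena-specific override of the test materialization. The following is a heavily simplified sketch, not the actual dbt-adapters code: `store_failures_table_config` is the hypothetical parameter from this thread, and the elided parts (`...`) stand for the logic in dbt's default `test.sql`:

```sql
{#- Sketch only: an Athena-specific override of dbt's default test
    materialization. "store_failures_table_config" is hypothetical;
    "..." elides the default materialization's logic. -#}
{% materialization test, adapter='athena' %}
  {% if should_store_failures() %}
    {#- Merge user-supplied table properties (e.g. table_type='iceberg')
        into the config used when creating the failures table. -#}
    {% set failures_config = config.get('store_failures_table_config', {}) %}
    ...
  {% else %}
    ...
  {% endif %}
{% endmaterialization %}
```

The trade-off nicor88 raises below still applies: overriding the whole materialization duplicates upstream code that then has to be kept in sync with dbt-adapters.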

nicor88 commented 6 months ago

@Jrmyy what you suggested works for me: we can let the user pass a `store_failures_table_config` in order to persist what they need. I was also wondering if we could override smaller pieces of the macro instead of the whole thing. But if we have no choice, we can override the whole test macro and eventually raise an issue in dbt-adapters/dbt-core.

borjagonzal commented 3 months ago

Hi, how could we help to make progress on this issue? It would be great to be able to use unit tests directly for our models producing Iceberg tables.

nicor88 commented 2 months ago

@colin-rogers-dbt do you have any hints regarding this issue? What would be the best way to configure the properties of the table used when storing failures? What @Jrmyy proposed here should work, but I would like to understand whether there is an easier way.