dbt-athena / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
https://dbt-athena.github.io
Apache License 2.0
200 stars 89 forks source link

Hive vs Iceberg timestamps in unit tests #653

Open valerio-auricchio opened 1 month ago

valerio-auricchio commented 1 month ago

Is this a new bug in dbt-athena?

Current Behavior

I’m spiking dbt-athena but I’ve bumped into another issue. At the moment, I am building all models as Iceberg tables. As a result, all of the timestamp fields are set as timestamp(6). All good so far, and I can dbt run to build all of the models without a problem. However, when I run the unit tests, I am getting timestamp errors: Incorrect timestamp precision for timestamp(6); the configured precision is MILLISECONDS; column name:"column_name.

The issue is that when it creates the intermediate tables for testing, it uses "hive" tables. However, in our case, we need "iceberg" tables.

When we launch the tests, it use this SQL query, and use the table_type='hive':

create table "datacatalog"."schema"."table_dbt_tmp"

    with (

      table_type='**hive**',

      is_external=true,external_location='s3://.....',

      format='parquet'

    )
....

Expected Behavior

We expect to have table_type='iceberg' instead of hive in this way:

create table "datacatalog"."schema"."table_dbt_tmp"

    with (

      table_type='**iceberg**',

      is_external=true,external_location='s3://.....',

      format='parquet'

    )
....

Steps To Reproduce

Here are the steps to reproduce our issue. We are using Athena and materializing the tables in Iceberg, try to run an unit test in a model that is materialiezed iceberg and has column_type "timestamp(6)".

Environment

- dbt: 1.8.0
- dbt-athena-community: 1.8.1

Additional Context

No response

Jrmyy commented 1 month ago

👋🏻 Hello

Thanks for opening the issue. Following a discussion on Slack this morning(https://getdbt.slack.com/archives/C013MLFR7BQ/p1716193690918929?thread_ts=1715843843.181679&cid=C013MLFR7BQ), I think the issue is more precise than that. It happens only when storing failure tests. Because when this option is set up (store_failures set to true), it creates a table but we don't get to configure the table type.

One idea would be to add an additional parameter store_failures_table_config as a dictionary, override the dbt initial macro so that we can support this.

nicor88 commented 1 month ago

@Jrmyy is there a macro called by dbt-core to overwrite? in that case we can overwrite that instead.

Jrmyy commented 1 month ago

@nicor88 there is no macro unfortunately, it is directly defined in test materialization: https://github.com/dbt-labs/dbt-adapters/blob/main/dbt/include/global_project/macros/materializations/tests/test.sql#L5

nicor88 commented 1 month ago

@Jrmyy what you suggested works for me, we can let the user pass a store_failures_table_config in order to persist what the need. Also I was wondering if we could have overwrite small macros pieces, instead of the all thing. But if we have no choice we can overwrite the all test macro, and eventually "raise" an issue in dbt-adapters/dbt-core