Tomme / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
Apache License 2.0

Support for ICEBERG type tables #69

Open ganathan opened 2 years ago

ganathan commented 2 years ago

AWS currently offers Iceberg support in Athena as a preview. Would it be possible to extend dbt-athena to support the Iceberg table format?

Iceberg supports ACID transactions and can help with building type 2 and type 3 (slowly changing) dimensions. Since dbt supports snapshots, could dbt-athena use the Iceberg table type to implement them?
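To show why row-level updates matter for type 2 dimensions, here is a minimal in-memory sketch of a type 2 change. When a tracked value changes, the current version is closed (an UPDATE) and a new version is opened (an INSERT). The `valid_from`/`valid_to` fields play roughly the role of dbt's `dbt_valid_from`/`dbt_valid_to` snapshot columns. `DimRow` and `apply_scd2` are hypothetical names, and the treatment of deleted keys is an assumption. This is not how dbt or dbt-athena implements snapshots.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    key: int
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history: List[DimRow], snapshot: Dict[int, str], as_of: date) -> List[DimRow]:
    """Type 2 update: close changed current rows and append a new version per change."""
    current = {r.key: r for r in history if r.valid_to is None}
    out = [r for r in history if r.valid_to is not None]  # closed versions are never touched
    for key, value in snapshot.items():
        cur = current.pop(key, None)
        if cur is None:
            out.append(DimRow(key, value, as_of))  # new key: open its first version
        elif cur.value != value:
            out.append(replace(cur, valid_to=as_of))  # changed: close old version (UPDATE)
            out.append(DimRow(key, value, as_of))  # ...and open a new one (INSERT)
        else:
            out.append(cur)  # unchanged: keep the current version as-is
    # Assumption: keys missing from the snapshot stay current (deletes are not tracked here)
    out.extend(current.values())
    return out


history = [DimRow(1, "a", date(2022, 1, 1)), DimRow(2, "b", date(2022, 1, 1))]
for row in apply_scd2(history, {1: "a", 2: "B", 3: "c"}, date(2022, 6, 1)):
    print(row)
```

On a Hive-style table the "close the old version" step would mean rewriting files. On Iceberg it maps to an ordinary row-level UPDATE.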

Switching from Hive to Iceberg requires a change in the metastore. Since Athena manages the metastore, could dbt-athena leverage this? Could we add the Iceberg table format to dbt-athena?

silvioluiz commented 2 years ago

I think it would be a good idea to simply allow including TBLPROPERTIES in CTAS:

CREATE TABLE table_name (field_1 string, field_n string)
LOCATION 's3://path'
TBLPROPERTIES (
  'table_type'='ICEBERG',
  'format'='parquet',
  'write_target_data_file_size_bytes'='123456789'
)

https://github.com/Tomme/dbt-athena/blob/07151fb0c525d822771f6662a7f4c397feaa1b17/dbt/include/athena/macros/materializations/models/table/create_table_as.sql

If I'm right, we just need to modify this file to get simplified support (and avoid having to clean up the table's S3 folder manually or via a macro).
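A minimal Python sketch of how an adapter could assemble the statement above from a column list and an arbitrary set of table properties. `render_iceberg_ddl` and its parameters are hypothetical and are not part of dbt-athena, whose real macros are Jinja SQL. The only assumption is that `'table_type'='ICEBERG'` is always emitted first. Values are not escaped, so this is for illustration only.

```python
def render_iceberg_ddl(table, columns, location, properties=None):
    """Build an Athena CREATE TABLE statement for an Iceberg table (hypothetical helper)."""
    # 'table_type'='ICEBERG' comes first; caller-supplied properties follow in order.
    # NOTE: names and values are inserted verbatim (no quoting or escaping).
    props = {"table_type": "ICEBERG"}
    props.update(properties or {})
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    tblprops = ", ".join(f"'{k}'='{v}'" for k, v in props.items())
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"LOCATION '{location}' "
        f"TBLPROPERTIES ({tblprops})"
    )


print(render_iceberg_ddl(
    "table_name",
    [("field_1", "string"), ("field_n", "string")],
    "s3://path",
    {"format": "parquet", "write_target_data_file_size_bytes": "123456789"},
))
# CREATE TABLE table_name (field_1 string, field_n string) LOCATION 's3://path' TBLPROPERTIES ('table_type'='ICEBERG', 'format'='parquet', 'write_target_data_file_size_bytes'='123456789')
```

Keeping the properties in a dict lets a model config pass through any extra Iceberg property without the macro having to know about it.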

juliansteger-sc commented 1 year ago

As of now, CTAS queries do not seem to be supported for Apache Iceberg tables in Athena: https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg.html

So I think this is a big blocker for dbt integration.

ggworld commented 1 year ago

Iceberg support in Athena is in PRODUCTION now. It's true that CTAS and views on Iceberg tables are not supported yet, but CRUD operations are, and this opens the door to supporting MERGE, which could be a GREAT advantage.

nicor88 commented 1 year ago

I second @ggworld: even if CTAS is not supported, CRUD operations (INSERT/DELETE/UPDATE) are, so the lack of CTAS is not a big blocker. Also, MERGE was introduced in Trino and Athena might support it at some point, so it is definitely worth evolving the dbt-athena adapter to support a format like Iceberg. This will open up a fully serverless lakehouse experience in AWS 🥳
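The claim that plain DELETE and INSERT are enough without MERGE can be illustrated with a delete+insert upsert. First delete every target row whose key appears in the new batch, then append the whole batch. The sketch below uses Python's built-in `sqlite3` only as a stand-in SQL engine (it is not Athena or Iceberg). `delete_insert_upsert`, `dim` and `batch` are hypothetical names. SQLite wraps both statements in one transaction here, whereas on Athena they would run as two separate queries.

```python
import sqlite3


def delete_insert_upsert(conn, target, staging, key):
    """Emulate MERGE with DELETE + INSERT (table and column names are illustrative)."""
    with conn:  # sqlite3: commit on success, roll back on exception
        # 1. remove target rows that the new batch is about to replace
        conn.execute(f"DELETE FROM {target} WHERE {key} IN (SELECT {key} FROM {staging})")
        # 2. append the full new batch
        conn.execute(f"INSERT INTO {target} SELECT * FROM {staging}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE batch (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO dim VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO batch VALUES (?, ?)", [(2, "B"), (3, "c")])
conn.commit()

delete_insert_upsert(conn, "dim", "batch", "id")
print(conn.execute("SELECT id, name FROM dim ORDER BY id").fetchall())
# [(1, 'a'), (2, 'B'), (3, 'c')]
```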

zsvoboda commented 1 year ago

+1 upvoting this

nicor88 commented 1 year ago

@zsvoboda some work is being done in https://github.com/Tomme/dbt-athena/pull/135. Since this adapter is pretty poorly maintained, you will most probably see this feature in the new community fork: https://github.com/dbt-athena/dbt-athena

I will port the PR soon, and then we will add an incremental materialisation with update/insert (until MERGE is available in Athena).
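An "update/insert" incremental step differs slightly from delete+insert. Rows whose key already exists are updated in place, and only unseen keys are inserted. Below is a minimal sketch, again using the stdlib `sqlite3` module purely as a stand-in engine. `update_insert`, `dim`, `batch` and the `cols` parameter (the non-key columns to overwrite) are hypothetical and do not reflect the actual dbt-athena materialisation.

```python
import sqlite3


def update_insert(conn, target, staging, key, cols):
    """Upsert without MERGE: UPDATE existing keys, then INSERT new ones (illustrative only)."""
    # Correlated subquery per column: take the value from the matching staging row
    set_clause = ", ".join(
        f"{c} = (SELECT s.{c} FROM {staging} s WHERE s.{key} = {target}.{key})" for c in cols
    )
    with conn:  # sqlite3: commit on success, roll back on exception
        # 1. overwrite non-key columns of rows already present in the target
        conn.execute(f"UPDATE {target} SET {set_clause} WHERE {key} IN (SELECT {key} FROM {staging})")
        # 2. append only rows whose key is not yet in the target
        conn.execute(
            f"INSERT INTO {target} SELECT * FROM {staging} "
            f"WHERE {key} NOT IN (SELECT {key} FROM {target})"
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE batch (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO dim VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO batch VALUES (?, ?)", [(2, "B"), (3, "c")])
conn.commit()

update_insert(conn, "dim", "batch", "id", ["name"])
print(conn.execute("SELECT id, name FROM dim ORDER BY id").fetchall())
# [(1, 'a'), (2, 'B'), (3, 'c')]
```

The end state matches delete+insert. On a table format with row-level deletes, the difference is mostly in how many rows each strategy touches.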