dbt-labs / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
https://dbt-athena.github.io
Apache License 2.0

Compatibility with Athena Engine v3 #15

Closed Jrmyy closed 1 year ago

Jrmyy commented 2 years ago

👋🏻 Hello dbt-athena squad

According to the README, the adapter currently uses version 2 of the Athena engine. On 2022.10.13, Athena released v3 of its engine, reducing the gap between Athena and Trino features.

I don't know what we want to do about this :

  1. Does the adapter only support engine v2?
  2. Since we are responsible for the SQL we generate and we can configure the workgroup, is the engine version really constrained by the adapter?

FYI, here are the breaking changes.

jessedobbelaere commented 2 years ago

I also assume that the adapter itself is not tightly coupled to Athena engine v2 specifics. The work_group param allows you to switch between a v2 or v3 workgroup indeed.
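To check which engine version a given workgroup actually runs, a minimal sketch along these lines may help. It assumes boto3 is installed and AWS credentials are configured. The helper names `effective_engine_version` and `fetch_engine_version` and the sample response are hypothetical, for illustration only, and are not part of dbt-athena.

```python
from typing import Any, Mapping


def effective_engine_version(response: Mapping[str, Any]) -> str:
    """Extract the effective engine version from an Athena GetWorkGroup response.

    Returns "unknown" if the response does not carry the expected keys.
    """
    engine = (
        response.get("WorkGroup", {})
        .get("Configuration", {})
        .get("EngineVersion", {})
    )
    return engine.get("EffectiveEngineVersion", "unknown")


def fetch_engine_version(work_group: str) -> str:
    """Query Athena for the workgroup and return its effective engine version."""
    import boto3  # imported lazily so the parsing helper works without boto3

    client = boto3.client("athena")
    return effective_engine_version(client.get_work_group(WorkGroup=work_group))


# Hand-written sample shaped like a GetWorkGroup response (not real output).
sample = {
    "WorkGroup": {
        "Name": "primary",
        "Configuration": {
            "EngineVersion": {
                "SelectedEngineVersion": "AUTO",
                "EffectiveEngineVersion": "Athena engine version 3",
            }
        },
    }
}
print(effective_engine_version(sample))
```

Calling `fetch_engine_version` with the same workgroup name used for the adapter's work_group param would show which engine the models will run against.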

Personally, I don't have much experience running on Athena engine v3 yet, because I ran into Athena errors when I tried it, such as "HIVE_METASTORE_ERROR: Database cannot be a link for this operation when called on a table." when running a create table on Lake Formation governed tables, or a random java.lang.NullPointerException in Athena. I also saw dbt-athena users reporting errors or performance issues in the #dbt-athena Slack thread. But I'll log AWS support tickets, take it for a spin again in a month, and evaluate 👌

nicor88 commented 2 years ago

I agree with you @jessedobbelaere, the adapter shouldn't have any issue using v3, as the engine version is tied to the workgroup.

Let's refer to this to understand if there is work to do. After some testing I noticed those:

The biggest breaking changes should be at the model level, not in the adapter's internals.

Also, we have this repo https://github.com/dbt-athena/dbt-athena-tester to use as reference to run the same set of models when developing. @Jrmyy and @jessedobbelaere feel free to have a look and add relevant models if necessary to test v2 vs v3

nicor88 commented 2 years ago

@Jrmyy I managed to use v3 with the adapter. From time to time I need to apply explicit casts to timestamps. Here are a few:

I think that to tackle this issue, we could just add a section in the README on how to solve common cases, to make the transition smoother, as a sort of enrichment of the Athena docs.

Jrmyy commented 2 years ago

Yes, I think we can tackle this using README.md, since now we will support both engine versions but with different features (CTAS & merge strategies for v3, temp parquet table for v2 + some data types stuff).

Jrmyy commented 1 year ago

We finally decided to support only the v3 engine for Iceberg tables (i.e. if you use parquet tables, you can still use the v2 engine). (#64) What drove us to this decision is:

The consequences are, for Iceberg, you will need :

Jrmyy commented 1 year ago

Should we close this, since the documentation now makes it clearer what you can and can't do with different athena adapter versions and different table types?

nicor88 commented 1 year ago

Yes please.