Tomme / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
Apache License 2.0
142 stars 79 forks source link

Clean up tables when materialization isn´t incremental #49

Closed datalytics1 closed 2 years ago

datalytics1 commented 2 years ago

Minor change to adapters.sql logic to clean_up table when materialization isn´t incremental.

owenprough-sift commented 2 years ago

I tested these changes on my local project and they worked great. Thanks for making these changes!

nialloriordan commented 2 years ago

As this branch has conflicts and has not been updated recently, I created a new PR that has been rebased: https://github.com/Tomme/dbt-athena/pull/73

Gatsby-Lee commented 2 years ago

@courentin Hi, I have a question about your change. If the materialized is "table" and external_location is set, although the table is recreated, doesn't it make an error?

For example, since the external_location is set(static path), although the table is recreated, creating a new table will fail.

what issue did you try to fix with this change?

jessedobbelaere commented 2 years ago

@Gatsby-Lee that's correct. I added a global static external_location in dbt_project.yml like this:

    +external_location: "{{ target.s3_staging_dir }}/data/"

Then using this fix, it still throws a HIVE_PATH_ALREADY_EXISTS error when I run dbt --debug run --threads 1 because the single threaded flow will deploy one model at a time, where the first model works fine, and the second model fails because the S3 directory already contains files from the first model.

The solution, I think, is to make the external_location unique per model; a folder per model on S3. Then, the cleanup using a static path works fine. Either by setting a unique external_location on each sql model, e.g.:

{{
    config(external_location=target.s3_staging_dir + "/data/my_model_name")
}}

Or waiting for https://github.com/Tomme/dbt-athena/pull/50 to be merged 🤞 Currently, I have an ugly solution where I override the athena__create_table_as macro in my project to append the model name to the external_location path.

Nevertheless, merging this PR would help me out a lot to clean up the model folders on S3, and it unblocks PR #50 too.

Any change to merge this @courentin @Tomme ? 🙏

owenprough-sift commented 2 years ago

Is this PR superseded by #73?

jessedobbelaere commented 2 years ago

Same fix is in #73 indeed, but 73 does not have conflicts 👍

Gatsby-Lee commented 2 years ago

@jessedobbelaere Thank you for comment. I built a macro that can generate unique externa_location per model. BTW, even for a model, there could be a error due to the existing path in S3. Therefore, I add random prefix on the S3.

I might miss sth to understand your change. your change is

  {% if config.get('incremental_strategy') == 'insert_overwrite' %}
  {% if config.get('incremental_strategy') != 'append' %}

doesn't incremental_strategy' != 'append' equal to incremental_strategy' != 'insert_overwrite'?

jessedobbelaere commented 2 years ago

Hi @Gatsby-Lee they're not my changes, but I believe the issue got introduced in https://github.com/Tomme/dbt-athena/pull/43/files

In the past, table data always got cleaned up before dropping, as you see in the diff of #43 . Since the changes in #43 the value of incremental_strategy is by default set to NONE when it runs that if-statement, so the cleanup never happens by default anymore. It should only skip the cleanup in case it's an append strategy, so that's what the bugfix is about.

Anyway, #73 introduces the same bugfix as this PR, by someone else, but it's more up to date with master and ready to merge. Hopefully someone can do that 🙏

Gatsby-Lee commented 2 years ago

@jessedobbelaere got it. so, it gives the same functionality, but it breaks the existing implantation.

Thank you

Tomme commented 2 years ago

Resolved by #73, thank you for your contribution though!

Gatsby-Lee commented 2 years ago

thank you @jessedobbelaere @Tomme