Closed datalytics1 closed 2 years ago
I tested these changes on my local project and they worked great. Thanks for making these changes!
As this branch has conflicts and has not been updated recently, I created a new PR that has been rebased: https://github.com/Tomme/dbt-athena/pull/73
@courentin Hi, I have a question about your change. If the materialized is "table" and external_location is set, although the table is recreated, doesn't it make an error?
For example, since the external_location is set(static path), although the table is recreated, creating a new table will fail.
what issue did you try to fix with this change?
@Gatsby-Lee that's correct. I added a global static external_location
in dbt_project.yml
like this:
+external_location: "{{ target.s3_staging_dir }}/data/"
Then using this fix, it still throws a HIVE_PATH_ALREADY_EXISTS
error when I run dbt --debug run --threads 1
because the single threaded flow will deploy one model at a time, where the first model works fine, and the second model fails because the S3 directory already contains files from the first model.
The solution, I think, is to make the external_location
unique per model; a folder per model on S3. Then, the cleanup using a static path works fine. Either by setting a unique external_location
on each sql model, e.g.:
{{
config(external_location=target.s3_staging_dir + "/data/my_model_name")
}}
Or waiting for https://github.com/Tomme/dbt-athena/pull/50 to be merged 🤞 Currently, I have an ugly solution where I override the athena__create_table_as
macro in my project to append the model name to the external_location
path.
Nevertheless, merging this PR would help me out a lot to clean up the model folders on S3, and it unblocks PR #50 too.
Any change to merge this @courentin @Tomme ? 🙏
Is this PR superseded by #73?
Same fix is in #73 indeed, but 73 does not have conflicts 👍
@jessedobbelaere Thank you for comment. I built a macro that can generate unique externa_location per model. BTW, even for a model, there could be a error due to the existing path in S3. Therefore, I add random prefix on the S3.
I might miss sth to understand your change. your change is
{% if config.get('incremental_strategy') == 'insert_overwrite' %}
{% if config.get('incremental_strategy') != 'append' %}
doesn't incremental_strategy' != 'append'
equal to incremental_strategy' != 'insert_overwrite'
?
Hi @Gatsby-Lee they're not my changes, but I believe the issue got introduced in https://github.com/Tomme/dbt-athena/pull/43/files
In the past, table data always got cleaned up before dropping, as you see in the diff of #43 . Since the changes in #43 the value of incremental_strategy
is by default set to NONE
when it runs that if-statement, so the cleanup never happens by default anymore. It should only skip the cleanup in case it's an append strategy, so that's what the bugfix is about.
Anyway, #73 introduces the same bugfix as this PR, by someone else, but it's more up to date with master and ready to merge. Hopefully someone can do that 🙏
@jessedobbelaere got it. so, it gives the same functionality, but it breaks the existing implantation.
Thank you
Resolved by #73, thank you for your contribution though!
thank you @jessedobbelaere @Tomme
Minor change to adapters.sql logic to clean_up table when materialization isn´t incremental.