Tomme / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
Apache License 2.0

[BUGFIX] Resolve partition deleting bug in insert_overwrite mode #139

Closed: roslovets closed this 1 year ago

roslovets commented 1 year ago

Bug

Currently, the boto3 calls can silently skip deleting some partitions during an incremental build in insert_overwrite mode.

Why it's important

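The adapter can skip deleting some partitions before inserting rows from the temporary table. For partitions that were not deleted, the new data is appended instead of overwriting the old data, so those partitions end up containing both the old and the new rows.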

This bug prevents me from using this adapter in my pipelines.

What's wrong with the code

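The logic that retrieves partition details ignores pagination. The Glue get_partitions() call returns results one page at a time, so the code must take the NextToken parameter into account (a continuation token that is returned when more partitions remain) and keep calling until no token comes back.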
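A minimal sketch of the pagination described above, assuming `glue_client` is a boto3 Glue client (`boto3.client("glue")`). The function name `get_all_partitions` is hypothetical and is not the adapter's actual code. Each response may carry a NextToken, and the loop passes it back until Glue stops returning one:

```python
from typing import Any, Dict, List, Optional


def get_all_partitions(glue_client: Any, database: str, table: str,
                       expression: Optional[str] = None) -> List[Dict[str, Any]]:
    """Collect every partition of a Glue table by following NextToken (hypothetical helper)."""
    partitions: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {"DatabaseName": database, "TableName": table}
    if expression:
        # Optional Glue filter expression, e.g. "dt='2023-01-01'"
        kwargs["Expression"] = expression
    while True:
        response = glue_client.get_partitions(**kwargs)
        partitions.extend(response.get("Partitions", []))
        token = response.get("NextToken")
        if not token:
            break  # no continuation token: this was the last page
        kwargs["NextToken"] = token  # request the next page
    return partitions
```

Calling get_partitions only once returns just the first page, which is how partitions beyond that page could be missed and never deleted.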

How I fixed this bug

I added a loop that follows NextToken to make sure boto3 finds and deletes every target partition. I got the inspiration from the awswrangler library, specifically its usage of get_partitions.
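The deletion side could look like the following sketch, assuming the partition values have already been gathered from every page and `glue_client` is a boto3 Glue client. The helper `delete_partitions` is hypothetical, and the chunking into groups of 25 reflects Glue's per-request limit for BatchDeletePartition rather than anything stated in this issue:

```python
from typing import Any, Dict, List

# Glue's BatchDeletePartition accepts at most 25 partitions per request.
BATCH_SIZE = 25


def delete_partitions(glue_client: Any, database: str, table: str,
                      partition_values: List[List[str]]) -> List[Dict[str, Any]]:
    """Delete partitions in chunks of BATCH_SIZE and return any per-partition errors."""
    errors: List[Dict[str, Any]] = []
    for start in range(0, len(partition_values), BATCH_SIZE):
        chunk = partition_values[start:start + BATCH_SIZE]
        response = glue_client.batch_delete_partition(
            DatabaseName=database,
            TableName=table,
            PartitionsToDelete=[{"Values": values} for values in chunk],
        )
        # Failures for individual partitions are reported in the response, not raised.
        errors.extend(response.get("Errors", []))
    return errors
```

Returning the collected errors lets the caller decide whether a partial deletion should abort the insert_overwrite run.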

roslovets commented 1 year ago

I tested this patch and it works.

aut0clave commented 1 year ago

Duplicate of #82

nicor88 commented 1 year ago

@roslovets this issue was fixed in the community fork (https://github.com/dbt-athena/dbt-athena/pull/2) and released yesterday; it is available on PyPI as dbt-athena-community==1.0.3.

Note that that solution is a bit leaner: it uses Glue pagination with the build_full_result() method to avoid unnecessary explicit looping (the paginator still loops under the hood).
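For comparison, a sketch of the paginator-based approach, assuming a boto3 Glue client. `get_paginator("get_partitions")`, `paginate()` and `build_full_result()` are botocore APIs, while the wrapper name `get_all_partitions_paginated` is hypothetical and is not the code merged in the community fork:

```python
from typing import Any, Dict, List, Optional


def get_all_partitions_paginated(glue_client: Any, database: str, table: str,
                                 expression: Optional[str] = None) -> List[Dict[str, Any]]:
    """Fetch all partitions via a botocore paginator (hypothetical wrapper)."""
    paginator = glue_client.get_paginator("get_partitions")
    kwargs: Dict[str, Any] = {"DatabaseName": database, "TableName": table}
    if expression:
        kwargs["Expression"] = expression
    # build_full_result() walks every page and merges the "Partitions" lists
    result = paginator.paginate(**kwargs).build_full_result()
    return result.get("Partitions", [])
```

The paginator still issues one request per page; build_full_result() only hides the NextToken bookkeeping and merges the results.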

roslovets commented 1 year ago

@nicor88 That's what I need, thanks a lot!