Closed — mafreihaut closed this issue 2 months ago
It's hard to reproduce the issue if you don't post the SQL code used in your transformation.
insert_overwrite works as intended, and it's covered by functional tests that verify the behavior is correct. I believe the issue you are facing is due to how you refresh the data incrementally in your model. Please post your full model SQL so we can help you fix the issue properly.
Also, as a possible hint, it looks to me like you might benefit from an Iceberg table, rather than a classic Hive table, for what you want to achieve.
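For reference, a minimal sketch of what an Iceberg-backed incremental model config can look like in dbt-athena (the `unique_key` column here is just an example, not from your model):

```sql
{{ config(
    materialized='incremental',
    table_type='iceberg',
    incremental_strategy='merge',
    unique_key='event_id'  -- hypothetical key column for illustration
) }}
```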
I recommend having a look here: https://docs.getdbt.com/docs/build/incremental-models
In your model you need to add a conditional like this, for example:
```sql
{% if is_incremental() %}
where event_date >= current_date - interval '1' day
{% endif %}
```
Doing so, on your first run all the data is processed, but on the second run only the partitions produced by your where condition are overwritten by the model. This way you can achieve what you state as "only affect the intended day and hour partitions in each run".
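Putting it together, a minimal sketch of such a model (the source, table, and column names are made up for illustration):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partitioned_by=['event_date']
) }}

select
    event_id,
    event_date,
    payload
from {{ source('my_source', 'events') }}  -- hypothetical source

{% if is_incremental() %}
-- on incremental runs, only yesterday's and today's partitions are
-- selected, so insert_overwrite replaces only those partitions
where event_date >= current_date - interval '1' day
{% endif %}
```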
Based on the discussion that happened in Slack, it looks like the issue stems from an attempt to modify a table created outside dbt. It seems that, due to a schema mismatch (most probably caused by the generate_schema_name macro), dbt doesn't find the existing table, and therefore it deletes the data in the specified location and then re-creates the table.
I believe this is not a bug per se, but mostly an issue with the user's configuration.
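To illustrate the mismatch: dbt's default generate_schema_name concatenates the target schema with any custom schema, which is easy to overlook. A sketch of the default implementation:

```sql
-- macros/generate_schema_name.sql (sketch of dbt's default behavior)
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {# A model configured with schema='staging' resolves to e.g. #}
        {# "analytics_staging", not "staging", so a table created    #}
        {# manually in "staging" is never found by dbt.              #}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}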
Is this a new bug in dbt-athena?
Current Behavior
Text from Slack db-athena:
Expected Behavior
Steps To Reproduce
Model Config:
[Thread-1 (]: dbt.adapters.athena.constants adapter: Deleting table data: path='s3://bucket/key/existing-data', bucket='bucket', prefix='key/existing-data/'
Environment
Additional Context
Anders pointed me toward creating a bug ticket for this on Slack. I think I covered it pretty well, but please contact me if any additional information is needed. Thanks!