dbt-labs / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
https://dbt-athena.github.io
Apache License 2.0
228 stars 100 forks source link

[Bug] TABLE_NOT_FOUND {{tmp_relation}} when there are zero batches to process in incremental model #656

Closed antonysouthworth-halter closed 6 months ago

antonysouthworth-halter commented 6 months ago

Is this a new bug in dbt-athena?

Current Behavior

https://github.com/dbt-athena/dbt-athena/blob/34633a7f5679344852d1004991fc814ab385dadb/dbt/include/athena/macros/materializations/models/table/create_table_as.sql#L182

If there are zero batches to process, this results in query error due to TABLE_NOT_FOUND later on, here:

https://github.com/dbt-athena/dbt-athena/blob/34633a7f5679344852d1004991fc814ab385dadb/dbt/include/athena/macros/materializations/models/helpers/get_partition_batches.sql#L11

because the relation does not exist.

Expected Behavior

I would expect the model to complete and just load no data.

Steps To Reproduce

You will need:

e.g. in our case, our model has a clause like

select ...
from ...

where ...
{% if is_incremental() %}
-- don't load data that's already been loaded
and timestamp > (select max(timestamp) from {{this}})

-- don't load data from hours that have not completed, allowing for 30 minutes of lateness
and timestamp < (date_trunc('hour', now() - interval '30' minute))
{% endif %}

So basically, once the model loaded the data for the last completed hour, it will never load data again until at least 30 minutes passed the next hour.

Pretty edge-casey, and we probably could change to insert_overwrite with some effort, but at the same time I don't think the adapter should error out here because there are genuine cases where there might be zero data to load. For example if you are running dbt as part of some other workload and you need to retry the whole thing. Basically it should be idempotent is what I'm saying haha.

Environment

- OS: Darwin 22.6 but we observe the same under Debian running on Fargate in AWS.
- Python: 3.9.19
- dbt: 1.8.0rc2
- dbt-athena-community: 1.8.0rc1

Additional Context

TBH I can probably implement the fix myself, just wanted to ask if there's a specific reason we CTAS on the first batch here rather than CREATE TABLE AS SELECT ... WITH NO DATA before the fore loop and then just INSERT all batches?

antonysouthworth-halter commented 6 months ago

Closing in favour of #519