Is this a new bug in dbt-athena?
Current Behavior
https://github.com/dbt-athena/dbt-athena/blob/34633a7f5679344852d1004991fc814ab385dadb/dbt/include/athena/macros/materializations/models/table/create_table_as.sql#L182
If there are zero batches to process, this results in a query error later on due to TABLE_NOT_FOUND, here:
https://github.com/dbt-athena/dbt-athena/blob/34633a7f5679344852d1004991fc814ab385dadb/dbt/include/athena/macros/materializations/models/helpers/get_partition_batches.sql#L11
because the relation does not exist.
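For context, the batched flow looks roughly like the sketch below (illustrative only, not the adapter's actual macro code; the variable names are mine). The relation is only created by the CTAS inside the first loop iteration, so with zero batches it never comes into existence:

{# illustrative sketch, not the real create_table_as macro #}
{% for batch in partition_batches %}
    {% if loop.first %}
        {# first batch: CREATE TABLE AS SELECT ... creates the relation #}
    {% else %}
        {# later batches: INSERT INTO the relation created above #}
    {% endif %}
{% endfor %}
{# with zero batches the loop body never runs, the relation is never created,
   and the next statement that references it fails with TABLE_NOT_FOUND #}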
Expected Behavior
I would expect the model to complete and just load no data.
Steps To Reproduce
You will need:
- an incremental model, partitioned, with more than 100 partitions, force_batch=True, strategy=append (see the example config below)
- conditions that result in zero data being loaded
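A model config along these lines should satisfy the first condition (the partition columns are placeholders; anything that produces more than 100 partitions will do):

{{ config(
    materialized='incremental',
    incremental_strategy='append',
    force_batch=true,
    partitioned_by=['dt', 'hour']
) }}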
e.g. in our case, our model has a clause like
select ...
from ...
where ...
{% if is_incremental() %}
-- don't load data that's already been loaded
and timestamp > (select max(timestamp) from {{this}})
-- don't load data from hours that have not completed, allowing for 30 minutes of lateness
and timestamp < (date_trunc('hour', now() - interval '30' minute))
{% endif %}
So basically, once the model has loaded the data for the last completed hour, it will never load data again until at least 30 minutes past the next hour.
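To make the timing concrete, here is a hypothetical run (all timestamps made up):

-- run starts at 14:10, and hour 13 has already been fully loaded
--   lower bound: timestamp > (select max(timestamp) from {{this}})            -> ~13:59:59
--   upper bound: timestamp < date_trunc('hour', 14:10 - interval '30' minute) -> 13:00:00
-- the lower bound is above the upper bound, so the select returns zero rows
-- and there are zero partition batches to process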
Pretty edge-casey, and we could probably change to insert_overwrite with some effort, but at the same time I don't think the adapter should error out here, because there are genuine cases where there might be zero data to load. For example, if you are running dbt as part of some other workload and you need to retry the whole thing. Basically, it should be idempotent is what I'm saying haha.
Environment
- OS: Darwin 22.6, but we observe the same under Debian running on Fargate in AWS.
- Python: 3.9.19
- dbt: 1.8.0rc2
- dbt-athena-community: 1.8.0rc1
Additional Context
TBH I can probably implement the fix myself; I just wanted to ask if there's a specific reason we CTAS on the first batch here rather than CREATE TABLE AS SELECT ... WITH NO DATA before the for loop and then just INSERT all batches?
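For what it's worth, the shape I have in mind is roughly the following (illustrative only; relation, compiled_sql, partition_batches and the partition column are placeholders, not the adapter's actual variables, and the WITH properties would need to match whatever the adapter currently generates):

{# create the relation up front with the right schema but no rows #}
create table {{ relation }}
with (format = 'PARQUET', partitioned_by = array['dt'])
as
{{ compiled_sql }}
with no data
;

{# every batch, including the first, then becomes a plain insert #}
{% for batch in partition_batches %}
insert into {{ relation }}
select * from ({{ compiled_sql }})
where {{ batch }}
;
{% endfor %}

With that shape, zero batches would just mean the loop never runs and an empty table is left behind, which matches the expected behaviour above.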