dbt-labs / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
https://dbt-athena.github.io
Apache License 2.0
228 stars 100 forks source link

[Bug] __tmp_not_partitioned CTAS inherited model config "external_location" and causing clashing with the model data table #683

Open u-ra-ra-ra opened 4 months ago

u-ra-ra-ra commented 4 months ago

Is this a new bug in dbt-athena?

Current Behavior

I have a model that writes out to more than 100 hive partitions. I recently dropped the table and attempted to recreate via dbt build. I have an external_location in configuration as I was migrated onto dbt and want to avoid rebuild data / move s3 location.

The unexpected behavior is dbt build succeeded but my s3 path does not contain any data.

dbt / adaptor version:

"app": "dbt", "dbt_version": "1.7.17"
Registered adapter: athena=1.7.2

The CTAS generated:

On model.A.XX: -- /* {"app": "dbt", "dbt_version": "1.7.17", "profile_name": "A", "target_name": "prod", "node_id": "model.A.XX"} */

    create table "awsdatacatalog"."{schema}"."XX__tmp_not_partitioned"
    with (
      table_type='hive',
      is_external=true,external_location= {config.external_location}
      format='orc'
    )
    as
      SELECT

Expected Behavior

This XXtmp_not_partitioned should not use the configured external path which is clashing with the actual model. Instead it should either use a unique tmp location or same as dbt_tmp table which goes to s3_data_dir/schema/table__dbt_tmp

Steps To Reproduce

create a partitioned model with more than 100 hive partitions, with an external_location in config.

config:
    enabled: true
    external_location:some_p3_path

If this table exists in glue, drop it.

Run dbt run to create more than 100 partitions in one sql run.

Environment

- OS: ubuntu 22.04.4 jammy
- Python: 3.10
- dbt: 1.7.17
- dbt-athena-community: 1.7.2

Additional Context

I am happy to contribute to get this fixed if that helps speed things up.

https://github.com/dbt-athena/dbt-athena/blob/main/dbt/include/athena/macros/materializations/models/table/create_table_as.sql

nicor88 commented 4 months ago

@u-ra-ra-ra feel free to propose a pr and we will be happy to review it.

u-ra-ra-ra commented 4 months ago

I will hopefully have some time next week to give it shot.

lucastrubiano commented 2 months ago

At my work, I experienced the same issue, and I found that one solution is to set temporary to true on the following line in the file:

dbt-athena/dbt/include/athena/macros/materializations/models/table/create_table_as.sql

Line 165: change temporary to true: {%- do run_query(create_table_as(true, tmp_relation, compiled_code, language, true)) -%}

Screenshot 2024-08-26 at 9 05 10 PM

I hope this helps!

nicor88 commented 2 months ago

@lucastrubiano thanks for spotting this issue. Do you mind to raise a PR with your fix? Thanks

lucastrubiano commented 2 months ago

@nicor88 Glad to push the changes, but I don’t have permissions to create a branch or PR. Any guidance on how to proceed?

nicor88 commented 2 months ago

@lucastrubiano I recommend to have a look at the contributing section. Long story short, if you are not a project maintainer you cannot push directly to dbt-athena/dbt-athena. You have to fork the repository, and then propose a PR from your fork, pretty much here you can find a detailed guide on how to do it.