Tomme / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
Apache License 2.0

Use unique temporary table name + Check schema change #45

Open tuan-seek opened 2 years ago

tuan-seek commented 2 years ago

Changes in this PR:

Antauri commented 2 years ago

I've also opened a separate issue, https://github.com/Tomme/dbt-athena/issues/62, and this PR seems to do exactly what that issue asks for.

Who can review and merge it, please?

Antauri commented 2 years ago

We'd require this for a performance boost on our queries. Can it be merged?

Antauri commented 2 years ago

I've tested this on my own fork with 12 parallel executions (12 batches in parallel for the same hour, each covering a distinct set of minutes from that hour of data), and I can confirm it works. If you run dbt in parallel on the same model using different "vars" (such as the batch number), then at the initial table creation you'll get 12 CTAS queries instead of 1 CTAS + 11 ITAS (insert-into-as-select) queries, but that can be worked around.

It would be lovely to get this merged into the main trunk. This feature enables parallel queries on Athena and gets us down from 20m/hour to 4m/hour by running distinct sets of batches on the same partition (hourly in our case).
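To illustrate why parallel runs of the same model need distinct temporary table names, here is a minimal Python sketch. It is not the code from this PR: the `__dbt_tmp` suffix, the model name `events_hourly`, and both helper functions are assumptions made up for the example. It only shows that a name derived solely from the model collides across batches, while a name with a random suffix does not.

```python
import uuid


def fixed_tmp_name(model: str) -> str:
    # Hypothetical helper: the temp name depends only on the model,
    # so every parallel run of that model targets the same table.
    return f"{model}__dbt_tmp"


def unique_tmp_name(model: str) -> str:
    # Hypothetical helper: append a random hex suffix so each run
    # gets its own temp table. This naming scheme is an assumption.
    return f"{model}__dbt_tmp_{uuid.uuid4().hex}"


MODEL = "events_hourly"  # made-up model name
BATCHES = 12             # number of parallel batches, as in the test above

# Collect the names that 12 parallel runs would try to create.
fixed_names = {fixed_tmp_name(MODEL) for _ in range(BATCHES)}
unique_names = {unique_tmp_name(MODEL) for _ in range(BATCHES)}

print(f"fixed scheme:  {len(fixed_names)} distinct temp table name(s)")
print(f"unique scheme: {len(unique_names)} distinct temp table name(s)")
```

With the fixed scheme all 12 batches would compete for one temp table, which is the conflict the unique naming is meant to avoid.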

nicor88 commented 1 year ago

@tuan-seek and @Antauri, I'm quite interested in this feature. In case you are not aware, the community decided to fork Tomme/dbt-athena to have a more community-friendly setup for changes. The new fork is here: https://github.com/dbt-athena/dbt-athena, and it is available via pip too.

That said, could you tell me how it is possible in your setup to have tmp tables with the same name?