dbt-labs / dbt-snowflake

dbt-snowflake contains all of the code enabling dbt to work with Snowflake
https://getdbt.com
Apache License 2.0
297 stars 177 forks source link

Safe logic for concurrent executions #1132

Closed rattata2me closed 5 days ago

rattata2me commented 4 months ago

resolves # docs

Problem

when running an incremental table with the on_schema_change policy set to append_new_columns in a dbt project. If two jobs concurrently perform the column check operation, they both generate the same ALTER TABLE statement. Because of this simultaneous execution, a race condition occurs where one job's ALTER TABLE statement succeeds, while the slower executor encounters a SQL compilation error stating that the column already exists. This issue stems from the inability to ensure that the column schema remains unchanged between the column check operation and the execution of the ALTER TABLE statement, leading to potential failures in concurrent environments.

Solution

The solution to this problem is to incorporate the IF NOT EXISTS and IF EXISTS conditions in the ALTER TABLE statements. By using these conditions, the ALTER TABLE statement will only attempt to add a column if it does not already exist, and drop a column only if it exists. However, this introduces some drawbacks. These include increased complexity in SQL logic, potential masking of underlying schema synchronization issues, minor performance impacts, and the risk of partial schema updates. Additionally, this solution is specific to Snowflake or databases that support these conditions, making it less portable to other database systems.

Checklist

rattata2me commented 4 months ago

This is my proposed solution to resolve the concurrency issues stated in https://github.com/dbt-labs/dbt-snowflake/issues/1123 . Right now I can not think of any major drawbacks of fixing the issue by usign the 'IF EXISTS' statements, any feedback on this is appreciated.

colin-rogers-dbt commented 4 months ago

By 'concurrent executions' do you mean performing dbt commands like dbt run against the same target in parallel?

rattata2me commented 4 months ago

Yes precisely, this happens when executing dbt run in two separate processes at the same time against the same target table. More info here https://github.com/dbt-labs/dbt-snowflake/issues/1123

amychen1776 commented 5 days ago

At this time - we do not intend to support concurrent dbt runs and thus will be closing this PR for now.