dbt-labs / dbt-redshift

dbt-redshift contains all of the code enabling dbt to work with Amazon Redshift
https://getdbt.com
Apache License 2.0
101 stars 59 forks source link

[Spike] Conflict with concurrent transaction #903

Open jennlw opened 2 months ago

jennlw commented 2 months ago

Is this a new bug in dbt-redshift?

Current Behavior

Job runs sporadically hitting "conflict with concurrent transaction error." The failure is inconsistent as the customer cannot reproduce it on every execution. On one of their dbt Cloud environment (5 threads) it happened twice but on our their local DBT core instances(20+ threads) they were able to reproduce it every third execution. Their local execution has another model victim.

They checked their logs and were not able to locate any other thread running parallel execution that would cause conflict.

Expected Behavior

Job run does not fail on concurrent transactions.

Steps To Reproduce

Not yet, but customer mentioned:

"Neither of the models are using incremental materialization, only thing that might be related to this failure is that these models being used in the materialized view downstream."

I had ran this past our team internally and it was mentioned:

I suspect this could be the culprit. I remember that querying the object dependency tree in Redshift is not easy when using MVs. So I wonder if dbt is dropping the MV in parallel to the refresh of this table and that's attempting to drop this table at the same time.

Relevant log output

2024-09-03 00:40:17.026263 (Thread-3 (worker)): 00:40:17 On model.A_dbt.modelA: /* {"app": "dbt", "dbt_version": "1.7.17", "profile_name": "user", "target_name": "default", "node_id": "model.A_dbt.modelA"} */ drop table if exists "abcd"."dbt_1234"."modelA__dbt_backup" cascade

2024-09-03 00:40:22.922586 (Thread-3 (worker)): 00:40:22 Redshift adapter: Redshift error: could not complete because of conflict with concurrent transaction 2024-09-03 00:40:22.923500 (Thread-3 (worker)): 00:40:22 On model.A_dbt.modelA: ROLLBACK 2024-09-03 00:40:23.147961 (Thread-3 (worker)): 00:40:23 Redshift adapter: Error running SQL: macro drop_relation

Environment

- OS:
- Python:
- dbt-core: 1.7.17 (cloud)
- dbt-redshift: 1.7.7

Additional Context

Please feel free to reach out to me and we can further discuss logs. I did not add them as they are customer logs and not dbt sandbox or test logs from dbt.

jennlw commented 2 months ago

I should also note that I removed customer model names, and added fake ones. Please reach out to me if you need more information.

colin-rogers-dbt commented 2 months ago

@jennlw has this behavior been seen in earlier dbt versions as well? When was this behavior first seen?

jennlw commented 2 months ago

Hey Colin, I've reached out and I'm asking if it was seen prior to version 1.7. I'll let you know what I find.

jennlw commented 2 months ago

Hey Colin, the customer stated that the issue they experienced was only on 1.7 as this piece of code had ran only on that version.

VersusFacit commented 2 weeks ago

Spike

My goal was to whittle this down to the simplest way to reproduce this error as often as possible to gather insight from those results.

First, this is a very tricky issue to reproduce. I had much, much more luck on main than 1.7.7 or 1.7.18.

Results

[debug] [Thread-1 (]: Applying DROP to: "ci"."dbt_mila"."stg_customers__dbt_backup"
[debug] [Thread-1 (]: Using redshift connection "model.jaffle_shop.stg_customers"
[debug] [Thread-1 (]: On model.jaffle_shop.stg_customers: /* {"app": "dbt", "dbt_version": "1.9.0b3", "profile_name": "redshift", "target_name": "dev", "node_id": "model.jaffle_shop.stg_customers"} */
drop table if exists "ci"."dbt_mila"."stg_customers__dbt_backup" cascade
[debug] [Thread-3 (]: Redshift adapter: Redshift error: could not complete because of conflict with concurrent transaction
  1. stg_payments table is where the concurrency transaction failure happens most often if not every time
  2. table materialization has this only happen rarely, if not once out of 10+ runs trying to trigger it
  3. view materialization never failed out on me out of 10 runs
  4. materialized_view materializations have this SUPER common, independent of on_configuration_change -- all on main
    • Apply -- 2/5 (only lower by laws of probability imo, but reporting my formal trials as they happened for transparency)
    • Continue -- 3/5
    • Fail -- 4/5
  5. Out of 10 attempts only once did I get the concurrency error on 1.7.18

Other questions I had

Why is the focal point stg_payments, the last of the non materialized view tables?

Why is its drop causing this error?

Why materialized views as the main issue here?