databendlabs / databend

Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com

bug: double lock ownership due to concurrent race #16906

Closed zhyass closed 1 hour ago

zhyass commented 13 hours ago

Version

v1.2.662-nightly

What's Wrong?

A test case fails because two queries hold the lock on the same table at the same time. The root cause is that create_lock_revision is not atomic: generating a new revision and inserting that revision into the lock key list are two separate operations. Two queries can therefore each obtain a revision and insert it into the list in an interleaved order, with no synchronization between the two steps, and both end up believing they own the lock.
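The interleaving can be reproduced deterministically with a small in-memory model. This is only a sketch under stated assumptions, not Databend's code: `MockMeta`, `generate_revision`, `insert_revision`, `create_lock_revision_atomic` and `believes_it_owns` are hypothetical names, and the ownership rule (a query owns the lock when its revision is the smallest one in the list) is assumed for illustration. Merging the two steps into one is shown as one possible direction, not necessarily the fix that was applied.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory stand-in for the meta service (not Databend's API).
#[derive(Default)]
struct MockMeta {
    next_revision: u64,
    lock_keys: BTreeSet<u64>,
}

impl MockMeta {
    /// Step 1 of the non-atomic flow: allocate a revision only.
    fn generate_revision(&mut self) -> u64 {
        self.next_revision += 1;
        self.next_revision
    }

    /// Step 2 of the non-atomic flow: publish the revision in the lock key list.
    fn insert_revision(&mut self, revision: u64) {
        self.lock_keys.insert(revision);
    }

    /// Atomic variant: allocate and publish with no gap in between.
    fn create_lock_revision_atomic(&mut self) -> u64 {
        let revision = self.generate_revision();
        self.insert_revision(revision);
        revision
    }

    /// Assumed ownership rule: the smallest listed revision owns the lock.
    fn believes_it_owns(&self, revision: u64) -> bool {
        self.lock_keys.iter().next() == Some(&revision)
    }
}

/// Query A gets the lower revision but publishes it after query B.
fn racy_interleaving() -> (bool, bool) {
    let mut meta = MockMeta::default();
    let rev_a = meta.generate_revision();
    let rev_b = meta.generate_revision();
    meta.insert_revision(rev_b);
    let b_owns = meta.believes_it_owns(rev_b); // B lists before A has inserted
    meta.insert_revision(rev_a);
    let a_owns = meta.believes_it_owns(rev_a); // A is now the smallest
    (a_owns, b_owns)
}

/// Same two queries, but each revision is visible as soon as it exists.
fn atomic_interleaving() -> (bool, bool) {
    let mut meta = MockMeta::default();
    let rev_a = meta.create_lock_revision_atomic();
    let a_owns = meta.believes_it_owns(rev_a);
    let rev_b = meta.create_lock_revision_atomic();
    let b_owns = meta.believes_it_owns(rev_b);
    (a_owns, b_owns)
}

fn main() {
    println!("non-atomic (a_owns, b_owns): {:?}", racy_interleaving());
    println!("atomic     (a_owns, b_owns): {:?}", atomic_interleaving());
}
```

In the non-atomic run both flags are true, which mirrors the logs below: revision 158 is reported slightly earlier than revision 157, so the later revision can be listed alone before the earlier one appears.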

How to Reproduce?

https://github.com/databendlabs/databend/actions/runs/11947873485/job/33305193605

03_0016_update_with_lock:                                               [ FAIL ] - result differs with:
--- /runner/_work/databend/databend/tests/suites/0_stateless/03_dml/03_0016_update_with_lock.result 2024-11-21 06:52:54.029128224 +0000
+++ /runner/_work/databend/databend/tests/suites/0_stateless/03_dml/03_0016_update_with_lock.stdout 2024-11-21 06:54:48.250200892 +0000
@@ -1,3 +1,4 @@
 10
 Test table lock for update
-10
+Error: APIError: ResponseError with 4001: conflict resolve context:ModifiedSegmentExistsInLatest(SnapshotChanges { appended_segments: [], replaced_segments: {0: ("129/135/_sg/a6c06dcd069948bf85262397021ad89c_v4.mpk", 4)}, removed_segment_indexes: [], merged_statistics: Statistics { row_count: 10, block_count: 1, perfect_block_count: 0, uncompressed_byte_size: 84, compressed_byte_size: 590, index_size: 669, col_stats: {0: ColumnStatistics { min: Number(1_i32), max: Number(10_i32), null_count: 0, in_memory_size: 42, distinct_of_values: None }, 1: ColumnStatistics { min: Number(2_i32), max: Number(11_i32), null_count: 0, in_memory_size: 42, distinct_of_values: None }}, cluster_stats: None }, removed_statistics: Statistics { row_count: 10, block_count: 1, perfect_block_count: 0, uncompressed_byte_size: 84, compressed_byte_size: 590, index_size: 672, col_stats: {0: ColumnStatistics { min: Number(1_i32), max: Number(10_i32), null_count: 0, in_memory_size: 42, distinct_of_values: None }, 1: ColumnStatistics { min: Number(2_i32), max: Number(11_i32), null_count: 0, in_memory_size: 42, distinct_of_values: None }}, cluster_stats: None } })
+9
edf2cfb4-970b-4f1a-a27c-00cb83f11b9c 2024-11-21T06:54:47.747564Z DEBUG databend_query::locks::lock_holder: lock_holder.rs:181 create table lock success, revision=157
edf2cfb4-970b-4f1a-a27c-00cb83f11b9c 2024-11-21T06:54:47.747580Z DEBUG databend_common_meta_client::grpc_client: grpc_client.rs:353 ClientHandle(0.0.0.0:9191) send request to meta client worker: request: ClientWorkerRequest { request_id: 1054, req: StreamList(Streamed(ListKVReq { prefix: "__fd_table_lock/test_tenant/135/" })) }
edf2cfb4-970b-4f1a-a27c-00cb83f11b9c 2024-11-21T06:54:47.747845Z DEBUG databend_common_meta_client::grpc_client: grpc_client.rs:1116 MetaGrpcClient(0.0.0.0:9191)::kv_read_v1 request: ListKV(ListKVReq { prefix: "__fd_table_lock/test_tenant/135/" })

b213d938-d09d-4535-8b49-5426f5cc22c9 2024-11-21T06:54:47.746641Z DEBUG databend_query::locks::lock_holder: lock_holder.rs:181 create table lock success, revision=158
b213d938-d09d-4535-8b49-5426f5cc22c9 2024-11-21T06:54:47.746668Z DEBUG databend_common_meta_client::grpc_client: grpc_client.rs:353 ClientHandle(0.0.0.0:9191) send request to meta client worker: request: ClientWorkerRequest { request_id: 1051, req: StreamList(Streamed(ListKVReq { prefix: "__fd_table_lock/test_tenant/135/" })) }
b213d938-d09d-4535-8b49-5426f5cc22c9 2024-11-21T06:54:47.746689Z DEBUG databend_common_meta_client::grpc_client: grpc_client.rs:1116 MetaGrpcClient(0.0.0.0:9191)::kv_read_v1 request: ListKV(ListKVReq { prefix: "__fd_table_lock/test_tenant/135/" })
