Closed cockroach-teamcity closed 10 months ago
/home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/8193/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/ccl/multiregionccl/multiregionccl_test_/multiregionccl_test.runfiles/com_github_cockroachdb_cockroach/pkg/ccl/multiregionccl/testdata/regional_by_table:30:
SELECT * FROM db.rbt WHERE k = 1
expected:
served locally: false
found:
not a match, trace:
0.000ms 0.000ms === operation:traced statement _unfinished:1 _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root
0.000ms 0.000ms [flow: {count: 1, duration 652µs}]
0.000ms 0.000ms [batch flow coordinator: {count: 1, duration 559µs}]
0.000ms 0.000ms [colbatchscan: {count: 1, duration 557µs, unfinished}]
0.000ms 0.000ms [txn coordinator send: {count: 1, duration 430µs}]
0.000ms 0.000ms [dist sender send: {count: 1, duration 397µs}]
0.000ms 0.000ms [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
0.000ms 0.000ms [optimizer: {count: 1, duration 219µs}]
0.000ms 0.000ms [commit sql txn: {count: 1, duration 8µs}]
0.006ms 0.006ms === operation:optimizer _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root
0.063ms 0.057ms event:sql/catalog/lease/descriptor_version_state.go:130 [n2,client=127.0.0.1:48778,hostssl,user=root] descriptorVersionState.incRefCount: 100(?"defaultdb"?) ver=1:1702072290.712398770,0, refcount=1
0.079ms 0.016ms event:sql/catalog/descs/descriptor.go:137 [n2,client=127.0.0.1:48778,hostssl,user=root] looking up descriptors for ids [100]
0.116ms 0.037ms event:sql/catalog/lease/descriptor_version_state.go:130 [n2,client=127.0.0.1:48778,hostssl,user=root] descriptorVersionState.incRefCount: 104(?"db"?) ver=1:1702072295.108623680,0, refcount=1
0.125ms 0.009ms event:sql/catalog/descs/descriptor.go:137 [n2,client=127.0.0.1:48778,hostssl,user=root] looking up descriptors for ids [104]
0.133ms 0.008ms event:sql/catalog/descs/descriptor.go:137 [n2,client=127.0.0.1:48778,hostssl,user=root] looking up descriptors for ids [105]
0.145ms 0.012ms event:sql/catalog/lease/descriptor_version_state.go:130 [n2,client=127.0.0.1:48778,hostssl,user=root] descriptorVersionState.incRefCount: 105(?"public"?) ver=1:1702072278.308230626,0, refcount=1
0.159ms 0.014ms event:sql/catalog/lease/descriptor_version_state.go:130 [n2,client=127.0.0.1:48778,hostssl,user=root] descriptorVersionState.incRefCount: 108(?"rbt"?) ver=1:1702072278.076386348,0, refcount=1
0.168ms 0.009ms event:sql/catalog/descs/descriptor.go:137 [n2,client=127.0.0.1:48778,hostssl,user=root] looking up descriptors for ids [108]
0.191ms 0.023ms event:sql/plan_opt.go:392 [n2,client=127.0.0.1:48778,hostssl,user=root] query cache hit
0.244ms 0.244ms event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] planning ends
0.255ms 0.012ms event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] checking distributability
0.266ms 0.010ms event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] will distribute plan: false
0.275ms 0.009ms event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] executing after 0 retries, last retry reason: <nil>
0.284ms 0.009ms event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] execution starts: distributed engine
0.299ms 0.015ms event:sql/distsql_running.go:1885 [n2,client=127.0.0.1:48778,hostssl,user=root] creating DistSQL plan with isLocal=true
0.315ms 0.015ms event:sql/distsql_running.go:813 [n2,client=127.0.0.1:48778,hostssl,user=root] running DistSQL plan
0.319ms 0.005ms === operation:flow _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root
0.319ms 0.000ms [batch flow coordinator: {count: 1, duration 559µs}]
0.319ms 0.000ms [colbatchscan: {count: 1, duration 557µs, unfinished}]
0.319ms 0.000ms [txn coordinator send: {count: 1, duration 430µs}]
0.319ms 0.000ms [dist sender send: {count: 1, duration 397µs}]
0.319ms 0.000ms [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
0.333ms 0.014ms event:sql/colflow/vectorized_flow.go:231 [n2,client=127.0.0.1:48778,hostssl,user=root] setting up vectorized flow
0.364ms 0.031ms event:sql/colflow/vectorized_flow.go:280 [n2,client=127.0.0.1:48778,hostssl,user=root] vectorized flow setup succeeded
0.375ms 0.011ms event:sql/flowinfra/flow.go:466 [n2,client=127.0.0.1:48778,hostssl,user=root] starting (0 processors, 0 startables) asynchronously
0.383ms 0.007ms event:sql/colflow/vectorized_flow.go:327 [n2,client=127.0.0.1:48778,hostssl,user=root] running the batch flow coordinator in the flow's goroutine
0.384ms 0.001ms === operation:batch flow coordinator _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root cockroach.flowid:ce2a4729-0397-42a6-9e6b-fa2857eb0e11 cockroach.processorid:0
0.384ms 0.000ms [colbatchscan: {count: 1, duration 557µs, unfinished}]
0.384ms 0.000ms [txn coordinator send: {count: 1, duration 430µs}]
0.384ms 0.000ms [dist sender send: {count: 1, duration 397µs}]
0.384ms 0.000ms [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
0.388ms 0.004ms === operation:colbatchscan _unfinished:1 _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root cockroach.flowid:ce2a4729-0397-42a6-9e6b-fa2857eb0e11 cockroach.processorid:0
0.388ms 0.000ms [txn coordinator send: {count: 1, duration 430µs}]
0.388ms 0.000ms [dist sender send: {count: 1, duration 397µs}]
0.388ms 0.000ms [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
0.418ms 0.029ms event:sql/row/kv_batch_fetcher.go:560 [n2,client=127.0.0.1:48778,hostssl,user=root] Scan /Table/108/1/1/0
0.420ms 0.003ms === operation:txn coordinator send _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root txnID:09075f57-29a7-446c-a021-0470e12744af
0.420ms 0.000ms [dist sender send: {count: 1, duration 397µs}]
0.420ms 0.000ms [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
0.427ms 0.007ms === operation:dist sender send _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root txn:09075f57
0.427ms 0.000ms [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
0.446ms 0.019ms event:kv/kvclient/kvcoord/range_iter.go:188 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] querying next range at /Table/108/1/1/0
0.486ms 0.040ms event:kv/kvclient/kvcoord/range_iter.go:225 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] key: /Table/108/1/1/0, desc: r64:/{Table/108-Max} [(n2,s2):1, (n1,s1):9, (n3,s3):8, (n8,s8):6NON_VOTER, (n7,s7):10NON_VOTER, next=11, gen=31]
0.514ms 0.028ms event:kv/kvclient/kvcoord/dist_sender.go:2180 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] r64: sending batch 1 Get to (n2,s2):1
0.522ms 0.008ms event:rpc/nodedialer/nodedialer.go:157 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] sending request to local client
0.529ms 0.007ms === operation:/cockroach.roachpb.Internal/Batch _verbose:1 node:2 span.kind:server request:Get [/Table/108/1/1/0,/Min), [txn: 09075f57], [can-forward-ts]
0.537ms 0.008ms event:server/node.go:1187 [n2] node received request: 1 Get
0.562ms 0.025ms event:kv/kvserver/store_send.go:167 [n2,s2] executing Get [/Table/108/1/1/0,/Min), [txn: 09075f57], [can-forward-ts]
0.576ms 0.014ms event:kv/kvserver/replica_send.go:179 [n2,s2,r64/1:/{Table/108-Max}] read-only path
0.586ms 0.010ms event:kv/kvserver/concurrency/concurrency_manager.go:208 [n2,s2,r64/1:/{Table/108-Max}] sequencing request
0.591ms 0.005ms event:kv/kvserver/concurrency/concurrency_manager.go:289 [n2,s2,r64/1:/{Table/108-Max}] acquiring latches
0.600ms 0.010ms event:kv/kvserver/concurrency/concurrency_manager.go:333 [n2,s2,r64/1:/{Table/108-Max}] scanning lock table for conflicting locks
0.629ms 0.028ms event:kv/kvserver/replica_read.go:296 [n2,s2,r64/1:/{Table/108-Max}] can drop latches early for batch (Get [/Table/108/1/1/0,/Min), [txn: 09075f57], [can-forward-ts]); scanning lock table first to detect conflicts
0.658ms 0.030ms event:kv/kvserver/replica_read.go:113 [n2,s2,r64/1:/{Table/108-Max}] lock table scan complete without conflicts; dropping latches early
0.670ms 0.012ms event:kv/kvserver/replica_read.go:408 [n2,s2,r64/1:/{Table/108-Max}] executing read-only batch
0.749ms 0.079ms event:kv/kvserver/replica_evaluate.go:550 [n2,s2,r64/1:/{Table/108-Max}] evaluated Get command header:<key:"\364\211\211\210" > , txn="sql txn" meta={id=09075f57 key=/Min pri=0.03601885 epo=0 ts=1702071981.435461927,0 min=1702071981.435461927,0 seq=0} lock=false stat=PENDING rts=1702071981.435461927,0 wto=false gul=1702071981.935461927,0 : resp=header:<num_keys:1 num_bytes:7 > value:<raw_bytes:"\375\311[\010\n#\002" timestamp:<wall_time:1702071979913075711 > > , err=<nil>
0.781ms 0.032ms event:scan stats: stepped 0 times (0 internal); seeked 1 times (1 internal); block-bytes: (total 0 B, cached 0 B); points: (count 1, key-bytes 14 B, value-bytes 14 B, tombstoned: 0) ranges: (count 0), (contained-points 0, skipped-points 0) evaluated requests: 1 gets, 0 scans, 0 reverse scans
0.789ms 0.008ms event:kv/kvserver/replica_read.go:221 [n2,s2,r64/1:/{Table/108-Max}] read completed
0.755ms -0.035ms structured:{"@type":"type.googleapis.com/cockroach.roachpb.ScanStats","numInterfaceSeeks":"1","numInternalSeeks":"1","numInterfaceSteps":"0","numInternalSteps":"0","blockBytes":"0","blockBytesInCache":"0","keyBytes":"14","valueBytes":"14","pointCount":"1","pointsCoveredByRangeTombstones":"0","rangeKeyCount":"0","rangeKeyContainedPoints":"0","rangeKeySkippedPoints":"0","separatedPointCount":"0","separatedPointValueBytes":"0","separatedPointValueBytesFetched":"0","numGets":"1","numScans":"0","numReverseScans":"0"}
0.848ms 0.428ms event:kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:576 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] recording span to refresh: /Table/108/1/1/0
0.928ms 0.544ms event:ComponentStats{ID: {ce2a4729-0397-42a6-9e6b-fa2857eb0e11 PROCESSOR 0 2 }, KV time: 486µs, KV contention time: 0µs, KV rows read: 1, KV bytes read: 11 B, KV gRPC calls: 1, MVCC step count (ext/int): 0/0, MVCC seek count (ext/int): 1/1, max memory allocated: 20 KiB, sql cpu time: 22µs, batches output: 1, rows output: 1}
0.938ms 0.010ms event:ComponentStats{ID: {00000000-0000-0000-0000-000000000000 UNSET 0 ? }}
0.892ms -0.046ms structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"ce2a4729-0397-42a6-9e6b-fa2857eb0e11","type":"PROCESSOR","id":0,"sqlInstanceId":2,"region":""},"netRx":{"latency":{"valuePlusOne":"0s"},"waitTime":{"valuePlusOne":"0s"},"deserializationTime":{"valuePlusOne":"0s"},"tuplesReceived":{"valuePlusOne":"0"},"bytesReceived":{"valuePlusOne":"0"},"messagesReceived":{"valuePlusOne":"0"}},"netTx":{"tuplesSent":{"valuePlusOne":"0"},"bytesSent":{"valuePlusOne":"0"},"messagesSent":{"valuePlusOne":"0"}},"kv":{"bytesRead":{"valuePlusOne":"12"},"tuplesRead":{"valuePlusOne":"2"},"batchRequestsIssued":{"valuePlusOne":"2"},"kvTime":{"valuePlusOne":"0.000486466s"},"contentionTime":{"valuePlusOne":"0.000000001s"},"numInterfaceSteps":{"valuePlusOne":"1"},"numInternalSteps":{"valuePlusOne":"1"},"numInterfaceSeeks":{"valuePlusOne":"2"},"numInternalSeeks":{"valuePlusOne":"2"},"blockBytes":{"valuePlusOne":"1"},"blockBytesInCache":{"valuePlusOne":"1"},"keyBytes":{"valuePlusOne":"15"},"valueBytes":{"valuePlusOne":"15"},"pointCount":{"valuePlusOne":"2"},"pointsCoveredByRangeTombstones":{"valuePlusOne":"1"},"rangeKeyCount":{"valuePlusOne":"1"},"rangeKeyContainedPoints":{"valuePlusOne":"1"},"rangeKeySkippedPoints":{"valuePlusOne":"1"},"kvCpuTime":{"valuePlusOne":"0s"},"numGets":{"valuePlusOne":"2"},"numScans":{"valuePlusOne":"1"},"numReverseScans":{"valuePlusOne":"1"},"usedStreamer":false},"exec":{"execTime":{"valuePlusOne":"0s"},"maxAllocatedMem":{"valuePlusOne":"20481"},"maxAllocatedDisk":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"1"},"cpuTime":{"valuePlusOne":"0.000021819s"}},"output":{"numBatches":{"valuePlusOne":"2"},"numTuples":{"valuePlusOne":"2"}},"inputs":[],"flowStats":{"maxMemUsage":{"valuePlusOne":"0"},"maxDiskUsage":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"}}}
0.931ms 0.040ms structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"00000000-0000-0000-0000-000000000000","type":"UNSET","id":0,"sqlInstanceId":0,"region":""},"netRx":{"latency":{"valuePlusOne":"0s"},"waitTime":{"valuePlusOne":"0s"},"deserializationTime":{"valuePlusOne":"0s"},"tuplesReceived":{"valuePlusOne":"0"},"bytesReceived":{"valuePlusOne":"0"},"messagesReceived":{"valuePlusOne":"0"}},"netTx":{"tuplesSent":{"valuePlusOne":"0"},"bytesSent":{"valuePlusOne":"0"},"messagesSent":{"valuePlusOne":"0"}},"kv":{"bytesRead":{"valuePlusOne":"0"},"tuplesRead":{"valuePlusOne":"0"},"batchRequestsIssued":{"valuePlusOne":"0"},"kvTime":{"valuePlusOne":"0s"},"contentionTime":{"valuePlusOne":"0s"},"numInterfaceSteps":{"valuePlusOne":"0"},"numInternalSteps":{"valuePlusOne":"0"},"numInterfaceSeeks":{"valuePlusOne":"0"},"numInternalSeeks":{"valuePlusOne":"0"},"blockBytes":{"valuePlusOne":"0"},"blockBytesInCache":{"valuePlusOne":"0"},"keyBytes":{"valuePlusOne":"0"},"valueBytes":{"valuePlusOne":"0"},"pointCount":{"valuePlusOne":"0"},"pointsCoveredByRangeTombstones":{"valuePlusOne":"0"},"rangeKeyCount":{"valuePlusOne":"0"},"rangeKeyContainedPoints":{"valuePlusOne":"0"},"rangeKeySkippedPoints":{"valuePlusOne":"0"},"kvCpuTime":{"valuePlusOne":"0s"},"numGets":{"valuePlusOne":"0"},"numScans":{"valuePlusOne":"0"},"numReverseScans":{"valuePlusOne":"0"},"usedStreamer":false},"exec":{"execTime":{"valuePlusOne":"0s"},"maxAllocatedMem":{"valuePlusOne":"0"},"maxAllocatedDisk":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"},"cpuTime":{"valuePlusOne":"0s"}},"output":{"numBatches":{"valuePlusOne":"0"},"numTuples":{"valuePlusOne":"0"}},"inputs":[],"flowStats":{"maxMemUsage":{"valuePlusOne":"0"},"maxDiskUsage":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"}}}
0.970ms 0.587ms event:ComponentStats{ID: {ce2a4729-0397-42a6-9e6b-fa2857eb0e11 FLOW 0 2 us-east-1}}
0.962ms -0.007ms structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"ce2a4729-0397-42a6-9e6b-fa2857eb0e11","type":"FLOW","id":0,"sqlInstanceId":2,"region":"us-east-1"},"netRx":{"latency":{"valuePlusOne":"0s"},"waitTime":{"valuePlusOne":"0s"},"deserializationTime":{"valuePlusOne":"0s"},"tuplesReceived":{"valuePlusOne":"0"},"bytesReceived":{"valuePlusOne":"0"},"messagesReceived":{"valuePlusOne":"0"}},"netTx":{"tuplesSent":{"valuePlusOne":"0"},"bytesSent":{"valuePlusOne":"0"},"messagesSent":{"valuePlusOne":"0"}},"kv":{"bytesRead":{"valuePlusOne":"0"},"tuplesRead":{"valuePlusOne":"0"},"batchRequestsIssued":{"valuePlusOne":"0"},"kvTime":{"valuePlusOne":"0s"},"contentionTime":{"valuePlusOne":"0s"},"numInterfaceSteps":{"valuePlusOne":"0"},"numInternalSteps":{"valuePlusOne":"0"},"numInterfaceSeeks":{"valuePlusOne":"0"},"numInternalSeeks":{"valuePlusOne":"0"},"blockBytes":{"valuePlusOne":"0"},"blockBytesInCache":{"valuePlusOne":"0"},"keyBytes":{"valuePlusOne":"0"},"valueBytes":{"valuePlusOne":"0"},"pointCount":{"valuePlusOne":"0"},"pointsCoveredByRangeTombstones":{"valuePlusOne":"0"},"rangeKeyCount":{"valuePlusOne":"0"},"rangeKeyContainedPoints":{"valuePlusOne":"0"},"rangeKeySkippedPoints":{"valuePlusOne":"0"},"kvCpuTime":{"valuePlusOne":"0s"},"numGets":{"valuePlusOne":"0"},"numScans":{"valuePlusOne":"0"},"numReverseScans":{"valuePlusOne":"0"},"usedStreamer":false},"exec":{"execTime":{"valuePlusOne":"0s"},"maxAllocatedMem":{"valuePlusOne":"0"},"maxAllocatedDisk":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"},"cpuTime":{"valuePlusOne":"0s"}},"output":{"numBatches":{"valuePlusOne":"0"},"numTuples":{"valuePlusOne":"0"}},"inputs":[],"flowStats":{"maxMemUsage":{"valuePlusOne":"20481"},"maxDiskUsage":{"valuePlusOne":"1"},"consumedRu":{"valuePlusOne":"0"}}}
0.995ms 0.680ms event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] execution ends
1.003ms 0.008ms event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] rows affected: 1
1.076ms 0.074ms === operation:commit sql txn _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root
1.104ms 0.101ms event:sql/conn_executor_exec.go:2829 [n2,client=127.0.0.1:48778,hostssl,user=root] AutoCommit. err: <nil>
I see that the trace does contain sending request to local client
, but the test is asserting that this operation is not a local read. Moving to KV since it seems that the lease may have moved incorrectly?
From the distribution log, we see that the lease was indeed transferred to n2:
I231208 21:46:21.365723 106649 13@kv/kvserver/replicate_queue.go:2113 ⋮ [T1,n1,replicate,s1,r64/9:‹/{Table/108-Max}›] 7110 transferring lease to s2
However, doing so doesn't violate lease preferences, as both n1 and n2 are in "us-east-1". The test is setting up db.rbt
to be a regional by table in "us-east-1", so both n1 and n2 are acceptable leaseholders from KVs perspective. The test is assuming there won't be a lease transfer after the call to wait-for-zone-config-changes
-- while a lease transfer is rare (as evidenced by the low flake rate), the lease transfer in itself isn't incorrect.
I'm adding the appropriate labels (A-testing
, C-bug
) given this is a test induced flake. I'll also remove the blocker labels. I'll leave the issue open, and return it to you (sql foundations) @rafiss, to determine de-flaking priorities.
Thanks for looking into it @arulajmani. Is there a way to determine why the lease transfer occurred? (I know it's allowed, but are there specific things that would cause it to actually happen?) Are there any testing knobs or settings that would make it less likely or impossible to happen? cc @kvoli since I know Austen has looked at this test a lot too.
Is there a way to determine why the lease transfer occurred? (I know it's allowed, but are there specific things that would cause it to actually happen?)
Bumping the vmodule allocator=5
would tell us why the transfer occurred.
You can disable transfers here using this testing knob:
Or you could bump the lease rebalance threshold high enough that it wouldn't transfer (say 1 million): https://github.com/cockroachdb/cockroach/blob/62ad175cbc0e9041d993ea1cc911e07d279671f5/pkg/kv/kvserver/allocator/allocatorimpl/allocator.go#L49-L49
In either case, it would be good to know what the transfer was for. My guess is its the 1 transfer for the 5 minute min_transfer_interval
and is to balance lease counts.
ccl/multiregionccl.TestMultiRegionDataDriven failed with artifacts on release-23.1 @ 426f303d85cdfee6477f2a5903f787be90b0c27f:
Help
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
/cc @cockroachdb/sql-foundations
This test on roachdash | Improve this report!
Jira issue: CRDB-34271