cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.09k stars 3.8k forks source link

ccl/multiregionccl: TestMultiRegionDataDriven failed #115898

Closed cockroach-teamcity closed 10 months ago

cockroach-teamcity commented 10 months ago

ccl/multiregionccl.TestMultiRegionDataDriven failed with artifacts on release-23.1 @ 426f303d85cdfee6477f2a5903f787be90b0c27f:

             0.333ms      0.014ms        event:sql/colflow/vectorized_flow.go:231 [n2,client=127.0.0.1:48778,hostssl,user=root] setting up vectorized flow
             0.364ms      0.031ms        event:sql/colflow/vectorized_flow.go:280 [n2,client=127.0.0.1:48778,hostssl,user=root] vectorized flow setup succeeded
             0.375ms      0.011ms        event:sql/flowinfra/flow.go:466 [n2,client=127.0.0.1:48778,hostssl,user=root] starting (0 processors, 0 startables) asynchronously
             0.383ms      0.007ms        event:sql/colflow/vectorized_flow.go:327 [n2,client=127.0.0.1:48778,hostssl,user=root] running the batch flow coordinator in the flow's goroutine
             0.384ms      0.001ms            === operation:batch flow coordinator _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root cockroach.flowid:ce2a4729-0397-42a6-9e6b-fa2857eb0e11 cockroach.processorid:0
             0.384ms      0.000ms            [colbatchscan: {count: 1, duration 557µs, unfinished}]
             0.384ms      0.000ms            [txn coordinator send: {count: 1, duration 430µs}]
             0.384ms      0.000ms            [dist sender send: {count: 1, duration 397µs}]
             0.384ms      0.000ms            [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
             0.388ms      0.004ms                === operation:colbatchscan _unfinished:1 _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root cockroach.flowid:ce2a4729-0397-42a6-9e6b-fa2857eb0e11 cockroach.processorid:0
             0.388ms      0.000ms                [txn coordinator send: {count: 1, duration 430µs}]
             0.388ms      0.000ms                [dist sender send: {count: 1, duration 397µs}]
             0.388ms      0.000ms                [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
             0.418ms      0.029ms                event:sql/row/kv_batch_fetcher.go:560 [n2,client=127.0.0.1:48778,hostssl,user=root] Scan /Table/108/1/1/0
             0.420ms      0.003ms                    === operation:txn coordinator send _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root txnID:09075f57-29a7-446c-a021-0470e12744af
             0.420ms      0.000ms                    [dist sender send: {count: 1, duration 397µs}]
             0.420ms      0.000ms                    [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
             0.427ms      0.007ms                        === operation:dist sender send _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root txn:09075f57
             0.427ms      0.000ms                        [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
             0.446ms      0.019ms                        event:kv/kvclient/kvcoord/range_iter.go:188 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] querying next range at /Table/108/1/1/0
             0.486ms      0.040ms                        event:kv/kvclient/kvcoord/range_iter.go:225 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] key: /Table/108/1/1/0, desc: r64:/{Table/108-Max} [(n2,s2):1, (n1,s1):9, (n3,s3):8, (n8,s8):6NON_VOTER, (n7,s7):10NON_VOTER, next=11, gen=31]
             0.514ms      0.028ms                        event:kv/kvclient/kvcoord/dist_sender.go:2180 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] r64: sending batch 1 Get to (n2,s2):1
             0.522ms      0.008ms                        event:rpc/nodedialer/nodedialer.go:157 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] sending request to local client
             0.529ms      0.007ms                            === operation:/cockroach.roachpb.Internal/Batch _verbose:1 node:2 span.kind:server request:Get [/Table/108/1/1/0,/Min), [txn: 09075f57], [can-forward-ts]
             0.537ms      0.008ms                            event:server/node.go:1187 [n2] node received request: 1 Get
             0.562ms      0.025ms                            event:kv/kvserver/store_send.go:167 [n2,s2] executing Get [/Table/108/1/1/0,/Min), [txn: 09075f57], [can-forward-ts]
             0.576ms      0.014ms                            event:kv/kvserver/replica_send.go:179 [n2,s2,r64/1:/{Table/108-Max}] read-only path
             0.586ms      0.010ms                            event:kv/kvserver/concurrency/concurrency_manager.go:208 [n2,s2,r64/1:/{Table/108-Max}] sequencing request
             0.591ms      0.005ms                            event:kv/kvserver/concurrency/concurrency_manager.go:289 [n2,s2,r64/1:/{Table/108-Max}] acquiring latches
             0.600ms      0.010ms                            event:kv/kvserver/concurrency/concurrency_manager.go:333 [n2,s2,r64/1:/{Table/108-Max}] scanning lock table for conflicting locks
             0.629ms      0.028ms                            event:kv/kvserver/replica_read.go:296 [n2,s2,r64/1:/{Table/108-Max}] can drop latches early for batch (Get [/Table/108/1/1/0,/Min), [txn: 09075f57], [can-forward-ts]); scanning lock table first to detect conflicts
             0.658ms      0.030ms                            event:kv/kvserver/replica_read.go:113 [n2,s2,r64/1:/{Table/108-Max}] lock table scan complete without conflicts; dropping latches early
             0.670ms      0.012ms                            event:kv/kvserver/replica_read.go:408 [n2,s2,r64/1:/{Table/108-Max}] executing read-only batch
             0.749ms      0.079ms                            event:kv/kvserver/replica_evaluate.go:550 [n2,s2,r64/1:/{Table/108-Max}] evaluated Get command header:<key:"\364\211\211\210" > , txn="sql txn" meta={id=09075f57 key=/Min pri=0.03601885 epo=0 ts=1702071981.435461927,0 min=1702071981.435461927,0 seq=0} lock=false stat=PENDING rts=1702071981.435461927,0 wto=false gul=1702071981.935461927,0 : resp=header:<num_keys:1 num_bytes:7 > value:<raw_bytes:"\375\311[\010\n#\002" timestamp:<wall_time:1702071979913075711 > > , err=<nil>
             0.781ms      0.032ms                            event:scan stats: stepped 0 times (0 internal); seeked 1 times (1 internal); block-bytes: (total 0 B, cached 0 B); points: (count 1, key-bytes 14 B, value-bytes 14 B, tombstoned: 0) ranges: (count 0), (contained-points 0, skipped-points 0) evaluated requests: 1 gets, 0 scans, 0 reverse scans
             0.789ms      0.008ms                            event:kv/kvserver/replica_read.go:221 [n2,s2,r64/1:/{Table/108-Max}] read completed
             0.755ms     -0.035ms                            structured:{"@type":"type.googleapis.com/cockroach.roachpb.ScanStats","numInterfaceSeeks":"1","numInternalSeeks":"1","numInterfaceSteps":"0","numInternalSteps":"0","blockBytes":"0","blockBytesInCache":"0","keyBytes":"14","valueBytes":"14","pointCount":"1","pointsCoveredByRangeTombstones":"0","rangeKeyCount":"0","rangeKeyContainedPoints":"0","rangeKeySkippedPoints":"0","separatedPointCount":"0","separatedPointValueBytes":"0","separatedPointValueBytesFetched":"0","numGets":"1","numScans":"0","numReverseScans":"0"}
             0.848ms      0.428ms                    event:kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:576 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] recording span to refresh: /Table/108/1/1/0
             0.928ms      0.544ms            event:ComponentStats{ID: {ce2a4729-0397-42a6-9e6b-fa2857eb0e11 PROCESSOR 0 2 }, KV time: 486µs, KV contention time: 0µs, KV rows read: 1, KV bytes read: 11 B, KV gRPC calls: 1, MVCC step count (ext/int): 0/0, MVCC seek count (ext/int): 1/1, max memory allocated: 20 KiB, sql cpu time: 22µs, batches output: 1, rows output: 1}
             0.938ms      0.010ms            event:ComponentStats{ID: {00000000-0000-0000-0000-000000000000 UNSET 0 ? }}
             0.892ms     -0.046ms            structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"ce2a4729-0397-42a6-9e6b-fa2857eb0e11","type":"PROCESSOR","id":0,"sqlInstanceId":2,"region":""},"netRx":{"latency":{"valuePlusOne":"0s"},"waitTime":{"valuePlusOne":"0s"},"deserializationTime":{"valuePlusOne":"0s"},"tuplesReceived":{"valuePlusOne":"0"},"bytesReceived":{"valuePlusOne":"0"},"messagesReceived":{"valuePlusOne":"0"}},"netTx":{"tuplesSent":{"valuePlusOne":"0"},"bytesSent":{"valuePlusOne":"0"},"messagesSent":{"valuePlusOne":"0"}},"kv":{"bytesRead":{"valuePlusOne":"12"},"tuplesRead":{"valuePlusOne":"2"},"batchRequestsIssued":{"valuePlusOne":"2"},"kvTime":{"valuePlusOne":"0.000486466s"},"contentionTime":{"valuePlusOne":"0.000000001s"},"numInterfaceSteps":{"valuePlusOne":"1"},"numInternalSteps":{"valuePlusOne":"1"},"numInterfaceSeeks":{"valuePlusOne":"2"},"numInternalSeeks":{"valuePlusOne":"2"},"blockBytes":{"valuePlusOne":"1"},"blockBytesInCache":{"valuePlusOne":"1"},"keyBytes":{"valuePlusOne":"15"},"valueBytes":{"valuePlusOne":"15"},"pointCount":{"valuePlusOne":"2"},"pointsCoveredByRangeTombstones":{"valuePlusOne":"1"},"rangeKeyCount":{"valuePlusOne":"1"},"rangeKeyContainedPoints":{"valuePlusOne":"1"},"rangeKeySkippedPoints":{"valuePlusOne":"1"},"kvCpuTime":{"valuePlusOne":"0s"},"numGets":{"valuePlusOne":"2"},"numScans":{"valuePlusOne":"1"},"numReverseScans":{"valuePlusOne":"1"},"usedStreamer":false},"exec":{"execTime":{"valuePlusOne":"0s"},"maxAllocatedMem":{"valuePlusOne":"20481"},"maxAllocatedDisk":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"1"},"cpuTime":{"valuePlusOne":"0.000021819s"}},"output":{"numBatches":{"valuePlusOne":"2"},"numTuples":{"valuePlusOne":"2"}},"inputs":[],"flowStats":{"maxMemUsage":{"valuePlusOne":"0"},"maxDiskUsage":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"}}}
             0.931ms      0.040ms            structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"00000000-0000-0000-0000-000000000000","type":"UNSET","id":0,"sqlInstanceId":0,"region":""},"netRx":{"latency":{"valuePlusOne":"0s"},"waitTime":{"valuePlusOne":"0s"},"deserializationTime":{"valuePlusOne":"0s"},"tuplesReceived":{"valuePlusOne":"0"},"bytesReceived":{"valuePlusOne":"0"},"messagesReceived":{"valuePlusOne":"0"}},"netTx":{"tuplesSent":{"valuePlusOne":"0"},"bytesSent":{"valuePlusOne":"0"},"messagesSent":{"valuePlusOne":"0"}},"kv":{"bytesRead":{"valuePlusOne":"0"},"tuplesRead":{"valuePlusOne":"0"},"batchRequestsIssued":{"valuePlusOne":"0"},"kvTime":{"valuePlusOne":"0s"},"contentionTime":{"valuePlusOne":"0s"},"numInterfaceSteps":{"valuePlusOne":"0"},"numInternalSteps":{"valuePlusOne":"0"},"numInterfaceSeeks":{"valuePlusOne":"0"},"numInternalSeeks":{"valuePlusOne":"0"},"blockBytes":{"valuePlusOne":"0"},"blockBytesInCache":{"valuePlusOne":"0"},"keyBytes":{"valuePlusOne":"0"},"valueBytes":{"valuePlusOne":"0"},"pointCount":{"valuePlusOne":"0"},"pointsCoveredByRangeTombstones":{"valuePlusOne":"0"},"rangeKeyCount":{"valuePlusOne":"0"},"rangeKeyContainedPoints":{"valuePlusOne":"0"},"rangeKeySkippedPoints":{"valuePlusOne":"0"},"kvCpuTime":{"valuePlusOne":"0s"},"numGets":{"valuePlusOne":"0"},"numScans":{"valuePlusOne":"0"},"numReverseScans":{"valuePlusOne":"0"},"usedStreamer":false},"exec":{"execTime":{"valuePlusOne":"0s"},"maxAllocatedMem":{"valuePlusOne":"0"},"maxAllocatedDisk":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"},"cpuTime":{"valuePlusOne":"0s"}},"output":{"numBatches":{"valuePlusOne":"0"},"numTuples":{"valuePlusOne":"0"}},"inputs":[],"flowStats":{"maxMemUsage":{"valuePlusOne":"0"},"maxDiskUsage":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"}}}
             0.970ms      0.587ms        event:ComponentStats{ID: {ce2a4729-0397-42a6-9e6b-fa2857eb0e11 FLOW 0 2 us-east-1}}
             0.962ms     -0.007ms        structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"ce2a4729-0397-42a6-9e6b-fa2857eb0e11","type":"FLOW","id":0,"sqlInstanceId":2,"region":"us-east-1"},"netRx":{"latency":{"valuePlusOne":"0s"},"waitTime":{"valuePlusOne":"0s"},"deserializationTime":{"valuePlusOne":"0s"},"tuplesReceived":{"valuePlusOne":"0"},"bytesReceived":{"valuePlusOne":"0"},"messagesReceived":{"valuePlusOne":"0"}},"netTx":{"tuplesSent":{"valuePlusOne":"0"},"bytesSent":{"valuePlusOne":"0"},"messagesSent":{"valuePlusOne":"0"}},"kv":{"bytesRead":{"valuePlusOne":"0"},"tuplesRead":{"valuePlusOne":"0"},"batchRequestsIssued":{"valuePlusOne":"0"},"kvTime":{"valuePlusOne":"0s"},"contentionTime":{"valuePlusOne":"0s"},"numInterfaceSteps":{"valuePlusOne":"0"},"numInternalSteps":{"valuePlusOne":"0"},"numInterfaceSeeks":{"valuePlusOne":"0"},"numInternalSeeks":{"valuePlusOne":"0"},"blockBytes":{"valuePlusOne":"0"},"blockBytesInCache":{"valuePlusOne":"0"},"keyBytes":{"valuePlusOne":"0"},"valueBytes":{"valuePlusOne":"0"},"pointCount":{"valuePlusOne":"0"},"pointsCoveredByRangeTombstones":{"valuePlusOne":"0"},"rangeKeyCount":{"valuePlusOne":"0"},"rangeKeyContainedPoints":{"valuePlusOne":"0"},"rangeKeySkippedPoints":{"valuePlusOne":"0"},"kvCpuTime":{"valuePlusOne":"0s"},"numGets":{"valuePlusOne":"0"},"numScans":{"valuePlusOne":"0"},"numReverseScans":{"valuePlusOne":"0"},"usedStreamer":false},"exec":{"execTime":{"valuePlusOne":"0s"},"maxAllocatedMem":{"valuePlusOne":"0"},"maxAllocatedDisk":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"},"cpuTime":{"valuePlusOne":"0s"}},"output":{"numBatches":{"valuePlusOne":"0"},"numTuples":{"valuePlusOne":"0"}},"inputs":[],"flowStats":{"maxMemUsage":{"valuePlusOne":"20481"},"maxDiskUsage":{"valuePlusOne":"1"},"consumedRu":{"valuePlusOne":"0"}}}
             0.995ms      0.680ms    event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] execution ends
             1.003ms      0.008ms    event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] rows affected: 1
             1.076ms      0.074ms        === operation:commit sql txn _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root
             1.104ms      0.101ms    event:sql/conn_executor_exec.go:2829 [n2,client=127.0.0.1:48778,hostssl,user=root] AutoCommit. err: <nil>

    --- FAIL: TestMultiRegionDataDriven/regional_by_table (13.46s)
Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)

/cc @cockroachdb/sql-foundations

This test on roachdash | Improve this report!

Jira issue: CRDB-34271

rafiss commented 10 months ago
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/8193/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/ccl/multiregionccl/multiregionccl_test_/multiregionccl_test.runfiles/com_github_cockroachdb_cockroach/pkg/ccl/multiregionccl/testdata/regional_by_table:30:
         SELECT * FROM db.rbt WHERE k = 1
        expected:
        served locally: false

        found:
        not a match, trace:
             0.000ms      0.000ms    === operation:traced statement _unfinished:1 _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root
             0.000ms      0.000ms    [flow: {count: 1, duration 652µs}]
             0.000ms      0.000ms    [batch flow coordinator: {count: 1, duration 559µs}]
             0.000ms      0.000ms    [colbatchscan: {count: 1, duration 557µs, unfinished}]
             0.000ms      0.000ms    [txn coordinator send: {count: 1, duration 430µs}]
             0.000ms      0.000ms    [dist sender send: {count: 1, duration 397µs}]
             0.000ms      0.000ms    [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
             0.000ms      0.000ms    [optimizer: {count: 1, duration 219µs}]
             0.000ms      0.000ms    [commit sql txn: {count: 1, duration 8µs}]
             0.006ms      0.006ms        === operation:optimizer _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root
             0.063ms      0.057ms        event:sql/catalog/lease/descriptor_version_state.go:130 [n2,client=127.0.0.1:48778,hostssl,user=root] descriptorVersionState.incRefCount: 100(?"defaultdb"?) ver=1:1702072290.712398770,0, refcount=1
             0.079ms      0.016ms        event:sql/catalog/descs/descriptor.go:137 [n2,client=127.0.0.1:48778,hostssl,user=root] looking up descriptors for ids [100]
             0.116ms      0.037ms        event:sql/catalog/lease/descriptor_version_state.go:130 [n2,client=127.0.0.1:48778,hostssl,user=root] descriptorVersionState.incRefCount: 104(?"db"?) ver=1:1702072295.108623680,0, refcount=1
             0.125ms      0.009ms        event:sql/catalog/descs/descriptor.go:137 [n2,client=127.0.0.1:48778,hostssl,user=root] looking up descriptors for ids [104]
             0.133ms      0.008ms        event:sql/catalog/descs/descriptor.go:137 [n2,client=127.0.0.1:48778,hostssl,user=root] looking up descriptors for ids [105]
             0.145ms      0.012ms        event:sql/catalog/lease/descriptor_version_state.go:130 [n2,client=127.0.0.1:48778,hostssl,user=root] descriptorVersionState.incRefCount: 105(?"public"?) ver=1:1702072278.308230626,0, refcount=1
             0.159ms      0.014ms        event:sql/catalog/lease/descriptor_version_state.go:130 [n2,client=127.0.0.1:48778,hostssl,user=root] descriptorVersionState.incRefCount: 108(?"rbt"?) ver=1:1702072278.076386348,0, refcount=1
             0.168ms      0.009ms        event:sql/catalog/descs/descriptor.go:137 [n2,client=127.0.0.1:48778,hostssl,user=root] looking up descriptors for ids [108]
             0.191ms      0.023ms        event:sql/plan_opt.go:392 [n2,client=127.0.0.1:48778,hostssl,user=root] query cache hit
             0.244ms      0.244ms    event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] planning ends
             0.255ms      0.012ms    event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] checking distributability
             0.266ms      0.010ms    event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] will distribute plan: false
             0.275ms      0.009ms    event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] executing after 0 retries, last retry reason: <nil>
             0.284ms      0.009ms    event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] execution starts: distributed engine
             0.299ms      0.015ms    event:sql/distsql_running.go:1885 [n2,client=127.0.0.1:48778,hostssl,user=root] creating DistSQL plan with isLocal=true
             0.315ms      0.015ms    event:sql/distsql_running.go:813 [n2,client=127.0.0.1:48778,hostssl,user=root] running DistSQL plan
             0.319ms      0.005ms        === operation:flow _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root
             0.319ms      0.000ms        [batch flow coordinator: {count: 1, duration 559µs}]
             0.319ms      0.000ms        [colbatchscan: {count: 1, duration 557µs, unfinished}]
             0.319ms      0.000ms        [txn coordinator send: {count: 1, duration 430µs}]
             0.319ms      0.000ms        [dist sender send: {count: 1, duration 397µs}]
             0.319ms      0.000ms        [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
             0.333ms      0.014ms        event:sql/colflow/vectorized_flow.go:231 [n2,client=127.0.0.1:48778,hostssl,user=root] setting up vectorized flow
             0.364ms      0.031ms        event:sql/colflow/vectorized_flow.go:280 [n2,client=127.0.0.1:48778,hostssl,user=root] vectorized flow setup succeeded
             0.375ms      0.011ms        event:sql/flowinfra/flow.go:466 [n2,client=127.0.0.1:48778,hostssl,user=root] starting (0 processors, 0 startables) asynchronously
             0.383ms      0.007ms        event:sql/colflow/vectorized_flow.go:327 [n2,client=127.0.0.1:48778,hostssl,user=root] running the batch flow coordinator in the flow's goroutine
             0.384ms      0.001ms            === operation:batch flow coordinator _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root cockroach.flowid:ce2a4729-0397-42a6-9e6b-fa2857eb0e11 cockroach.processorid:0
             0.384ms      0.000ms            [colbatchscan: {count: 1, duration 557µs, unfinished}]
             0.384ms      0.000ms            [txn coordinator send: {count: 1, duration 430µs}]
             0.384ms      0.000ms            [dist sender send: {count: 1, duration 397µs}]
             0.384ms      0.000ms            [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
             0.388ms      0.004ms                === operation:colbatchscan _unfinished:1 _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root cockroach.flowid:ce2a4729-0397-42a6-9e6b-fa2857eb0e11 cockroach.processorid:0
             0.388ms      0.000ms                [txn coordinator send: {count: 1, duration 430µs}]
             0.388ms      0.000ms                [dist sender send: {count: 1, duration 397µs}]
             0.388ms      0.000ms                [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
             0.418ms      0.029ms                event:sql/row/kv_batch_fetcher.go:560 [n2,client=127.0.0.1:48778,hostssl,user=root] Scan /Table/108/1/1/0
             0.420ms      0.003ms                    === operation:txn coordinator send _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root txnID:09075f57-29a7-446c-a021-0470e12744af
             0.420ms      0.000ms                    [dist sender send: {count: 1, duration 397µs}]
             0.420ms      0.000ms                    [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
             0.427ms      0.007ms                        === operation:dist sender send _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root txn:09075f57
             0.427ms      0.000ms                        [/cockroach.roachpb.Internal/Batch: {count: 1, duration 272µs}]
             0.446ms      0.019ms                        event:kv/kvclient/kvcoord/range_iter.go:188 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] querying next range at /Table/108/1/1/0
             0.486ms      0.040ms                        event:kv/kvclient/kvcoord/range_iter.go:225 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] key: /Table/108/1/1/0, desc: r64:/{Table/108-Max} [(n2,s2):1, (n1,s1):9, (n3,s3):8, (n8,s8):6NON_VOTER, (n7,s7):10NON_VOTER, next=11, gen=31]
             0.514ms      0.028ms                        event:kv/kvclient/kvcoord/dist_sender.go:2180 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] r64: sending batch 1 Get to (n2,s2):1
             0.522ms      0.008ms                        event:rpc/nodedialer/nodedialer.go:157 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] sending request to local client
             0.529ms      0.007ms                            === operation:/cockroach.roachpb.Internal/Batch _verbose:1 node:2 span.kind:server request:Get [/Table/108/1/1/0,/Min), [txn: 09075f57], [can-forward-ts]
             0.537ms      0.008ms                            event:server/node.go:1187 [n2] node received request: 1 Get
             0.562ms      0.025ms                            event:kv/kvserver/store_send.go:167 [n2,s2] executing Get [/Table/108/1/1/0,/Min), [txn: 09075f57], [can-forward-ts]
             0.576ms      0.014ms                            event:kv/kvserver/replica_send.go:179 [n2,s2,r64/1:/{Table/108-Max}] read-only path
             0.586ms      0.010ms                            event:kv/kvserver/concurrency/concurrency_manager.go:208 [n2,s2,r64/1:/{Table/108-Max}] sequencing request
             0.591ms      0.005ms                            event:kv/kvserver/concurrency/concurrency_manager.go:289 [n2,s2,r64/1:/{Table/108-Max}] acquiring latches
             0.600ms      0.010ms                            event:kv/kvserver/concurrency/concurrency_manager.go:333 [n2,s2,r64/1:/{Table/108-Max}] scanning lock table for conflicting locks
             0.629ms      0.028ms                            event:kv/kvserver/replica_read.go:296 [n2,s2,r64/1:/{Table/108-Max}] can drop latches early for batch (Get [/Table/108/1/1/0,/Min), [txn: 09075f57], [can-forward-ts]); scanning lock table first to detect conflicts
             0.658ms      0.030ms                            event:kv/kvserver/replica_read.go:113 [n2,s2,r64/1:/{Table/108-Max}] lock table scan complete without conflicts; dropping latches early
             0.670ms      0.012ms                            event:kv/kvserver/replica_read.go:408 [n2,s2,r64/1:/{Table/108-Max}] executing read-only batch
             0.749ms      0.079ms                            event:kv/kvserver/replica_evaluate.go:550 [n2,s2,r64/1:/{Table/108-Max}] evaluated Get command header:<key:"\364\211\211\210" > , txn="sql txn" meta={id=09075f57 key=/Min pri=0.03601885 epo=0 ts=1702071981.435461927,0 min=1702071981.435461927,0 seq=0} lock=false stat=PENDING rts=1702071981.435461927,0 wto=false gul=1702071981.935461927,0 : resp=header:<num_keys:1 num_bytes:7 > value:<raw_bytes:"\375\311[\010\n#\002" timestamp:<wall_time:1702071979913075711 > > , err=<nil>
             0.781ms      0.032ms                            event:scan stats: stepped 0 times (0 internal); seeked 1 times (1 internal); block-bytes: (total 0 B, cached 0 B); points: (count 1, key-bytes 14 B, value-bytes 14 B, tombstoned: 0) ranges: (count 0), (contained-points 0, skipped-points 0) evaluated requests: 1 gets, 0 scans, 0 reverse scans
             0.789ms      0.008ms                            event:kv/kvserver/replica_read.go:221 [n2,s2,r64/1:/{Table/108-Max}] read completed
             0.755ms     -0.035ms                            structured:{"@type":"type.googleapis.com/cockroach.roachpb.ScanStats","numInterfaceSeeks":"1","numInternalSeeks":"1","numInterfaceSteps":"0","numInternalSteps":"0","blockBytes":"0","blockBytesInCache":"0","keyBytes":"14","valueBytes":"14","pointCount":"1","pointsCoveredByRangeTombstones":"0","rangeKeyCount":"0","rangeKeyContainedPoints":"0","rangeKeySkippedPoints":"0","separatedPointCount":"0","separatedPointValueBytes":"0","separatedPointValueBytesFetched":"0","numGets":"1","numScans":"0","numReverseScans":"0"}
             0.848ms      0.428ms                    event:kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:576 [n2,client=127.0.0.1:48778,hostssl,user=root,txn=09075f57] recording span to refresh: /Table/108/1/1/0
             0.928ms      0.544ms            event:ComponentStats{ID: {ce2a4729-0397-42a6-9e6b-fa2857eb0e11 PROCESSOR 0 2 }, KV time: 486µs, KV contention time: 0µs, KV rows read: 1, KV bytes read: 11 B, KV gRPC calls: 1, MVCC step count (ext/int): 0/0, MVCC seek count (ext/int): 1/1, max memory allocated: 20 KiB, sql cpu time: 22µs, batches output: 1, rows output: 1}
             0.938ms      0.010ms            event:ComponentStats{ID: {00000000-0000-0000-0000-000000000000 UNSET 0 ? }}
             0.892ms     -0.046ms            structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"ce2a4729-0397-42a6-9e6b-fa2857eb0e11","type":"PROCESSOR","id":0,"sqlInstanceId":2,"region":""},"netRx":{"latency":{"valuePlusOne":"0s"},"waitTime":{"valuePlusOne":"0s"},"deserializationTime":{"valuePlusOne":"0s"},"tuplesReceived":{"valuePlusOne":"0"},"bytesReceived":{"valuePlusOne":"0"},"messagesReceived":{"valuePlusOne":"0"}},"netTx":{"tuplesSent":{"valuePlusOne":"0"},"bytesSent":{"valuePlusOne":"0"},"messagesSent":{"valuePlusOne":"0"}},"kv":{"bytesRead":{"valuePlusOne":"12"},"tuplesRead":{"valuePlusOne":"2"},"batchRequestsIssued":{"valuePlusOne":"2"},"kvTime":{"valuePlusOne":"0.000486466s"},"contentionTime":{"valuePlusOne":"0.000000001s"},"numInterfaceSteps":{"valuePlusOne":"1"},"numInternalSteps":{"valuePlusOne":"1"},"numInterfaceSeeks":{"valuePlusOne":"2"},"numInternalSeeks":{"valuePlusOne":"2"},"blockBytes":{"valuePlusOne":"1"},"blockBytesInCache":{"valuePlusOne":"1"},"keyBytes":{"valuePlusOne":"15"},"valueBytes":{"valuePlusOne":"15"},"pointCount":{"valuePlusOne":"2"},"pointsCoveredByRangeTombstones":{"valuePlusOne":"1"},"rangeKeyCount":{"valuePlusOne":"1"},"rangeKeyContainedPoints":{"valuePlusOne":"1"},"rangeKeySkippedPoints":{"valuePlusOne":"1"},"kvCpuTime":{"valuePlusOne":"0s"},"numGets":{"valuePlusOne":"2"},"numScans":{"valuePlusOne":"1"},"numReverseScans":{"valuePlusOne":"1"},"usedStreamer":false},"exec":{"execTime":{"valuePlusOne":"0s"},"maxAllocatedMem":{"valuePlusOne":"20481"},"maxAllocatedDisk":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"1"},"cpuTime":{"valuePlusOne":"0.000021819s"}},"output":{"numBatches":{"valuePlusOne":"2"},"numTuples":{"valuePlusOne":"2"}},"inputs":[],"flowStats":{"maxMemUsage":{"valuePlusOne":"0"},"maxDiskUsage":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"}}}
             0.931ms      0.040ms            structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"00000000-0000-0000-0000-000000000000","type":"UNSET","id":0,"sqlInstanceId":0,"region":""},"netRx":{"latency":{"valuePlusOne":"0s"},"waitTime":{"valuePlusOne":"0s"},"deserializationTime":{"valuePlusOne":"0s"},"tuplesReceived":{"valuePlusOne":"0"},"bytesReceived":{"valuePlusOne":"0"},"messagesReceived":{"valuePlusOne":"0"}},"netTx":{"tuplesSent":{"valuePlusOne":"0"},"bytesSent":{"valuePlusOne":"0"},"messagesSent":{"valuePlusOne":"0"}},"kv":{"bytesRead":{"valuePlusOne":"0"},"tuplesRead":{"valuePlusOne":"0"},"batchRequestsIssued":{"valuePlusOne":"0"},"kvTime":{"valuePlusOne":"0s"},"contentionTime":{"valuePlusOne":"0s"},"numInterfaceSteps":{"valuePlusOne":"0"},"numInternalSteps":{"valuePlusOne":"0"},"numInterfaceSeeks":{"valuePlusOne":"0"},"numInternalSeeks":{"valuePlusOne":"0"},"blockBytes":{"valuePlusOne":"0"},"blockBytesInCache":{"valuePlusOne":"0"},"keyBytes":{"valuePlusOne":"0"},"valueBytes":{"valuePlusOne":"0"},"pointCount":{"valuePlusOne":"0"},"pointsCoveredByRangeTombstones":{"valuePlusOne":"0"},"rangeKeyCount":{"valuePlusOne":"0"},"rangeKeyContainedPoints":{"valuePlusOne":"0"},"rangeKeySkippedPoints":{"valuePlusOne":"0"},"kvCpuTime":{"valuePlusOne":"0s"},"numGets":{"valuePlusOne":"0"},"numScans":{"valuePlusOne":"0"},"numReverseScans":{"valuePlusOne":"0"},"usedStreamer":false},"exec":{"execTime":{"valuePlusOne":"0s"},"maxAllocatedMem":{"valuePlusOne":"0"},"maxAllocatedDisk":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"},"cpuTime":{"valuePlusOne":"0s"}},"output":{"numBatches":{"valuePlusOne":"0"},"numTuples":{"valuePlusOne":"0"}},"inputs":[],"flowStats":{"maxMemUsage":{"valuePlusOne":"0"},"maxDiskUsage":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"}}}
             0.970ms      0.587ms        event:ComponentStats{ID: {ce2a4729-0397-42a6-9e6b-fa2857eb0e11 FLOW 0 2 us-east-1}}
             0.962ms     -0.007ms        structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"ce2a4729-0397-42a6-9e6b-fa2857eb0e11","type":"FLOW","id":0,"sqlInstanceId":2,"region":"us-east-1"},"netRx":{"latency":{"valuePlusOne":"0s"},"waitTime":{"valuePlusOne":"0s"},"deserializationTime":{"valuePlusOne":"0s"},"tuplesReceived":{"valuePlusOne":"0"},"bytesReceived":{"valuePlusOne":"0"},"messagesReceived":{"valuePlusOne":"0"}},"netTx":{"tuplesSent":{"valuePlusOne":"0"},"bytesSent":{"valuePlusOne":"0"},"messagesSent":{"valuePlusOne":"0"}},"kv":{"bytesRead":{"valuePlusOne":"0"},"tuplesRead":{"valuePlusOne":"0"},"batchRequestsIssued":{"valuePlusOne":"0"},"kvTime":{"valuePlusOne":"0s"},"contentionTime":{"valuePlusOne":"0s"},"numInterfaceSteps":{"valuePlusOne":"0"},"numInternalSteps":{"valuePlusOne":"0"},"numInterfaceSeeks":{"valuePlusOne":"0"},"numInternalSeeks":{"valuePlusOne":"0"},"blockBytes":{"valuePlusOne":"0"},"blockBytesInCache":{"valuePlusOne":"0"},"keyBytes":{"valuePlusOne":"0"},"valueBytes":{"valuePlusOne":"0"},"pointCount":{"valuePlusOne":"0"},"pointsCoveredByRangeTombstones":{"valuePlusOne":"0"},"rangeKeyCount":{"valuePlusOne":"0"},"rangeKeyContainedPoints":{"valuePlusOne":"0"},"rangeKeySkippedPoints":{"valuePlusOne":"0"},"kvCpuTime":{"valuePlusOne":"0s"},"numGets":{"valuePlusOne":"0"},"numScans":{"valuePlusOne":"0"},"numReverseScans":{"valuePlusOne":"0"},"usedStreamer":false},"exec":{"execTime":{"valuePlusOne":"0s"},"maxAllocatedMem":{"valuePlusOne":"0"},"maxAllocatedDisk":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"},"cpuTime":{"valuePlusOne":"0s"}},"output":{"numBatches":{"valuePlusOne":"0"},"numTuples":{"valuePlusOne":"0"}},"inputs":[],"flowStats":{"maxMemUsage":{"valuePlusOne":"20481"},"maxDiskUsage":{"valuePlusOne":"1"},"consumedRu":{"valuePlusOne":"0"}}}
             0.995ms      0.680ms    event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] execution ends
             1.003ms      0.008ms    event:sql/conn_executor_exec.go:1036 [n2,client=127.0.0.1:48778,hostssl,user=root] rows affected: 1
             1.076ms      0.074ms        === operation:commit sql txn _verbose:1 node:2 client:127.0.0.1:48778 hostssl: user:root
             1.104ms      0.101ms    event:sql/conn_executor_exec.go:2829 [n2,client=127.0.0.1:48778,hostssl,user=root] AutoCommit. err: <nil>

I see that the trace does contain sending request to local client, but the test is asserting that this operation is not a local read. Moving to KV since it seems that the lease may have moved incorrectly?

arulajmani commented 10 months ago

From the distribution log, we see that the lease was indeed transferred to n2:

I231208 21:46:21.365723 106649 13@kv/kvserver/replicate_queue.go:2113 ⋮ [T1,n1,replicate,s1,r64/9:‹/{Table/108-Max}›] 7110  transferring lease to s2

However, doing so doesn't violate lease preferences, as both n1 and n2 are in "us-east-1". The test is setting up db.rbt to be a regional by table in "us-east-1", so both n1 and n2 are acceptable leaseholders from KVs perspective. The test is assuming there won't be a lease transfer after the call to wait-for-zone-config-changes -- while a lease transfer is rare (as evidenced by the low flake rate), the lease transfer in itself isn't incorrect.

I'm adding the appropriate labels (A-testing, C-bug) given this is a test induced flake. I'll also remove the blocker labels. I'll leave the issue open, and return it to you (sql foundations) @rafiss, to determine de-flaking priorities.

rafiss commented 10 months ago

Thanks for looking into it @arulajmani. Is there a way to determine why the lease transfer occurred? (I know it's allowed, but are there specific things that would cause it to actually happen?) Are there any testing knobs or settings that would make it less likely or impossible to happen? cc @kvoli since I know Austen has looked at this test a lot too.

kvoli commented 10 months ago

Is there a way to determine why the lease transfer occurred? (I know it's allowed, but are there specific things that would cause it to actually happen?)

Bumping the vmodule allocator=5 would tell us why the transfer occurred.

You can disable transfers here using this testing knob:

https://github.com/cockroachdb/cockroach/blob/62ad175cbc0e9041d993ea1cc911e07d279671f5/pkg/kv/kvserver/allocator/base.go#L100-L100

Or you could bump the lease rebalance threshold high enough that it wouldn't transfer (say 1 million): https://github.com/cockroachdb/cockroach/blob/62ad175cbc0e9041d993ea1cc911e07d279671f5/pkg/kv/kvserver/allocator/allocatorimpl/allocator.go#L49-L49

In either case, it would be good to know what the transfer was for. My guess is its the 1 transfer for the 5 minute min_transfer_interval and is to balance lease counts.