cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.86k stars 3.77k forks source link

upgrade/upgrades: TestLeasingClusterVersionStarvation failed #123618

Closed cockroach-teamcity closed 2 months ago

cockroach-teamcity commented 4 months ago

upgrade/upgrades.TestLeasingClusterVersionStarvation failed on master @ 6e23d32dd0d36181f5c886c9a0eda9c756005bd6:

Fatal error:

panic: test timed out after 14m57s
running tests:
    TestLeasingClusterVersionStarvation (14m44s)

Stack:

goroutine 599699 [running]:
testing.(*M).startAlarm.func1()
    GOROOT/src/testing/testing.go:2366 +0x385
created by time.goFunc
    GOROOT/src/time/sleep.go:177 +0x2d
Log preceding fatal error

``` I240504 07:06:53.710677 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r49/1:/Table/4{6-7}] 1637 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.710849 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r38/1:/Table/3{5-6}] 1638 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.711041 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r47/1:/Table/4{4-5}] 1639 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.711220 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r36/1:/Table/3{3-4}] 1640 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.711400 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r51/1:/Table/{48-57}] 1641 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.711567 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r12/1:/Table/{8-9}] 1642 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.711741 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r60/1:/Table/{58-60}] 1643 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.711921 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r62/1:/Table/6{0-2}] 1644 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.712100 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r42/1:/Table/{39-40}] 1645 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.712272 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r37/1:/Table/3{4-5}] 1646 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.712448 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r46/1:/Table/4{3-4}] 1647 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.712615 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r22/1:/Table/{19-20}] 1648 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.712793 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r43/1:/Table/4{0-1}] 1649 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.712976 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r16/1:/Table/1{3-4}] 1650 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.713147 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r48/1:/Table/4{5-6}] 1651 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.713314 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r35/1:/Table/3{2-3}] 1652 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.713484 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r9/1:/Table/{5-6}] 1653 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.713660 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r14/1:/Table/1{1-2}] 1654 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.713834 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r50/1:/Table/4{7-8}] 1655 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.714039 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r4/1:/System{/tsd-tse}] 1656 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.714214 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r2/1:/System/NodeLiveness{-Max}] 1657 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.714395 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r30/1:/Table/2{7-8}] 1658 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.714578 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r27/1:/Table/2{4-5}] 1659 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.714765 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r41/1:/Table/3{8-9}] 1660 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.714961 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r10/1:/Table/{6-7}] 1661 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.715137 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r26/1:/Table/2{3-4}] 1662 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.715336 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r3/1:/System/{NodeLive…-tsd}] 1663 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.715507 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r20/1:/Table/1{7-8}] 1664 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.715692 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r1/1:/{Min-System/NodeL…}] 1665 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.761697 8923 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1666 gossip status (ok, 1 node) I240504 07:06:53.761697 8923 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1666 +gossip client (0/3 cur/max conns) I240504 07:06:53.761697 8923 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1666 +gossip server (0/3 cur/max conns, infos 0/0 sent/received, bytes 0B/0B sent/received) I240504 07:06:53.777302 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r8/1:/Table/{4-5}] 1667 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:53.777579 9162 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r5/1:/{Systemtse-Table/0}] 1668 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240504 07:06:54.028933 8926 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1669 ={"Timestamp":1714806414028930178,"EventType":"runtime_stats","MemRSSBytes":1108086784,"GoroutineCount":304,"MemStackSysBytes":3375104,"GoAllocBytes":697078248,"GoTotalBytes":825780696,"HeapFragmentBytes":53308952,"HeapReservedBytes":45260800,"HeapReleasedBytes":27140096,"CGoAllocBytes":134907200,"CGoTotalBytes":140455936,"CGoCallRate":197.22318,"CPUUserPercent":99.56122,"CPUSysPercent":0.2998832,"GCPausePercent":0.0005706577,"GCRunCount":194,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240504 07:06:54.029120 8926 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1670 runtime stats: 1.0 GiB RSS, 304 goroutines (stacks: 3.2 MiB), 665 MiB/788 MiB Go alloc/total (heap fragmentation: 51 MiB, heap reserved: 43 MiB, heap released: 26 MiB), 129 MiB/134 MiB CGO alloc/total (197.2 CGO/sec), 99.6/0.3 %(u/s)time, 0.0 %gc (194x), 416 B/416 B (r/w)net W240504 07:06:54.805042 10344 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:33781,class=system,rpc] 1671 latency jump (prev avg 79.07ms, current 139.47ms) I240504 07:07:04.052398 8926 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1672 ={"Timestamp":1714806424052394293,"EventType":"runtime_stats","MemRSSBytes":1121226752,"GoroutineCount":303,"MemStackSysBytes":3637248,"GoAllocBytes":656402496,"GoTotalBytes":825780696,"HeapFragmentBytes":48469952,"HeapReservedBytes":90513408,"HeapReleasedBytes":27140096,"CGoAllocBytes":134921536,"CGoTotalBytes":140500992,"CGoCallRate":351.67484,"CPUUserPercent":99.566376,"CPUSysPercent":0.39906362,"GCPausePercent":0.00039587112,"GCRunCount":195,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240504 07:07:04.052579 8926 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1673 runtime stats: 1.0 GiB RSS, 303 goroutines (stacks: 3.5 MiB), 626 MiB/788 MiB Go alloc/total (heap fragmentation: 46 MiB, heap reserved: 86 MiB, heap released: 26 MiB), 129 MiB/134 MiB CGO alloc/total (351.7 CGO/sec), 99.6/0.4 %(u/s)time, 0.0 %gc (195x), 416 B/416 B (r/w)net W240504 07:07:05.641901 10331 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:33781,class=default,rpc] 1674 latency jump (prev avg 79.68ms, current 132.42ms) W240504 07:07:12.195400 10344 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:33781,class=system,rpc] 1675 latency jump (prev avg 77.10ms, current 172.04ms) I240504 07:07:14.073245 8926 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1676 ={"Timestamp":1714806434073241347,"EventType":"runtime_stats","MemRSSBytes":1165410304,"GoroutineCount":305,"MemStackSysBytes":3506176,"GoAllocBytes":656738024,"GoTotalBytes":868772312,"HeapFragmentBytes":52295960,"HeapReservedBytes":129474560,"HeapReleasedBytes":30285824,"CGoAllocBytes":134954304,"CGoTotalBytes":140611584,"CGoCallRate":384.8976,"CPUUserPercent":99.592384,"CPUSysPercent":0.3991679,"GCPausePercent":0.0004777241,"GCRunCount":196} I240504 07:07:14.073431 8926 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1677 runtime stats: 1.1 GiB RSS, 305 goroutines (stacks: 3.3 MiB), 626 MiB/828 MiB Go alloc/total (heap fragmentation: 50 MiB, heap reserved: 124 MiB, heap released: 29 MiB), 129 MiB/134 MiB CGO alloc/total (384.9 CGO/sec), 99.6/0.4 %(u/s)time, 0.0 %gc (196x), 0 B/0 B (r/w)net W240504 07:07:23.929911 10344 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:33781,class=system,rpc] 1678 latency jump (prev avg 56.88ms, current 137.00ms) I240504 07:07:24.073303 8926 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1679 ={"Timestamp":1714806444073297386,"EventType":"runtime_stats","MemRSSBytes":1165963264,"GoroutineCount":303,"MemStackSysBytes":3309568,"GoAllocBytes":718878728,"GoTotalBytes":868772312,"HeapFragmentBytes":54626296,"HeapReservedBytes":65200128,"HeapReleasedBytes":30285824,"CGoAllocBytes":134954304,"CGoTotalBytes":140947456,"CGoCallRate":209.69884,"CPUUserPercent":99.59945,"CPUSysPercent":0.19999889,"GCPausePercent":0.0003967978,"GCRunCount":197,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240504 07:07:24.073512 8926 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1680 runtime stats: 1.1 GiB RSS, 303 goroutines (stacks: 3.2 MiB), 686 MiB/828 MiB Go alloc/total (heap fragmentation: 52 MiB, heap reserved: 62 MiB, heap released: 29 MiB), 129 MiB/134 MiB CGO alloc/total (209.7 CGO/sec), 99.6/0.2 %(u/s)time, 0.0 %gc (197x), 416 B/416 B (r/w)net W240504 07:07:32.597908 10331 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:33781,class=default,rpc] 1681 latency jump (prev avg 69.47ms, current 151.48ms) I240504 07:07:34.097537 8926 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1682 ={"Timestamp":1714806454097531003,"EventType":"runtime_stats","MemRSSBytes":1165959168,"GoroutineCount":304,"MemStackSysBytes":3473408,"GoAllocBytes":716619488,"GoTotalBytes":868772312,"HeapFragmentBytes":53698848,"HeapReservedBytes":68222976,"HeapReleasedBytes":30285824,"CGoAllocBytes":134925632,"CGoTotalBytes":140947456,"CGoCallRate":259.8702,"CPUUserPercent":99.75824,"CPUSysPercent":0.19951646,"GCPausePercent":0.00047756263,"GCRunCount":198,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240504 07:07:34.097736 8926 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1683 runtime stats: 1.1 GiB RSS, 304 goroutines (stacks: 3.3 MiB), 683 MiB/828 MiB Go alloc/total (heap fragmentation: 51 MiB, heap reserved: 65 MiB, heap released: 29 MiB), 129 MiB/134 MiB CGO alloc/total (259.9 CGO/sec), 99.8/0.2 %(u/s)time, 0.0 %gc (198x), 416 B/416 B (r/w)net W240504 07:07:35.828900 10344 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:33781,class=system,rpc] 1684 latency jump (prev avg 74.81ms, current 123.46ms) ```

Parameters:

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/release-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-38428

cockroach-teamcity commented 4 months ago

upgrade/upgrades.TestLeasingClusterVersionStarvation failed on master @ 2f498d0abcf4130e5c872b73200086d6a3af9f02:

Fatal error:

panic: test timed out after 14m57s
running tests:
    TestLeasingClusterVersionStarvation (14m43s)

Stack:

goroutine 2113744 [running]:
testing.(*M).startAlarm.func1()
    GOROOT/src/testing/testing.go:2366 +0x385
created by time.goFunc
    GOROOT/src/time/sleep.go:177 +0x2d
Log preceding fatal error

``` I240508 04:50:45.044512 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r19/1:/Table/1{6-7}] 1582 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.044673 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r13/1:/Table/{9-11}] 1583 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.044840 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r30/1:/Table/2{7-8}] 1584 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.045006 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r64/1:/{Table/62-Max}] 1585 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.045184 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r14/1:/Table/1{1-2}] 1586 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.045386 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r3/1:/System/{NodeLive…-tsd}] 1587 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.045551 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r27/1:/Table/2{4-5}] 1588 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.045716 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r29/1:/Table/2{6-7}] 1589 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.045886 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r15/1:/Table/1{2-3}] 1590 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.046049 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r37/1:/Table/3{4-5}] 1591 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.046208 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r1/1:/{Min-System/NodeL…}] 1592 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.046381 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r42/1:/Table/{39-40}] 1593 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.046545 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r35/1:/Table/3{2-3}] 1594 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.046707 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r23/1:/Table/2{0-1}] 1595 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.046890 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r11/1:/Table/{7-8}] 1596 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.047053 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r26/1:/Table/2{3-4}] 1597 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.047213 9077 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r24/1:/Table/2{1-2}] 1598 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240508 04:50:45.047308 8831 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1599 gossip status (ok, 1 node) I240508 04:50:45.047308 8831 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1599 +gossip client (0/3 cur/max conns) I240508 04:50:45.047308 8831 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1599 +gossip server (0/3 cur/max conns, infos 0/0 sent/received, bytes 0B/0B sent/received) I240508 04:50:45.200760 8834 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1600 ={"Timestamp":1715143845200757553,"EventType":"runtime_stats","MemRSSBytes":1465716736,"GoroutineCount":316,"MemStackSysBytes":4587520,"GoAllocBytes":1024336760,"GoTotalBytes":1191875112,"HeapFragmentBytes":17357960,"HeapReservedBytes":120995840,"HeapReleasedBytes":69926912,"CGoAllocBytes":135652560,"CGoTotalBytes":141352960,"CGoCallRate":48457.94,"CPUUserPercent":75.90226,"CPUSysPercent":24.000713,"GCPausePercent":0.000028160837,"GCRunCount":206,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240508 04:50:45.200943 8834 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1601 runtime stats: 1.4 GiB RSS, 316 goroutines (stacks: 4.4 MiB), 977 MiB/1.1 GiB Go alloc/total (heap fragmentation: 17 MiB, heap reserved: 115 MiB, heap released: 67 MiB), 129 MiB/135 MiB CGO alloc/total (48457.9 CGO/sec), 75.9/24.0 %(u/s)time, 0.0 %gc (206x), 416 B/416 B (r/w)net I240508 04:50:55.201007 8834 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1602 ={"Timestamp":1715143855201003930,"EventType":"runtime_stats","MemRSSBytes":1465737216,"GoroutineCount":318,"MemStackSysBytes":4325376,"GoAllocBytes":1032806424,"GoTotalBytes":1191875112,"HeapFragmentBytes":18784232,"HeapReservedBytes":111362048,"HeapReleasedBytes":69926912,"CGoAllocBytes":135685328,"CGoTotalBytes":141393920,"CGoCallRate":48833.29,"CPUUserPercent":77.19809,"CPUSysPercent":22.699438,"GCPausePercent":0.00039679016,"GCRunCount":207,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240508 04:50:55.201178 8834 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1603 runtime stats: 1.4 GiB RSS, 318 goroutines (stacks: 4.1 MiB), 985 MiB/1.1 GiB Go alloc/total (heap fragmentation: 18 MiB, heap reserved: 106 MiB, heap released: 67 MiB), 129 MiB/135 MiB CGO alloc/total (48833.3 CGO/sec), 77.2/22.7 %(u/s)time, 0.0 %gc (207x), 416 B/416 B (r/w)net I240508 04:51:02.216582 8804 3@pebble/event.go:828 [n1,s1,pebble] 1604 [JOB 60] WAL created 000070 (recycled 000032) I240508 04:51:02.224566 2037962 3@pebble/event.go:780 [n1,s1,pebble] 1605 [JOB 61] flushing 1 memtable (64MB) to L0 I240508 04:51:02.224708 2037962 3@pebble/event.go:808 [n1,s1,pebble] 1606 [JOB 61] flushing: sstable created 000071 I240508 04:51:02.540249 2037962 3@pebble/event.go:808 [n1,s1,pebble] 1607 [JOB 61] flushing: sstable created 000072 I240508 04:51:02.554571 2037962 3@pebble/event.go:784 [n1,s1,pebble] 1608 [JOB 61] flushed 1 memtable (64MB) to L0 [000071 000072] (2.8MB), in 0.3s (0.3s total), output rate 8.4MB/s I240508 04:51:02.554731 2038434 3@pebble/event.go:768 [n1,s1,pebble] 1609 [JOB 62] compacting(default) L0 [000071] (2.2MB) Score=6.62 + L6 [000066] (4.1MB) Score=0.15; OverlappingRatio: Single 1.91, Multi 0.00 I240508 04:51:02.555451 2038434 3@pebble/event.go:808 [n1,s1,pebble] 1610 [JOB 62] compacting: sstable created 000073 W240508 04:51:02.739697 10243 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:44799,class=system,rpc] 1611 latency jump (prev avg 85.37ms, current 219.30ms) I240508 04:51:03.272284 2038434 3@pebble/event.go:808 [n1,s1,pebble] 1612 [JOB 62] compacting: sstable created 000074 I240508 04:51:03.282504 2038434 3@pebble/event.go:772 [n1,s1,pebble] 1613 [JOB 62] compacted(default) L0 [000071] (2.2MB) Score=6.62 + L6 [000066] (4.1MB) Score=0.15 -> L6 [000073 000074] (4.5MB), in 0.7s (0.7s total), output rate 6.2MB/s I240508 04:51:03.282685 2039174 3@pebble/event.go:768 [n1,s1,pebble] 1614 [JOB 64] compacting(default) L0 [000072] (626KB) Score=6.38 + L6 [000067 000068 000069] (5.5MB) Score=0.16; OverlappingRatio: Single 9.08, Multi 0.00 I240508 04:51:03.283048 2039174 3@pebble/event.go:808 [n1,s1,pebble] 1615 [JOB 64] compacting: sstable created 000075 I240508 04:51:03.304251 8703 3@pebble/event.go:812 [n1,s1,pebble] 1616 [JOB 62] sstable deleted 000066 I240508 04:51:03.304360 8703 3@pebble/event.go:812 [n1,s1,pebble] 1617 [JOB 62] sstable deleted 000071 I240508 04:51:03.613970 2039174 3@pebble/event.go:808 [n1,s1,pebble] 1618 [JOB 64] compacting: sstable created 000076 I240508 04:51:03.763670 2039174 3@pebble/event.go:772 [n1,s1,pebble] 1619 [JOB 64] compacted(default) L0 [000072] (626KB) Score=6.38 + L6 [000067 000068 000069] (5.5MB) Score=0.16 -> L6 [000075 000076] (6.0MB), in 0.5s (0.5s total), output rate 13MB/s I240508 04:51:03.769442 8703 3@pebble/event.go:812 [n1,s1,pebble] 1620 [JOB 64] sstable deleted 000067 I240508 04:51:03.769527 8703 3@pebble/event.go:812 [n1,s1,pebble] 1621 [JOB 64] sstable deleted 000068 I240508 04:51:03.769574 8703 3@pebble/event.go:812 [n1,s1,pebble] 1622 [JOB 64] sstable deleted 000069 I240508 04:51:03.769632 8703 3@pebble/event.go:812 [n1,s1,pebble] 1623 [JOB 64] sstable deleted 000072 I240508 04:51:05.201030 8834 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1624 ={"Timestamp":1715143865201027417,"EventType":"runtime_stats","MemRSSBytes":1475989504,"GoroutineCount":304,"MemStackSysBytes":3932160,"GoAllocBytes":1040193408,"GoTotalBytes":1201754664,"HeapFragmentBytes":17705088,"HeapReservedBytes":115326976,"HeapReleasedBytes":60047360,"CGoAllocBytes":135674064,"CGoTotalBytes":141737984,"CGoCallRate":53588.773,"CPUUserPercent":77.39982,"CPUSysPercent":22.499947,"GCPausePercent":0.0006425585,"GCRunCount":208} I240508 04:51:05.201206 8834 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1625 runtime stats: 1.4 GiB RSS, 304 goroutines (stacks: 3.8 MiB), 992 MiB/1.1 GiB Go alloc/total (heap fragmentation: 17 MiB, heap reserved: 110 MiB, heap released: 57 MiB), 129 MiB/135 MiB CGO alloc/total (53588.8 CGO/sec), 77.4/22.5 %(u/s)time, 0.0 %gc (208x), 0 B/0 B (r/w)net I240508 04:51:15.205075 8834 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1626 ={"Timestamp":1715143875205072636,"EventType":"runtime_stats","MemRSSBytes":1475551232,"GoroutineCount":310,"MemStackSysBytes":3670016,"GoAllocBytes":1049523136,"GoTotalBytes":1202278952,"HeapFragmentBytes":18893888,"HeapReservedBytes":105070592,"HeapReleasedBytes":60047360,"CGoAllocBytes":135657680,"CGoTotalBytes":140820480,"CGoCallRate":73837.03,"CPUUserPercent":81.46704,"CPUSysPercent":18.49252,"GCPausePercent":0.0007932791,"GCRunCount":210,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240508 04:51:15.205303 8834 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1627 runtime stats: 1.4 GiB RSS, 310 goroutines (stacks: 3.5 MiB), 1001 MiB/1.1 GiB Go alloc/total (heap fragmentation: 18 MiB, heap reserved: 100 MiB, heap released: 57 MiB), 129 MiB/134 MiB CGO alloc/total (73837.0 CGO/sec), 81.5/18.5 %(u/s)time, 0.0 %gc (210x), 416 B/416 B (r/w)net I240508 04:51:25.205423 8834 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1628 ={"Timestamp":1715143885205420531,"EventType":"runtime_stats","MemRSSBytes":1470541824,"GoroutineCount":317,"MemStackSysBytes":4259840,"GoAllocBytes":1037505896,"GoTotalBytes":1197290024,"HeapFragmentBytes":20974232,"HeapReservedBytes":109428736,"HeapReleasedBytes":65036288,"CGoAllocBytes":135657680,"CGoTotalBytes":140808192,"CGoCallRate":74471.11,"CPUUserPercent":81.49716,"CPUSysPercent":18.399359,"GCPausePercent":0.0003967862,"GCRunCount":211,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240508 04:51:25.205590 8834 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1629 runtime stats: 1.4 GiB RSS, 317 goroutines (stacks: 4.1 MiB), 989 MiB/1.1 GiB Go alloc/total (heap fragmentation: 20 MiB, heap reserved: 104 MiB, heap released: 62 MiB), 129 MiB/134 MiB CGO alloc/total (74471.1 CGO/sec), 81.5/18.4 %(u/s)time, 0.0 %gc (211x), 416 B/416 B (r/w)net ```

Parameters:

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

cockroach-teamcity commented 3 months ago

upgrade/upgrades.TestLeasingClusterVersionStarvation failed on master @ 6300c3c3367ad46ac48bf24915cf0d73cae446a0:

Fatal error:

panic: test timed out after 14m57s
running tests:
    TestLeasingClusterVersionStarvation (14m43s)

Stack:

goroutine 819143 [running]:
testing.(*M).startAlarm.func1()
    GOROOT/src/testing/testing.go:2366 +0x385
created by time.goFunc
    GOROOT/src/time/sleep.go:177 +0x2d
Log preceding fatal error

``` I240515 07:36:49.121845 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r32/1:/{Table/29-NamespaceTab…}] 1540 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.122021 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r23/1:/Table/2{0-1}] 1541 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.122201 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r29/1:/Table/2{6-7}] 1542 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.122401 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r51/1:/Table/{48-57}] 1543 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.122574 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r37/1:/Table/3{4-5}] 1544 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.122755 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r30/1:/Table/2{7-8}] 1545 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.122939 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r9/1:/Table/{5-6}] 1546 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.123119 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r27/1:/Table/2{4-5}] 1547 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.123294 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r15/1:/Table/1{2-3}] 1548 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.123466 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r26/1:/Table/2{3-4}] 1549 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.123638 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r25/1:/Table/2{2-3}] 1550 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.123811 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r35/1:/Table/3{2-3}] 1551 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.123980 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r17/1:/Table/1{4-5}] 1552 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.124150 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r50/1:/Table/4{7-8}] 1553 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.124329 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r62/1:/Table/6{0-2}] 1554 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.124499 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r31/1:/Table/2{8-9}] 1555 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.124663 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r8/1:/Table/{4-5}] 1556 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.124834 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r59/1:/Table/5{7-8}] 1557 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.125001 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r38/1:/Table/3{5-6}] 1558 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.125170 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r11/1:/Table/{7-8}] 1559 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.125345 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r20/1:/Table/1{7-8}] 1560 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.125565 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r3/1:/System/{NodeLive…-tsd}] 1561 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.125768 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r10/1:/Table/{6-7}] 1562 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.125943 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r24/1:/Table/2{1-2}] 1563 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.126114 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r39/1:/Table/3{6-7}] 1564 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.126289 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r12/1:/Table/{8-9}] 1565 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.126460 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r42/1:/Table/{39-40}] 1566 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.126629 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r36/1:/Table/3{3-4}] 1567 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.126795 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r49/1:/Table/4{6-7}] 1568 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.126963 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r45/1:/Table/4{2-3}] 1569 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.127142 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r5/1:/{Systemtse-Table/0}] 1570 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.127325 9287 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r18/1:/Table/1{5-6}] 1571 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240515 07:36:49.133374 9033 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1572 gossip status (ok, 1 node) I240515 07:36:49.133374 9033 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1572 +gossip client (0/3 cur/max conns) I240515 07:36:49.133374 9033 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1572 +gossip server (0/3 cur/max conns, infos 0/0 sent/received, bytes 0B/0B sent/received) I240515 07:36:49.405962 9036 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1573 ={"Timestamp":1715758609405955311,"EventType":"runtime_stats","MemRSSBytes":1295228928,"GoroutineCount":294,"MemStackSysBytes":3506176,"GoAllocBytes":858023056,"GoTotalBytes":1012498920,"HeapFragmentBytes":57908080,"HeapReservedBytes":61923328,"HeapReleasedBytes":4186112,"CGoAllocBytes":134893024,"CGoTotalBytes":140529664,"CGoCallRate":336.49927,"CPUUserPercent":99.199776,"CPUSysPercent":0.59999865,"GCPausePercent":0.00048383896,"GCRunCount":194,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240515 07:36:49.406171 9036 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1574 runtime stats: 1.2 GiB RSS, 294 goroutines (stacks: 3.3 MiB), 818 MiB/966 MiB Go alloc/total (heap fragmentation: 55 MiB, heap reserved: 59 MiB, heap released: 4.0 MiB), 129 MiB/134 MiB CGO alloc/total (336.5 CGO/sec), 99.2/0.6 %(u/s)time, 0.0 %gc (194x), 416 B/416 B (r/w)net W240515 07:36:54.661010 10444 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:45729,class=system,rpc] 1575 latency jump (prev avg 54.98ms, current 102.28ms) W240515 07:36:58.967674 10431 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:45729,class=default,rpc] 1576 latency jump (prev avg 64.10ms, current 106.01ms) I240515 07:36:59.411718 9036 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1577 ={"Timestamp":1715758619411714114,"EventType":"runtime_stats","MemRSSBytes":1295077376,"GoroutineCount":293,"MemStackSysBytes":3178496,"GoAllocBytes":864489784,"GoTotalBytes":1012498920,"HeapFragmentBytes":53767880,"HeapReservedBytes":59924480,"HeapReleasedBytes":4186112,"CGoAllocBytes":134972384,"CGoTotalBytes":140517376,"CGoCallRate":385.0782,"CPUUserPercent":98.543236,"CPUSysPercent":1.2992516,"GCPausePercent":0.0009824741,"GCRunCount":195,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240515 07:36:59.412031 9036 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1578 runtime stats: 1.2 GiB RSS, 293 goroutines (stacks: 3.0 MiB), 824 MiB/966 MiB Go alloc/total (heap fragmentation: 51 MiB, heap reserved: 57 MiB, heap released: 4.0 MiB), 129 MiB/134 MiB CGO alloc/total (385.1 CGO/sec), 98.5/1.3 %(u/s)time, 0.0 %gc (195x), 416 B/416 B (r/w)net I240515 07:37:09.421975 9036 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1579 ={"Timestamp":1715758629421971535,"EventType":"runtime_stats","MemRSSBytes":1345437696,"GoroutineCount":293,"MemStackSysBytes":3637248,"GoAllocBytes":854658976,"GoTotalBytes":1063022072,"HeapFragmentBytes":55079008,"HeapReservedBytes":118439936,"HeapReleasedBytes":8257536,"CGoAllocBytes":134958048,"CGoTotalBytes":140443648,"CGoCallRate":256.23715,"CPUUserPercent":98.49896,"CPUSysPercent":1.3985653,"GCRunCount":195} I240515 07:37:09.422153 9036 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1580 runtime stats: 1.3 GiB RSS, 293 goroutines (stacks: 3.5 MiB), 815 MiB/1014 MiB Go alloc/total (heap fragmentation: 52 MiB, heap reserved: 113 MiB, heap released: 7.9 MiB), 129 MiB/134 MiB CGO alloc/total (256.2 CGO/sec), 98.5/1.4 %(u/s)time, 0.0 %gc (195x), 0 B/0 B (r/w)net W240515 07:37:10.824830 10444 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:45729,class=system,rpc] 1581 latency jump (prev avg 73.52ms, current 126.71ms) I240515 07:37:19.435517 9036 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1582 ={"Timestamp":1715758639435513874,"EventType":"runtime_stats","MemRSSBytes":1345454080,"GoroutineCount":295,"MemStackSysBytes":3506176,"GoAllocBytes":805735672,"GoTotalBytes":1063022072,"HeapFragmentBytes":54776584,"HeapReservedBytes":167796736,"HeapReleasedBytes":8257536,"CGoAllocBytes":134990816,"CGoTotalBytes":140443648,"CGoCallRate":293.30283,"CPUUserPercent":99.76491,"CPUSysPercent":0.19972955,"GCPausePercent":0.00065191725,"GCRunCount":196,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240515 07:37:19.435693 9036 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1583 runtime stats: 1.3 GiB RSS, 295 goroutines (stacks: 3.3 MiB), 768 MiB/1014 MiB Go alloc/total (heap fragmentation: 52 MiB, heap reserved: 160 MiB, heap released: 7.9 MiB), 129 MiB/134 MiB CGO alloc/total (293.3 CGO/sec), 99.8/0.2 %(u/s)time, 0.0 %gc (196x), 416 B/416 B (r/w)net W240515 07:37:25.197700 10444 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:45729,class=system,rpc] 1584 latency jump (prev avg 91.75ms, current 209.23ms) W240515 07:37:27.281706 10431 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:45729,class=default,rpc] 1585 latency jump (prev avg 94.77ms, current 176.68ms) I240515 07:37:29.472295 9036 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1586 ={"Timestamp":1715758649472290778,"EventType":"runtime_stats","MemRSSBytes":1345552384,"GoroutineCount":296,"MemStackSysBytes":3440640,"GoAllocBytes":880036984,"GoTotalBytes":1063022072,"HeapFragmentBytes":57365384,"HeapReservedBytes":90972160,"HeapReleasedBytes":8257536,"CGoAllocBytes":135044576,"CGoTotalBytes":140611584,"CGoCallRate":282.66043,"CPUUserPercent":99.533936,"CPUSysPercent":0.29890072,"GCPausePercent":0.00048206706,"GCRunCount":197,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240515 07:37:29.472482 9036 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1587 runtime stats: 1.3 GiB RSS, 296 goroutines (stacks: 3.3 MiB), 839 MiB/1014 MiB Go alloc/total (heap fragmentation: 55 MiB, heap reserved: 87 MiB, heap released: 7.9 MiB), 129 MiB/134 MiB CGO alloc/total (282.7 CGO/sec), 99.5/0.3 %(u/s)time, 0.0 %gc (197x), 416 B/416 B (r/w)net ```

Parameters:

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

cockroach-teamcity commented 3 months ago

upgrade/upgrades.TestLeasingClusterVersionStarvation failed on master @ 93ad913106b6f0f6ec98bc2cfa788ff6d8085bd4:

Fatal error:

panic: test timed out after 14m57s
running tests:
    TestLeasingClusterVersionStarvation (14m56s)

Stack:

goroutine 664159 [running]:
testing.(*M).startAlarm.func1()
    GOROOT/src/testing/testing.go:2366 +0x385
created by time.goFunc
    GOROOT/src/time/sleep.go:177 +0x2d
Log preceding fatal error

``` I240520 16:48:21.404452 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r40/1:/Table/3{7-8}] 1622 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.404629 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r25/1:/Table/2{2-3}] 1623 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.404811 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r33/1:/NamespaceTable/{30-Max}] 1624 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.404996 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r11/1:/Table/{7-8}] 1625 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.405179 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r28/1:/Table/2{5-6}] 1626 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.405376 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r50/1:/Table/4{7-8}] 1627 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.405560 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r18/1:/Table/1{5-6}] 1628 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.405737 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r38/1:/Table/3{5-6}] 1629 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.405918 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r5/1:/{Systemtse-Table/0}] 1630 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.406109 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r35/1:/Table/3{2-3}] 1631 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.406295 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r60/1:/Table/{58-60}] 1632 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.406478 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r23/1:/Table/2{0-1}] 1633 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.406655 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r14/1:/Table/1{1-2}] 1634 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.406842 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r59/1:/Table/5{7-8}] 1635 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.407031 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r43/1:/Table/4{0-1}] 1636 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.407231 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r13/1:/Table/{9-11}] 1637 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.407429 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r4/1:/System{/tsd-tse}] 1638 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.407615 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r47/1:/Table/4{4-5}] 1639 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.407803 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r19/1:/Table/1{6-7}] 1640 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.407993 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r16/1:/Table/1{3-4}] 1641 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.408177 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r44/1:/Table/4{1-2}] 1642 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.408360 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r49/1:/Table/4{6-7}] 1643 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.408544 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r24/1:/Table/2{1-2}] 1644 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.408725 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r41/1:/Table/3{8-9}] 1645 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.408906 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r46/1:/Table/4{3-4}] 1646 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.409087 1951 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r1/1:/{Min-System/NodeL…}] 1647 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240520 16:48:21.430664 1800 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1648 gossip status (ok, 1 node) I240520 16:48:21.430664 1800 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1648 +gossip client (0/3 cur/max conns) I240520 16:48:21.430664 1800 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1648 +gossip server (0/3 cur/max conns, infos 0/0 sent/received, bytes 0B/0B sent/received) I240520 16:48:21.736652 1803 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1649 ={"Timestamp":1716223701736646295,"EventType":"runtime_stats","MemRSSBytes":1002725376,"GoroutineCount":305,"MemStackSysBytes":3670016,"GoAllocBytes":610564680,"GoTotalBytes":736241064,"HeapFragmentBytes":47719864,"HeapReservedBytes":48848896,"HeapReleasedBytes":2113536,"CGoAllocBytes":135039904,"CGoTotalBytes":140455936,"CGoCallRate":209.35564,"CPUUserPercent":99.037094,"CPUSysPercent":0.3993431,"GCPausePercent":0.000651728,"GCRunCount":173,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240520 16:48:21.736916 1803 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1650 runtime stats: 956 MiB RSS, 305 goroutines (stacks: 3.5 MiB), 582 MiB/702 MiB Go alloc/total (heap fragmentation: 46 MiB, heap reserved: 47 MiB, heap released: 2.0 MiB), 129 MiB/134 MiB CGO alloc/total (209.4 CGO/sec), 99.0/0.4 %(u/s)time, 0.0 %gc (173x), 416 B/416 B (r/w)net W240520 16:48:29.933026 3200 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:35441,class=default,rpc] 1651 latency jump (prev avg 74.00ms, current 129.91ms) I240520 16:48:31.796879 1803 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1652 ={"Timestamp":1716223711796874563,"EventType":"runtime_stats","MemRSSBytes":1072103424,"GoroutineCount":303,"MemStackSysBytes":3932160,"GoAllocBytes":654185336,"GoTotalBytes":803839432,"HeapFragmentBytes":52604040,"HeapReservedBytes":67543040,"HeapReleasedBytes":5955584,"CGoAllocBytes":135062432,"CGoTotalBytes":140648448,"CGoCallRate":364.70322,"CPUUserPercent":98.80485,"CPUSysPercent":0.39760503,"GCPausePercent":0.0010916644,"GCRunCount":174,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240520 16:48:31.797071 1803 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1653 runtime stats: 1022 MiB RSS, 303 goroutines (stacks: 3.8 MiB), 624 MiB/767 MiB Go alloc/total (heap fragmentation: 50 MiB, heap reserved: 64 MiB, heap released: 5.7 MiB), 129 MiB/134 MiB CGO alloc/total (364.7 CGO/sec), 98.8/0.4 %(u/s)time, 0.0 %gc (174x), 416 B/416 B (r/w)net I240520 16:48:35.952512 1771 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r50/1:/Table/4{7-8},raft] 1654 rate limited in MaybeAdd (merge): throttled on async limiting semaphore W240520 16:48:38.586032 3213 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:35441,class=system,rpc] 1655 latency jump (prev avg 77.64ms, current 117.97ms) I240520 16:48:41.737170 1803 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1656 ={"Timestamp":1716223721737165597,"EventType":"runtime_stats","MemRSSBytes":1072304128,"GoroutineCount":303,"MemStackSysBytes":3637248,"GoAllocBytes":704416040,"GoTotalBytes":803839432,"HeapFragmentBytes":47773400,"HeapReservedBytes":22437888,"HeapReleasedBytes":5955584,"CGoAllocBytes":135062432,"CGoTotalBytes":140677120,"CGoCallRate":218.60522,"CPUUserPercent":100.19826,"CPUSysPercent":0.30180198,"GCPausePercent":0.000028329145,"GCRunCount":175} I240520 16:48:41.737450 1803 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1657 runtime stats: 1023 MiB RSS, 303 goroutines (stacks: 3.5 MiB), 672 MiB/767 MiB Go alloc/total (heap fragmentation: 46 MiB, heap reserved: 21 MiB, heap released: 5.7 MiB), 129 MiB/134 MiB CGO alloc/total (218.6 CGO/sec), 100.2/0.3 %(u/s)time, 0.0 %gc (175x), 0 B/0 B (r/w)net I240520 16:48:42.948730 1771 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r35/1:/Table/3{2-3},raft] 1658 rate limited in MaybeAdd (merge): throttled on async limiting semaphore I240520 16:48:49.433842 1772 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r46/1:/Table/4{3-4},raft] 1659 rate limited in MaybeAdd (merge): throttled on async limiting semaphore I240520 16:48:51.737291 1803 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1660 ={"Timestamp":1716223731737285544,"EventType":"runtime_stats","MemRSSBytes":1121697792,"GoroutineCount":305,"MemStackSysBytes":3637248,"GoAllocBytes":700824880,"GoTotalBytes":853450184,"HeapFragmentBytes":48374480,"HeapReservedBytes":73990144,"HeapReleasedBytes":3530752,"CGoAllocBytes":135103392,"CGoTotalBytes":140529664,"CGoCallRate":152.4982,"CPUUserPercent":99.598816,"CPUSysPercent":0.29999644,"GCPausePercent":0.0004787143,"GCRunCount":176,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240520 16:48:51.737519 1803 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1661 runtime stats: 1.0 GiB RSS, 305 goroutines (stacks: 3.5 MiB), 668 MiB/814 MiB Go alloc/total (heap fragmentation: 46 MiB, heap reserved: 71 MiB, heap released: 3.4 MiB), 129 MiB/134 MiB CGO alloc/total (152.5 CGO/sec), 99.6/0.3 %(u/s)time, 0.0 %gc (176x), 416 B/416 B (r/w)net W240520 16:48:52.476668 3213 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:35441,class=system,rpc] 1662 latency jump (prev avg 56.52ms, current 89.90ms) W240520 16:48:54.638017 3213 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:35441,class=system,rpc] 1663 latency jump (prev avg 61.99ms, current 117.99ms) W240520 16:48:58.976664 3200 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:35441,class=default,rpc] 1664 latency jump (prev avg 65.28ms, current 126.55ms) I240520 16:49:01.737200 1803 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1665 ={"Timestamp":1716223741737195100,"EventType":"runtime_stats","MemRSSBytes":1121693696,"GoroutineCount":304,"MemStackSysBytes":3506176,"GoAllocBytes":651651424,"GoTotalBytes":853450184,"HeapFragmentBytes":48453280,"HeapReservedBytes":123215872,"HeapReleasedBytes":3530752,"CGoAllocBytes":134992800,"CGoTotalBytes":140529664,"CGoCallRate":199.30177,"CPUUserPercent":99.80089,"CPUSysPercent":0.10000089,"GCPausePercent":0.0004838443,"GCRunCount":177,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240520 16:49:01.737445 1803 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1666 runtime stats: 1.0 GiB RSS, 304 goroutines (stacks: 3.3 MiB), 622 MiB/814 MiB Go alloc/total (heap fragmentation: 46 MiB, heap reserved: 118 MiB, heap released: 3.4 MiB), 129 MiB/134 MiB CGO alloc/total (199.3 CGO/sec), 99.8/0.1 %(u/s)time, 0.0 %gc (177x), 416 B/416 B (r/w)net I240520 16:49:11.737206 1803 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1667 ={"Timestamp":1716223751737201089,"EventType":"runtime_stats","MemRSSBytes":1081733120,"GoroutineCount":304,"MemStackSysBytes":3571712,"GoAllocBytes":655302608,"GoTotalBytes":813473224,"HeapFragmentBytes":49078320,"HeapReservedBytes":78569472,"HeapReleasedBytes":43835392,"CGoAllocBytes":135062432,"CGoTotalBytes":140533760,"CGoCallRate":346.0998,"CPUUserPercent":99.59994,"CPUSysPercent":0.2999998,"GCPausePercent":0.0004966397,"GCRunCount":178} I240520 16:49:11.737462 1803 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1668 runtime stats: 1.0 GiB RSS, 304 goroutines (stacks: 3.4 MiB), 625 MiB/776 MiB Go alloc/total (heap fragmentation: 47 MiB, heap reserved: 75 MiB, heap released: 42 MiB), 129 MiB/134 MiB CGO alloc/total (346.1 CGO/sec), 99.6/0.3 %(u/s)time, 0.0 %gc (178x), 0 B/0 B (r/w)net W240520 16:49:11.958030 3200 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:35441,class=default,rpc] 1669 latency jump (prev avg 74.51ms, current 127.91ms) ```

Parameters:

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

celiala commented 3 months ago

Hi @fqazi - perhaps I can assign this to you to take a look? (git blame tells me you may have insight here)

cockroach-teamcity commented 3 months ago

upgrade/upgrades.TestLeasingClusterVersionStarvation failed on master @ d77b6e709e7d90625ff31bc0016eed8c7a6dc65a:

I240601 04:41:21.861560 2208 sql/catalog/lease/lease.go:1643  [T1,Vsystem,n1] 980  didnt refresh descriptor: 19 lease: node unavailable; try another peer
I240601 04:41:21.861586 2208 sql/catalog/lease/lease.go:1643  [T1,Vsystem,n1] 981  didnt refresh descriptor: 12 lease: node unavailable; try another peer
I240601 04:41:21.861632 2208 sql/catalog/lease/lease.go:1643  [T1,Vsystem,n1] 982  didnt refresh descriptor: 26 lease: node unavailable; try another peer
W240601 04:41:21.862110 880656 sql/catalog/lease/storage.go:340  [T1,Vsystem,n1] 983  error releasing lease ID=54 ver=1 expiration={2024-06-01 04:41:21.738624 +0000 UTC}: failed to delete lease: {[128] 54 1 1 {{738624000 63852813681 <nil>}} []}: result is ambiguous: server shutdown
I240601 04:41:21.862196 880656 sql/sqlliveness/slinstance/slinstance.go:472  [T1,Vsystem,n1] 984  disabling sqlliveness extension because of availability issue on system tables
I240601 04:41:21.862234 880656 sql/sqlliveness/slinstance/slinstance.go:482  [T1,Vsystem,n1] 985  enabling sqlliveness extension due to restored availability
E240601 04:41:21.862400 880655 kv/kvclient/kvcoord/txn_interceptor_committer.go:504  [-] 986  making txn commit explicit failed for "unnamed" meta={id=bbe1aae1 key=/Table/11/2/"\x80"/54/1/2024-06-01T04:41:21.838872Z/1/0 iso=Serializable pri=100.00000000 epo=0 ts=1717216881.838872055,0 min=1717216881.838872055,0 seq=2} lock=true stat=STAGING rts=1717216881.838872055,0 wto=false gul=1717216882.338872055,0 ifw=1: result is ambiguous: server shutdown
I240601 04:41:21.862870 880652 sql/sqlliveness/slinstance/slinstance.go:472  [n1] 987  disabling sqlliveness extension because of availability issue on system tables
I240601 04:41:21.862926 880652 sql/catalog/lease/storage.go:273  [n1] 988  retryable replica error occurred during lease acquisition for 6, retrying: result is ambiguous: server shutdown
I240601 04:41:21.863012 880652 sql/sqlliveness/slinstance/slinstance.go:482  [n1] 989  enabling sqlliveness extension due to restored availability
W240601 04:41:21.863461 880650 sql/catalog/lease/storage.go:340  [T1,Vsystem,n1] 990  error releasing lease ID=52 ver=1 expiration={2024-06-01 04:41:21.738101 +0000 UTC}: failed to delete lease: {[128] 52 1 1 {{738101000 63852813681 <nil>}} []}: result is ambiguous: server shutdown
I240601 04:41:21.863543 880650 sql/sqlliveness/slinstance/slinstance.go:472  [T1,Vsystem,n1] 991  disabling sqlliveness extension because of availability issue on system tables
I240601 04:41:21.863574 880650 sql/sqlliveness/slinstance/slinstance.go:482  [T1,Vsystem,n1] 992  enabling sqlliveness extension due to restored availability
E240601 04:41:21.863735 880649 kv/kvclient/kvcoord/txn_interceptor_committer.go:504  [-] 993  making txn commit explicit failed for "unnamed" meta={id=eaaca861 key=/Table/11/2/"\x80"/52/1/2024-06-01T04:41:21.838226Z/1/0 iso=Serializable pri=100.00000000 epo=0 ts=1717216881.838226020,0 min=1717216881.838226020,0 seq=2} lock=true stat=STAGING rts=1717216881.838226020,0 wto=false gul=1717216882.338226020,0 ifw=1: result is ambiguous: server shutdown
I240601 04:41:21.864095 880646 sql/sqlliveness/slinstance/slinstance.go:472  [n1] 994  disabling sqlliveness extension because of availability issue on system tables
I240601 04:41:21.864151 880646 sql/catalog/lease/storage.go:273  [n1] 995  retryable replica error occurred during lease acquisition for 42, retrying: result is ambiguous: server shutdown
I240601 04:41:21.864239 880646 sql/sqlliveness/slinstance/slinstance.go:482  [n1] 996  enabling sqlliveness extension due to restored availability
W240601 04:41:21.864682 880644 sql/catalog/lease/storage.go:340  [T1,Vsystem,n1] 997  error releasing lease ID=21 ver=1 expiration={2024-06-01 04:41:21.737401 +0000 UTC}: failed to delete lease: {[128] 21 1 1 {{737401000 63852813681 <nil>}} []}: result is ambiguous: server shutdown
I240601 04:41:21.864759 880644 sql/sqlliveness/slinstance/slinstance.go:472  [T1,Vsystem,n1] 998  disabling sqlliveness extension because of availability issue on system tables
I240601 04:41:21.864801 880644 sql/sqlliveness/slinstance/slinstance.go:482  [T1,Vsystem,n1] 999  enabling sqlliveness extension due to restored availability
E240601 04:41:21.864959 880643 kv/kvclient/kvcoord/txn_interceptor_committer.go:504  [-] 1000  making txn commit explicit failed for "unnamed" meta={id=db1aba74 key=/Table/11/2/"\x80"/21/1/2024-06-01T04:41:21.837563Z/1/0 iso=Serializable pri=100.00000000 epo=0 ts=1717216881.837562955,0 min=1717216881.837562955,0 seq=2} lock=true stat=STAGING rts=1717216881.837562955,0 wto=false gul=1717216882.337562955,0 ifw=1: result is ambiguous: server shutdown
W240601 04:41:21.865462 880638 sql/catalog/lease/storage.go:340  [T1,Vsystem,n1] 1001  error releasing lease ID=7 ver=1 expiration={2024-06-01 04:41:21.727217 +0000 UTC}: failed to delete lease: {[128] 7 1 1 {{727217000 63852813681 <nil>}} []}: result is ambiguous: server shutdown
I240601 04:41:21.865541 880638 sql/sqlliveness/slinstance/slinstance.go:472  [T1,Vsystem,n1] 1002  disabling sqlliveness extension because of availability issue on system tables
I240601 04:41:21.865575 880638 sql/sqlliveness/slinstance/slinstance.go:482  [T1,Vsystem,n1] 1003  enabling sqlliveness extension due to restored availability
E240601 04:41:21.865759 880637 kv/kvclient/kvcoord/txn_interceptor_committer.go:504  [-] 1004  making txn commit explicit failed for "unnamed" meta={id=7ed061b8 key=/Table/11/2/"\x80"/7/1/2024-06-01T04:41:21.837055Z/1/0 iso=Serializable pri=100.00000000 epo=0 ts=1717216881.837054887,0 min=1717216881.837054887,0 seq=2} lock=true stat=STAGING rts=1717216881.837054887,0 wto=false gul=1717216882.337054887,0 ifw=1: result is ambiguous: server shutdown
I240601 04:41:21.866212 880632 sql/sqlliveness/slinstance/slinstance.go:472  [n1,intExec=select-job] 1005  disabling sqlliveness extension because of availability issue on system tables
I240601 04:41:21.866268 880632 sql/catalog/lease/storage.go:273  [n1,intExec=select-job] 1006  retryable replica error occurred during lease acquisition for 53, retrying: result is ambiguous: server shutdown
I240601 04:41:21.866364 880632 sql/sqlliveness/slinstance/slinstance.go:482  [n1,intExec=select-job] 1007  enabling sqlliveness extension due to restored availability
W240601 04:41:21.866542 879896 kv/kvserver/intentresolver/intent_resolver.go:862  [-] 1008  failed to gc transaction record: could not GC completed transaction anchored at /Table/15/1/973688279753457665: node unavailable; try another peer
I240601 04:41:21.866816 4988 jobs/registry.go:1591  [T1,Vsystem,n1] 1009  KEY VISUALIZER job 100: stepping through state succeeded
I240601 04:41:21.867230 4992 jobs/registry.go:1591  [T1,Vsystem,n1] 1010  AUTO UPDATE SQL ACTIVITY job 103: stepping through state succeeded
I240601 04:41:21.867579 3097 sql/stats/automatic_stats.go:623  [T1,Vsystem,n1] 1011  quiescing stats garbage collector
I240601 04:41:21.867835 1713 server/start_listen.go:110  [T1,Vsystem,n1] 1012  server shutting down: instructing cmux to stop accepting
W240601 04:41:21.868211 1887 sql/sqlliveness/slinstance/slinstance.go:330  [T1,Vsystem,n1] 1013  exiting heartbeat loop
W240601 04:41:21.868262 1887 sql/sqlliveness/slinstance/slinstance.go:317  [T1,Vsystem,n1] 1014  exiting heartbeat loop with error: node unavailable; try another peer
E240601 04:41:21.868304 1887 server/server_sql.go:523  [T1,Vsystem,n1] 1015  failed to run update of instance with new session ID: node unavailable; try another peer
E240601 04:41:21.868624 4986 jobs/registry.go:900  [T1,Vsystem,n1] 1016  error getting live session: node unavailable; try another peer
I240601 04:41:21.869333 2635 jobs/registry.go:1591  [T1,Vsystem,n1] 1017  AUTO SPAN CONFIG RECONCILIATION job 973688277647982593: stepping through state succeeded
E240601 04:41:21.869858 880663 sql/catalog/lease/lease.go:1625  [T1,Vsystem,n1] 1018  refreshing descriptor: 45 lease failed: context canceled
E240601 04:41:21.869922 880639 sql/catalog/lease/lease.go:1625  [T1,Vsystem,n1] 1019  refreshing descriptor: 62 lease failed: context canceled
E240601 04:41:21.869963 880651 sql/catalog/lease/lease.go:1625  [T1,Vsystem,n1] 1020  refreshing descriptor: 6 lease failed: context canceled
E240601 04:41:21.870023 880645 sql/catalog/lease/lease.go:1625  [T1,Vsystem,n1] 1021  refreshing descriptor: 42 lease failed: context canceled
I240601 04:41:21.870529 3231 4@util/log/event_log.go:32  [T1,Vsystem,n1,client=127.0.0.1:40456,hostssl,user=root] 1022 ={"Timestamp":1717216881870526065,"EventType":"client_session_end","InstanceID":1,"Network":"tcp","RemoteAddress":"127.0.0.1:40456","SessionID":"17d4c74a9a0523350000000000000001","Duration":400709987871}
I240601 04:41:21.870772 3376 4@util/log/event_log.go:32  [T1,Vsystem,n1,client=127.0.0.1:40462,hostssl,user=root] 1023 ={"Timestamp":1717216881870770620,"EventType":"client_session_end","InstanceID":1,"Network":"tcp","RemoteAddress":"127.0.0.1:40462","SessionID":"17d4c74a9ee982050000000000000001","Duration":400628157120}
W240601 04:41:21.870883 4988 jobs/adopt.go:495  [T1,Vsystem,n1] 1024  could not clear job claim: clear-job-claim: node unavailable; try another peer
W240601 04:41:21.870969 4992 jobs/adopt.go:495  [T1,Vsystem,n1] 1025  could not clear job claim: clear-job-claim: node unavailable; try another peer
I240601 04:41:21.878638 1642 server/server_controller.go:413  [T1,Vsystem,n1] 1026  server controller shutting down
I240601 04:41:21.878702 1642 server/server_controller.go:422  [T1,Vsystem,n1] 1027  waiting for tenant servers to report stopped
W240601 04:41:21.878790 1642 server/server_sql.go:1721  [T1,Vsystem,n1] 1028  server shutdown without a prior graceful drain
--- FAIL: TestLeasingClusterVersionStarvation (401.33s)

Parameters:

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

cockroach-teamcity commented 3 months ago

upgrade/upgrades.TestLeasingClusterVersionStarvation failed on master @ 4c06ddd826bd32f84adbdf2f78f90c7d0c2f2d60:

Fatal error:

panic: test timed out after 14m57s
running tests:
    TestLeasingClusterVersionStarvation (14m56s)

Stack:

goroutine 757131 [running]:
testing.(*M).startAlarm.func1()
    GOROOT/src/testing/testing.go:2366 +0x385
created by time.goFunc
    GOROOT/src/time/sleep.go:177 +0x2d
Log preceding fatal error

``` I240605 05:19:34.310610 2068 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r1/1:/{Min-System/NodeL…}] 1672 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240605 05:19:34.310780 2068 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r62/1:/Table/6{0-2}] 1673 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240605 05:19:34.310946 2068 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r31/1:/Table/2{8-9}] 1674 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240605 05:19:34.311109 2068 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r32/1:/{Table/29-NamespaceTab…}] 1675 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240605 05:19:34.311278 2068 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r28/1:/Table/2{5-6}] 1676 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240605 05:19:34.311442 2068 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r47/1:/Table/4{4-5}] 1677 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240605 05:19:34.324165 1755 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1678 gossip status (ok, 1 node) I240605 05:19:34.324165 1755 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1678 +gossip client (0/3 cur/max conns) I240605 05:19:34.324165 1755 2@gossip/gossip.go:531 [T1,Vsystem,n1] 1678 +gossip server (0/3 cur/max conns, infos 0/0 sent/received, bytes 0B/0B sent/received) I240605 05:19:34.641579 1758 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1679 ={"Timestamp":1717564774641574741,"EventType":"runtime_stats","MemRSSBytes":1069309952,"GoroutineCount":296,"MemStackSysBytes":3112960,"GoAllocBytes":689803800,"GoTotalBytes":790917560,"HeapFragmentBytes":52702696,"HeapReservedBytes":18415616,"HeapReleasedBytes":11796480,"CGoAllocBytes":135051904,"CGoTotalBytes":140591104,"CGoCallRate":280.49832,"CPUUserPercent":99.5994,"CPUSysPercent":0.1999988,"GCPausePercent":0.0004838371,"GCRunCount":163,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240605 05:19:34.641810 1758 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1680 runtime stats: 1020 MiB RSS, 296 goroutines (stacks: 3.0 MiB), 658 MiB/754 MiB Go alloc/total (heap fragmentation: 50 MiB, heap reserved: 18 MiB, heap released: 11 MiB), 129 MiB/134 MiB CGO alloc/total (280.5 CGO/sec), 99.6/0.2 %(u/s)time, 0.0 %gc (163x), 416 B/416 B (r/w)net W240605 05:19:34.875732 3188 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:42687,class=system,rpc] 1681 latency jump (prev avg 76.22ms, current 117.16ms) W240605 05:19:42.363200 3188 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:42687,class=system,rpc] 1682 latency jump (prev avg 60.17ms, current 166.31ms) I240605 05:19:43.333456 1745 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r46/1:/Table/4{3-4},raft] 1683 rate limited in MaybeAdd (merge): throttled on async limiting semaphore I240605 05:19:44.641514 1758 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1684 ={"Timestamp":1717564784641509076,"EventType":"runtime_stats","MemRSSBytes":1071149056,"GoroutineCount":295,"MemStackSysBytes":3014656,"GoAllocBytes":688401472,"GoTotalBytes":793104824,"HeapFragmentBytes":49525696,"HeapReservedBytes":25280512,"HeapReleasedBytes":9609216,"CGoAllocBytes":135031424,"CGoTotalBytes":140591104,"CGoCallRate":382.9025,"CPUUserPercent":99.80065,"GCPausePercent":0.00040192262,"GCRunCount":164,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240605 05:19:44.641717 1758 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1685 runtime stats: 1022 MiB RSS, 295 goroutines (stacks: 2.9 MiB), 656 MiB/756 MiB Go alloc/total (heap fragmentation: 47 MiB, heap reserved: 24 MiB, heap released: 9.2 MiB), 129 MiB/134 MiB CGO alloc/total (382.9 CGO/sec), 99.8/0.0 %(u/s)time, 0.0 %gc (164x), 416 B/416 B (r/w)net W240605 05:19:46.105033 3175 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:42687,class=default,rpc] 1686 latency jump (prev avg 78.39ms, current 128.93ms) W240605 05:19:47.905030 3188 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:42687,class=system,rpc] 1687 latency jump (prev avg 81.67ms, current 123.81ms) I240605 05:19:49.884474 1738 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r38/1:/Table/3{5-6},raft] 1688 rate limited in MaybeAdd (merge): throttled on async limiting semaphore W240605 05:19:54.359328 3188 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:42687,class=system,rpc] 1689 latency jump (prev avg 60.38ms, current 161.02ms) I240605 05:19:54.641585 1758 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1690 ={"Timestamp":1717564794641580951,"EventType":"runtime_stats","MemRSSBytes":1071550464,"GoroutineCount":295,"MemStackSysBytes":3768320,"GoAllocBytes":688512880,"GoTotalBytes":793629112,"HeapFragmentBytes":54870160,"HeapReservedBytes":19595264,"HeapReleasedBytes":9084928,"CGoAllocBytes":135031424,"CGoTotalBytes":140541952,"CGoCallRate":238.0983,"CPUUserPercent":99.59928,"CPUSysPercent":0.29999784,"GCPausePercent":0.00036863735,"GCRunCount":164} I240605 05:19:54.641811 1758 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1691 runtime stats: 1022 MiB RSS, 295 goroutines (stacks: 3.6 MiB), 657 MiB/757 MiB Go alloc/total (heap fragmentation: 52 MiB, heap reserved: 19 MiB, heap released: 8.7 MiB), 129 MiB/134 MiB CGO alloc/total (238.1 CGO/sec), 99.6/0.3 %(u/s)time, 0.0 %gc (164x), 0 B/0 B (r/w)net W240605 05:19:56.623685 3188 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:42687,class=system,rpc] 1692 latency jump (prev avg 78.45ms, current 156.80ms) I240605 05:19:56.838258 1745 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r13/1:/Table/{9-11},raft] 1693 rate limited in MaybeAdd (merge): throttled on async limiting semaphore W240605 05:19:58.042762 3175 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:42687,class=default,rpc] 1694 latency jump (prev avg 81.10ms, current 149.86ms) I240605 05:20:03.327146 1738 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r16/1:/Table/1{3-4},raft] 1695 rate limited in MaybeAdd (merge): throttled on async limiting semaphore I240605 05:20:04.648307 1758 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1696 ={"Timestamp":1717564804648302105,"EventType":"runtime_stats","MemRSSBytes":1074270208,"GoroutineCount":295,"MemStackSysBytes":3604480,"GoAllocBytes":681244528,"GoTotalBytes":794710456,"HeapFragmentBytes":55797904,"HeapReservedBytes":27181056,"HeapReleasedBytes":8003584,"CGoAllocBytes":135017088,"CGoTotalBytes":140541952,"CGoCallRate":225.54842,"CPUUserPercent":99.63304,"CPUSysPercent":0.19986568,"GCPausePercent":0.00003325765,"GCRunCount":165,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240605 05:20:04.648501 1758 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1697 runtime stats: 1.0 GiB RSS, 295 goroutines (stacks: 3.4 MiB), 650 MiB/758 MiB Go alloc/total (heap fragmentation: 53 MiB, heap reserved: 26 MiB, heap released: 7.6 MiB), 129 MiB/134 MiB CGO alloc/total (225.5 CGO/sec), 99.6/0.2 %(u/s)time, 0.0 %gc (165x), 416 B/416 B (r/w)net W240605 05:20:07.320730 3188 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:42687,class=system,rpc] 1698 latency jump (prev avg 79.90ms, current 154.84ms) I240605 05:20:09.348375 1742 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r6/1:/Table/{0-4},raft] 1699 rate limited in MaybeAdd (merge): throttled on async limiting semaphore I240605 05:20:14.648279 1758 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1700 ={"Timestamp":1717564814648274325,"EventType":"runtime_stats","MemRSSBytes":1074393088,"GoroutineCount":296,"MemStackSysBytes":3637248,"GoAllocBytes":663492232,"GoTotalBytes":794710456,"HeapFragmentBytes":55757176,"HeapReservedBytes":44941312,"HeapReleasedBytes":8003584,"CGoAllocBytes":135017088,"CGoTotalBytes":140541952,"CGoCallRate":233.70064,"CPUUserPercent":99.80027,"CPUSysPercent":0.20000054,"GCPausePercent":0.00048384132,"GCRunCount":166,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240605 05:20:14.648460 1758 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1701 runtime stats: 1.0 GiB RSS, 296 goroutines (stacks: 3.5 MiB), 633 MiB/758 MiB Go alloc/total (heap fragmentation: 53 MiB, heap reserved: 43 MiB, heap released: 7.6 MiB), 129 MiB/134 MiB CGO alloc/total (233.7 CGO/sec), 99.8/0.2 %(u/s)time, 0.0 %gc (166x), 416 B/416 B (r/w)net I240605 05:20:15.840124 1743 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r38/1:/Table/3{5-6},raft] 1702 rate limited in MaybeAdd (merge): throttled on async limiting semaphore W240605 05:20:16.501881 3175 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:42687,class=default,rpc] 1703 latency jump (prev avg 65.37ms, current 104.99ms) I240605 05:20:22.333349 1740 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r46/1:/Table/4{3-4},raft] 1704 rate limited in MaybeAdd (merge): throttled on async limiting semaphore I240605 05:20:24.637815 1744 3@pebble/event.go:828 [n1,s1,pebble] 1705 [JOB 42] WAL created 000038 (recycled 000019) I240605 05:20:24.639817 756888 3@pebble/event.go:780 [n1,s1,pebble] 1706 [JOB 43] flushing 1 memtable (64MB) to L0 I240605 05:20:24.640023 756888 3@pebble/event.go:808 [n1,s1,pebble] 1707 [JOB 43] flushing: sstable created 000039 I240605 05:20:24.655099 1758 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1708 ={"Timestamp":1717564824655093934,"EventType":"runtime_stats","MemRSSBytes":1074290688,"GoroutineCount":300,"MemStackSysBytes":3571712,"GoAllocBytes":664248144,"GoTotalBytes":794710456,"HeapFragmentBytes":55320752,"HeapReservedBytes":44687360,"HeapReleasedBytes":8003584,"CGoAllocBytes":134984320,"CGoTotalBytes":140541952,"CGoCallRate":247.13147,"CPUUserPercent":99.73199,"CPUSysPercent":0.19986372,"GCPausePercent":0.00048351026,"GCRunCount":167} I240605 05:20:24.655285 1758 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1709 runtime stats: 1.0 GiB RSS, 300 goroutines (stacks: 3.4 MiB), 634 MiB/758 MiB Go alloc/total (heap fragmentation: 53 MiB, heap reserved: 43 MiB, heap released: 7.6 MiB), 129 MiB/134 MiB CGO alloc/total (247.1 CGO/sec), 99.7/0.2 %(u/s)time, 0.0 %gc (167x), 0 B/0 B (r/w)net I240605 05:20:25.030181 756888 3@pebble/event.go:784 [n1,s1,pebble] 1710 [JOB 43] flushed 1 memtable (64MB) to L0 [000039] (926KB), in 0.4s (0.4s total), output rate 2.3MB/s I240605 05:20:25.030366 756896 3@pebble/event.go:768 [n1,s1,pebble] 1711 [JOB 44] compacting(default) L0 [000039] (926KB) Score=15.92 + L6 [000037] (4.0MB) Score=0.06; OverlappingRatio: Single 4.45, Multi 0.00 I240605 05:20:25.030709 756896 3@pebble/event.go:808 [n1,s1,pebble] 1712 [JOB 44] compacting: sstable created 000040 W240605 05:20:25.137524 3175 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:42687,class=default,rpc] 1713 latency jump (prev avg 66.31ms, current 136.63ms) I240605 05:20:25.285338 756896 3@pebble/event.go:808 [n1,s1,pebble] 1714 [JOB 44] compacting: sstable created 000041 I240605 05:20:25.289422 756896 3@pebble/event.go:772 [n1,s1,pebble] 1715 [JOB 44] compacted(default) L0 [000039] (926KB) Score=15.92 + L6 [000037] (4.0MB) Score=0.06 -> L6 [000040 000041] (4.4MB), in 0.3s (0.3s total), output rate 17MB/s I240605 05:20:25.289601 1643 3@pebble/event.go:812 [n1,s1,pebble] 1716 [JOB 44] sstable deleted 000037 I240605 05:20:25.289717 1643 3@pebble/event.go:812 [n1,s1,pebble] 1717 [JOB 44] sstable deleted 000039 I240605 05:20:29.375138 1740 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r23/1:/Table/2{0-1},raft] 1718 rate limited in MaybeAdd (merge): throttled on async limiting semaphore W240605 05:20:29.404961 3175 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:42687,class=default,rpc] 1719 latency jump (prev avg 63.24ms, current 99.92ms) ```

Parameters:

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

cockroach-teamcity commented 3 months ago

upgrade/upgrades.TestLeasingClusterVersionStarvation failed on master @ d0582983521533ffc15a8b8525a188d80da2a0b7:

Fatal error:

panic: test timed out after 14m57s
running tests:
    TestLeasingClusterVersionStarvation (14m56s)

Stack:

goroutine 646848 [running]:
testing.(*M).startAlarm.func1()
    GOROOT/src/testing/testing.go:2366 +0x385
created by time.goFunc
    GOROOT/src/time/sleep.go:177 +0x2d
Log preceding fatal error

``` I240606 06:44:20.847382 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r20/1:/Table/1{7-8}] 1676 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.847578 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r4/1:/System{/tsd-tse}] 1677 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.847747 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r64/1:/{Table/62-Max}] 1678 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.847913 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r26/1:/Table/2{3-4}] 1679 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.848077 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r39/1:/Table/3{6-7}] 1680 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.848240 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r17/1:/Table/1{4-5}] 1681 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.848403 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r13/1:/Table/{9-11}] 1682 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.848573 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r31/1:/Table/2{8-9}] 1683 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.848738 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r34/1:/{NamespaceTab…-Table/32}] 1684 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.848901 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r18/1:/Table/1{5-6}] 1685 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.849066 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r9/1:/Table/{5-6}] 1686 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.849234 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r60/1:/Table/{58-60}] 1687 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.849445 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r15/1:/Table/1{2-3}] 1688 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.849653 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r3/1:/System/{NodeLive…-tsd}] 1689 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.849820 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r33/1:/NamespaceTable/{30-Max}] 1690 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240606 06:44:20.849984 1967 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r24/1:/Table/2{1-2}] 1691 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster W240606 06:44:20.894939 3181 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=default,rpc] 1692 latency jump (prev avg 89.19ms, current 150.63ms) I240606 06:44:21.157433 1790 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1693 ={"Timestamp":1717656261157427903,"EventType":"runtime_stats","MemRSSBytes":999587840,"GoroutineCount":306,"MemStackSysBytes":3801088,"GoAllocBytes":602961656,"GoTotalBytes":720455080,"HeapFragmentBytes":49727752,"HeapReservedBytes":38789120,"HeapReleasedBytes":860160,"CGoAllocBytes":134999136,"CGoTotalBytes":140513280,"CGoCallRate":262.03912,"CPUUserPercent":99.47689,"CPUSysPercent":0.19995356,"GCPausePercent":0.00048372764,"GCRunCount":165,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240606 06:44:21.157620 1790 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1694 runtime stats: 953 MiB RSS, 306 goroutines (stacks: 3.6 MiB), 575 MiB/687 MiB Go alloc/total (heap fragmentation: 47 MiB, heap reserved: 37 MiB, heap released: 840 KiB), 129 MiB/134 MiB CGO alloc/total (262.0 CGO/sec), 99.5/0.2 %(u/s)time, 0.0 %gc (165x), 416 B/416 B (r/w)net W240606 06:44:29.523669 3181 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=default,rpc] 1695 latency jump (prev avg 78.64ms, current 121.70ms) I240606 06:44:31.157538 1790 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1696 ={"Timestamp":1717656271157532036,"EventType":"runtime_stats","MemRSSBytes":1068867584,"GoroutineCount":305,"MemStackSysBytes":3768320,"GoAllocBytes":577707664,"GoTotalBytes":787783112,"HeapFragmentBytes":45629808,"HeapReservedBytes":135364608,"HeapReleasedBytes":4972544,"CGoAllocBytes":134984800,"CGoTotalBytes":140492800,"CGoCallRate":346.49643,"CPUUserPercent":99.698975,"CPUSysPercent":0.2999969,"GCPausePercent":0.000483835,"GCRunCount":166,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240606 06:44:31.157733 1790 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1697 runtime stats: 1019 MiB RSS, 305 goroutines (stacks: 3.6 MiB), 551 MiB/751 MiB Go alloc/total (heap fragmentation: 44 MiB, heap reserved: 129 MiB, heap released: 4.7 MiB), 129 MiB/134 MiB CGO alloc/total (346.5 CGO/sec), 99.7/0.3 %(u/s)time, 0.0 %gc (166x), 416 B/416 B (r/w)net W240606 06:44:31.772940 3181 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=default,rpc] 1698 latency jump (prev avg 80.82ms, current 167.14ms) W240606 06:44:34.105044 3194 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=system,rpc] 1699 latency jump (prev avg 76.93ms, current 141.28ms) W240606 06:44:36.130808 3181 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=default,rpc] 1700 latency jump (prev avg 81.52ms, current 145.99ms) W240606 06:44:37.423002 3194 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=system,rpc] 1701 latency jump (prev avg 85.26ms, current 182.08ms) W240606 06:44:40.691532 3194 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=system,rpc] 1702 latency jump (prev avg 79.26ms, current 148.73ms) I240606 06:44:41.157499 1790 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1703 ={"Timestamp":1717656281157493375,"EventType":"runtime_stats","MemRSSBytes":1068793856,"GoroutineCount":305,"MemStackSysBytes":3670016,"GoAllocBytes":595917904,"GoTotalBytes":787783112,"HeapFragmentBytes":52257712,"HeapReservedBytes":110624768,"HeapReleasedBytes":4972544,"CGoAllocBytes":134952032,"CGoTotalBytes":140419072,"CGoCallRate":193.70074,"CPUUserPercent":99.60038,"CPUSysPercent":0.20000078,"GCPausePercent":0.00048384187,"GCRunCount":167} I240606 06:44:41.157686 1790 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1704 runtime stats: 1019 MiB RSS, 305 goroutines (stacks: 3.5 MiB), 568 MiB/751 MiB Go alloc/total (heap fragmentation: 50 MiB, heap reserved: 106 MiB, heap released: 4.7 MiB), 129 MiB/134 MiB CGO alloc/total (193.7 CGO/sec), 99.6/0.2 %(u/s)time, 0.0 %gc (167x), 0 B/0 B (r/w)net I240606 06:44:41.205046 1764 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r23/1:/Table/2{0-1},raft] 1705 rate limited in MaybeAdd (merge): throttled on async limiting semaphore W240606 06:44:43.959083 3194 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=system,rpc] 1706 latency jump (prev avg 76.35ms, current 116.91ms) I240606 06:44:48.183248 1762 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r39/1:/Table/3{6-7},raft] 1707 rate limited in MaybeAdd (merge): throttled on async limiting semaphore W240606 06:44:49.323793 3194 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=system,rpc] 1708 latency jump (prev avg 63.64ms, current 141.16ms) I240606 06:44:51.157474 1790 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1709 ={"Timestamp":1717656291157469127,"EventType":"runtime_stats","MemRSSBytes":1069117440,"GoroutineCount":305,"MemStackSysBytes":3702784,"GoAllocBytes":600984960,"GoTotalBytes":788045256,"HeapFragmentBytes":50573952,"HeapReservedBytes":107208704,"HeapReleasedBytes":4972544,"CGoAllocBytes":134984800,"CGoTotalBytes":140587008,"CGoCallRate":223.30052,"CPUUserPercent":99.80023,"CPUSysPercent":0.20000048,"GCPausePercent":0.0005708813,"GCRunCount":168,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240606 06:44:51.157652 1790 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1710 runtime stats: 1020 MiB RSS, 305 goroutines (stacks: 3.5 MiB), 573 MiB/752 MiB Go alloc/total (heap fragmentation: 48 MiB, heap reserved: 102 MiB, heap released: 4.7 MiB), 129 MiB/134 MiB CGO alloc/total (223.3 CGO/sec), 99.8/0.2 %(u/s)time, 0.0 %gc (168x), 416 B/416 B (r/w)net W240606 06:44:54.717800 3194 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=system,rpc] 1711 latency jump (prev avg 61.64ms, current 112.77ms) I240606 06:44:55.180328 1764 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r25/1:/Table/2{2-3},raft] 1712 rate limited in MaybeAdd (merge): throttled on async limiting semaphore W240606 06:44:58.020215 3194 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=system,rpc] 1713 latency jump (prev avg 61.09ms, current 149.41ms) I240606 06:45:01.170526 1790 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1714 ={"Timestamp":1717656301170521901,"EventType":"runtime_stats","MemRSSBytes":1069293568,"GoroutineCount":306,"MemStackSysBytes":3801088,"GoAllocBytes":605694168,"GoTotalBytes":788045256,"HeapFragmentBytes":49461032,"HeapReservedBytes":103514112,"HeapReleasedBytes":4972544,"CGoAllocBytes":134888544,"CGoTotalBytes":140587008,"CGoCallRate":245.77919,"CPUUserPercent":99.76977,"CPUSysPercent":0.19973929,"GCPausePercent":0.00040138606,"GCRunCount":169,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240606 06:45:01.170696 1790 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1715 runtime stats: 1020 MiB RSS, 306 goroutines (stacks: 3.6 MiB), 578 MiB/752 MiB Go alloc/total (heap fragmentation: 47 MiB, heap reserved: 99 MiB, heap released: 4.7 MiB), 129 MiB/134 MiB CGO alloc/total (245.8 CGO/sec), 99.8/0.2 %(u/s)time, 0.0 %gc (169x), 416 B/416 B (r/w)net I240606 06:45:01.680486 1764 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r31/1:/Table/2{8-9},raft] 1716 rate limited in MaybeAdd (merge): throttled on async limiting semaphore W240606 06:45:02.698228 3181 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=default,rpc] 1717 latency jump (prev avg 65.72ms, current 114.43ms) W240606 06:45:04.883794 3181 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=default,rpc] 1718 latency jump (prev avg 65.97ms, current 110.87ms) W240606 06:45:08.072794 3181 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=default,rpc] 1719 latency jump (prev avg 64.12ms, current 109.01ms) I240606 06:45:11.170605 1790 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1720 ={"Timestamp":1717656311170599056,"EventType":"runtime_stats","MemRSSBytes":1069162496,"GoroutineCount":306,"MemStackSysBytes":3670016,"GoAllocBytes":601583880,"GoTotalBytes":788045256,"HeapFragmentBytes":49671928,"HeapReservedBytes":107544576,"HeapReleasedBytes":4972544,"CGoAllocBytes":134902880,"CGoTotalBytes":140468224,"CGoCallRate":212.89838,"CPUUserPercent":99.699234,"CPUSysPercent":0.099999234,"GCPausePercent":0.00040192695,"GCRunCount":170} I240606 06:45:11.170779 1790 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1721 runtime stats: 1020 MiB RSS, 306 goroutines (stacks: 3.5 MiB), 574 MiB/752 MiB Go alloc/total (heap fragmentation: 47 MiB, heap reserved: 103 MiB, heap released: 4.7 MiB), 129 MiB/134 MiB CGO alloc/total (212.9 CGO/sec), 99.7/0.1 %(u/s)time, 0.0 %gc (170x), 0 B/0 B (r/w)net W240606 06:45:11.385052 3181 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=default,rpc] 1722 latency jump (prev avg 65.39ms, current 145.59ms) W240606 06:45:14.605024 3181 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=default,rpc] 1723 latency jump (prev avg 67.10ms, current 108.20ms) I240606 06:45:14.736147 1761 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r46/1:/Table/4{3-4},raft] 1724 rate limited in MaybeAdd (merge): throttled on async limiting semaphore W240606 06:45:15.742800 3181 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41273,class=default,rpc] 1725 latency jump (prev avg 70.00ms, current 137.65ms) ```

Parameters:

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

cockroach-teamcity commented 3 months ago

upgrade/upgrades.TestLeasingClusterVersionStarvation failed on master @ c7f37e813dbd1e6d2afdb88c5e9cd986dc061299:

Fatal error:

panic: test timed out after 14m57s
running tests:
    TestLeasingClusterVersionStarvation (14m56s)

Stack:

goroutine 692519 [running]:
testing.(*M).startAlarm.func1()
    GOROOT/src/testing/testing.go:2366 +0x385
created by time.goFunc
    GOROOT/src/time/sleep.go:177 +0x2d
Log preceding fatal error

``` I240609 06:18:18.695837 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r46/1:/Table/4{3-4}] 1683 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.696017 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r33/1:/NamespaceTable/{30-Max}] 1684 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.696211 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r50/1:/Table/4{7-8}] 1685 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.696385 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r41/1:/Table/3{8-9}] 1686 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.696562 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r20/1:/Table/1{7-8}] 1687 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.696739 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r27/1:/Table/2{4-5}] 1688 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.696918 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r19/1:/Table/1{6-7}] 1689 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.697102 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r37/1:/Table/3{4-5}] 1690 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.730878 1768 2@gossip/gossip.go:555 [T1,Vsystem,n1] 1691 gossip status (ok, 1 node) I240609 06:18:18.730878 1768 2@gossip/gossip.go:555 [T1,Vsystem,n1] 1691 +gossip client (0/3 cur/max conns) I240609 06:18:18.730878 1768 2@gossip/gossip.go:555 [T1,Vsystem,n1] 1691 +gossip server (0/3 cur/max conns, infos 0/0 sent/received, bytes 0B/0B sent/received) I240609 06:18:18.777534 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r30/1:/Table/2{7-8}] 1692 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.787991 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r16/1:/Table/1{3-4}] 1693 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.788309 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r3/1:/System/{NodeLive…-tsd}] 1694 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.788529 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r5/1:/{Systemtse-Table/0}] 1695 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.788724 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r47/1:/Table/4{4-5}] 1696 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.788915 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r2/1:/System/NodeLiveness{-Max}] 1697 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.789111 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r9/1:/Table/{5-6}] 1698 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240609 06:18:18.789298 1919 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r44/1:/Table/4{1-2}] 1699 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster W240609 06:18:18.827519 3142 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41151,class=default,rpc] 1700 latency jump (prev avg 64.94ms, current 160.34ms) I240609 06:18:19.040248 1771 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1701 ={"Timestamp":1717913899040243547,"EventType":"runtime_stats","MemRSSBytes":1210474496,"GoroutineCount":294,"MemStackSysBytes":3801088,"GoAllocBytes":777438984,"GoTotalBytes":957219320,"HeapFragmentBytes":52893944,"HeapReservedBytes":94486528,"HeapReleasedBytes":82092032,"CGoAllocBytes":134962080,"CGoTotalBytes":143753216,"CGoCallRate":212.1003,"CPUUserPercent":99.50014,"CPUSysPercent":0.20000029,"GCPausePercent":0.00045055064,"GCRunCount":161,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240609 06:18:19.040479 1771 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1702 runtime stats: 1.1 GiB RSS, 294 goroutines (stacks: 3.6 MiB), 741 MiB/913 MiB Go alloc/total (heap fragmentation: 50 MiB, heap reserved: 90 MiB, heap released: 78 MiB), 129 MiB/137 MiB CGO alloc/total (212.1 CGO/sec), 99.5/0.2 %(u/s)time, 0.0 %gc (161x), 416 B/416 B (r/w)net W240609 06:18:22.047435 3155 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41151,class=system,rpc] 1703 latency jump (prev avg 70.37ms, current 115.81ms) I240609 06:18:29.053150 1771 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1704 ={"Timestamp":1717913909053144923,"EventType":"runtime_stats","MemRSSBytes":1197047808,"GoroutineCount":294,"MemStackSysBytes":3899392,"GoAllocBytes":761562656,"GoTotalBytes":944636408,"HeapFragmentBytes":44128736,"HeapReservedBytes":106446848,"HeapReleasedBytes":94674944,"CGoAllocBytes":134947744,"CGoTotalBytes":142807040,"CGoCallRate":363.23138,"CPUUserPercent":99.471664,"CPUSysPercent":0.39948463,"GCPausePercent":0.000038360507,"GCRunCount":162,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240609 06:18:29.053346 1771 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1705 runtime stats: 1.1 GiB RSS, 294 goroutines (stacks: 3.7 MiB), 726 MiB/901 MiB Go alloc/total (heap fragmentation: 42 MiB, heap reserved: 102 MiB, heap released: 90 MiB), 129 MiB/136 MiB CGO alloc/total (363.2 CGO/sec), 99.5/0.4 %(u/s)time, 0.0 %gc (162x), 416 B/416 B (r/w)net W240609 06:18:30.615454 3142 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41151,class=default,rpc] 1706 latency jump (prev avg 70.79ms, current 107.44ms) W240609 06:18:34.888506 3155 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41151,class=system,rpc] 1707 latency jump (prev avg 65.21ms, current 100.87ms) I240609 06:18:39.053117 1771 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1708 ={"Timestamp":1717913919053113332,"EventType":"runtime_stats","MemRSSBytes":1197056000,"GoroutineCount":294,"MemStackSysBytes":3604480,"GoAllocBytes":779548872,"GoTotalBytes":944701944,"HeapFragmentBytes":44517176,"HeapReservedBytes":88367104,"HeapReleasedBytes":94674944,"CGoAllocBytes":135091104,"CGoTotalBytes":142807040,"CGoCallRate":462.10147,"CPUUserPercent":99.70032,"CPUSysPercent":0.20000064,"GCPausePercent":0.0004966316,"GCRunCount":163} I240609 06:18:39.053313 1771 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1709 runtime stats: 1.1 GiB RSS, 294 goroutines (stacks: 3.4 MiB), 743 MiB/901 MiB Go alloc/total (heap fragmentation: 42 MiB, heap reserved: 84 MiB, heap released: 90 MiB), 129 MiB/136 MiB CGO alloc/total (462.1 CGO/sec), 99.7/0.2 %(u/s)time, 0.0 %gc (163x), 0 B/0 B (r/w)net W240609 06:18:40.379510 3155 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41151,class=system,rpc] 1710 latency jump (prev avg 73.52ms, current 128.10ms) I240609 06:18:48.262133 691041 3@pebble/event.go:828 [n1,s1,pebble] 1711 [JOB 38] WAL created 000035 (recycled 000015) I240609 06:18:48.262302 691451 3@pebble/event.go:780 [n1,s1,pebble] 1712 [JOB 39] flushing 1 memtable (5.2MB) to L0 I240609 06:18:48.262423 691451 3@pebble/event.go:808 [n1,s1,pebble] 1713 [JOB 39] flushing: sstable created 000036 I240609 06:18:48.277566 691451 3@pebble/event.go:784 [n1,s1,pebble] 1714 [JOB 39] flushed 1 memtable (5.2MB) to L0 [000036] (250KB), in 0.0s (0.0s total), output rate 16MB/s I240609 06:18:48.277706 691455 3@pebble/event.go:768 [n1,s1,pebble] 1715 [JOB 40] compacting(default) L0 [000036] (250KB) Score=15.85 + L6 [000034] (4.0MB) Score=0.06; OverlappingRatio: Single 16.54, Multi 0.00 I240609 06:18:48.278646 691455 3@pebble/event.go:808 [n1,s1,pebble] 1716 [JOB 40] compacting: sstable created 000037 I240609 06:18:48.488625 691455 3@pebble/event.go:772 [n1,s1,pebble] 1717 [JOB 40] compacted(default) L0 [000036] (250KB) Score=15.85 + L6 [000034] (4.0MB) Score=0.06 -> L6 [000037] (2.1MB), in 0.2s (0.2s total), output rate 10MB/s I240609 06:18:48.488885 1637 3@pebble/event.go:812 [n1,s1,pebble] 1718 [JOB 40] sstable deleted 000034 I240609 06:18:48.488976 1637 3@pebble/event.go:812 [n1,s1,pebble] 1719 [JOB 40] sstable deleted 000036 I240609 06:18:49.053140 1771 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1720 ={"Timestamp":1717913929053136007,"EventType":"runtime_stats","MemRSSBytes":1193762816,"GoroutineCount":293,"MemStackSysBytes":3375104,"GoAllocBytes":782558472,"GoTotalBytes":940507640,"HeapFragmentBytes":43760376,"HeapReservedBytes":82149376,"HeapReleasedBytes":98869248,"CGoAllocBytes":135016032,"CGoTotalBytes":143937536,"CGoCallRate":235.69945,"CPUUserPercent":99.79977,"CPUSysPercent":0.19999953,"GCPausePercent":0.00040191904,"GCRunCount":164,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240609 06:18:49.091157 1771 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1721 runtime stats: 1.1 GiB RSS, 293 goroutines (stacks: 3.2 MiB), 746 MiB/897 MiB Go alloc/total (heap fragmentation: 42 MiB, heap reserved: 78 MiB, heap released: 94 MiB), 129 MiB/137 MiB CGO alloc/total (235.7 CGO/sec), 99.8/0.2 %(u/s)time, 0.0 %gc (164x), 416 B/416 B (r/w)net W240609 06:18:54.404452 3155 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41151,class=system,rpc] 1722 latency jump (prev avg 77.02ms, current 120.40ms) I240609 06:18:59.053224 1771 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1723 ={"Timestamp":1717913939053218239,"EventType":"runtime_stats","MemRSSBytes":1192370176,"GoroutineCount":294,"MemStackSysBytes":3637248,"GoAllocBytes":750427688,"GoTotalBytes":940507640,"HeapFragmentBytes":45523416,"HeapReservedBytes":112254976,"HeapReleasedBytes":98869248,"CGoAllocBytes":135006816,"CGoTotalBytes":142495744,"CGoCallRate":202.49837,"CPUUserPercent":99.5992,"CPUSysPercent":0.29999757,"GCPausePercent":0.00057855534,"GCRunCount":165,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240609 06:18:59.053428 1771 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1724 runtime stats: 1.1 GiB RSS, 294 goroutines (stacks: 3.5 MiB), 716 MiB/897 MiB Go alloc/total (heap fragmentation: 43 MiB, heap reserved: 107 MiB, heap released: 94 MiB), 129 MiB/136 MiB CGO alloc/total (202.5 CGO/sec), 99.6/0.3 %(u/s)time, 0.0 %gc (165x), 416 B/416 B (r/w)net I240609 06:19:00.263957 1744 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r20/1:/Table/1{7-8},raft] 1725 rate limited in MaybeAdd (merge): throttled on async limiting semaphore W240609 06:19:06.462084 3155 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:41151,class=system,rpc] 1726 latency jump (prev avg 78.36ms, current 196.50ms) I240609 06:19:06.736510 1740 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r41/1:/Table/3{8-9},raft] 1727 rate limited in MaybeAdd (merge): throttled on async limiting semaphore I240609 06:19:09.053128 1771 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1728 ={"Timestamp":1717913949053124219,"EventType":"runtime_stats","MemRSSBytes":1192337408,"GoroutineCount":294,"MemStackSysBytes":3440640,"GoAllocBytes":769508880,"GoTotalBytes":940507640,"HeapFragmentBytes":53098992,"HeapReservedBytes":85794816,"HeapReleasedBytes":98869248,"CGoAllocBytes":135006816,"CGoTotalBytes":142475264,"CGoCallRate":143.30133,"CPUUserPercent":99.500916,"CPUSysPercent":0.4000037,"GCPausePercent":0.00041472385,"GCRunCount":166} I240609 06:19:09.053341 1771 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1729 runtime stats: 1.1 GiB RSS, 294 goroutines (stacks: 3.3 MiB), 734 MiB/897 MiB Go alloc/total (heap fragmentation: 51 MiB, heap reserved: 82 MiB, heap released: 94 MiB), 129 MiB/136 MiB CGO alloc/total (143.3 CGO/sec), 99.5/0.4 %(u/s)time, 0.0 %gc (166x), 0 B/0 B (r/w)net I240609 06:19:13.246469 1738 kv/kvserver/queue.go:621 [T1,Vsystem,n1,s1,r17/1:/Table/1{4-5},raft] 1730 rate limited in MaybeAdd (merge): throttled on async limiting semaphore ```

Parameters:

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

cockroach-teamcity commented 3 months ago

upgrade/upgrades.TestLeasingClusterVersionStarvation failed on master @ bd4c5fd7c65c0a9afa8ba05675e49747a4f68881:

Fatal error:

panic: test timed out after 14m57s
running tests:
    TestLeasingClusterVersionStarvation (14m56s)

Stack:

goroutine 678817 [running]:
testing.(*M).startAlarm.func1()
    GOROOT/src/testing/testing.go:2366 +0x385
created by time.goFunc
    GOROOT/src/time/sleep.go:177 +0x2d
Log preceding fatal error

``` I240610 07:09:34.861370 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r9/1:/Table/{5-6}] 1661 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.861533 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r29/1:/Table/2{6-7}] 1662 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.861693 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r46/1:/Table/4{3-4}] 1663 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.861851 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r38/1:/Table/3{5-6}] 1664 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.862003 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r18/1:/Table/1{5-6}] 1665 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.862174 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r3/1:/System/{NodeLive…-tsd}] 1666 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.862340 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r64/1:/{Table/62-Max}] 1667 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.862499 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r35/1:/Table/3{2-3}] 1668 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.862678 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r43/1:/Table/4{0-1}] 1669 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.862837 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r60/1:/Table/{58-60}] 1670 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.863004 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r40/1:/Table/3{7-8}] 1671 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.863160 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r32/1:/{Table/29-NamespaceTab…}] 1672 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.863318 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r34/1:/{NamespaceTab…-Table/32}] 1673 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.863474 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r49/1:/Table/4{6-7}] 1674 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.863627 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r39/1:/Table/3{6-7}] 1675 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.863779 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r31/1:/Table/2{8-9}] 1676 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.863931 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r20/1:/Table/1{7-8}] 1677 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.864085 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r5/1:/{Systemtse-Table/0}] 1678 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.864253 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r10/1:/Table/{6-7}] 1679 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.864408 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r14/1:/Table/1{1-2}] 1680 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.864562 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r33/1:/NamespaceTable/{30-Max}] 1681 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.864716 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r22/1:/Table/{19-20}] 1682 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.864870 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r36/1:/Table/3{3-4}] 1683 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.865024 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r51/1:/Table/{48-57}] 1684 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.865180 2121 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r17/1:/Table/1{4-5}] 1685 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240610 07:09:34.875290 1753 2@gossip/gossip.go:555 [T1,Vsystem,n1] 1686 gossip status (ok, 1 node) I240610 07:09:34.875290 1753 2@gossip/gossip.go:555 [T1,Vsystem,n1] 1686 +gossip client (0/3 cur/max conns) I240610 07:09:34.875290 1753 2@gossip/gossip.go:555 [T1,Vsystem,n1] 1686 +gossip server (0/3 cur/max conns, infos 0/0 sent/received, bytes 0B/0B sent/received) I240610 07:09:35.364912 1756 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1687 ={"Timestamp":1718003375364909473,"EventType":"runtime_stats","MemRSSBytes":1048440832,"GoroutineCount":296,"MemStackSysBytes":3440640,"GoAllocBytes":667202976,"GoTotalBytes":800538056,"HeapFragmentBytes":52341344,"HeapReservedBytes":51388416,"HeapReleasedBytes":22429696,"CGoAllocBytes":135001088,"CGoTotalBytes":140652544,"CGoCallRate":259.88608,"CPUUserPercent":99.46504,"CPUSysPercent":0.19952866,"GCPausePercent":0.0002937062,"GCRunCount":178,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240610 07:09:35.365044 1756 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1688 runtime stats: 1000 MiB RSS, 296 goroutines (stacks: 3.3 MiB), 636 MiB/764 MiB Go alloc/total (heap fragmentation: 50 MiB, heap reserved: 49 MiB, heap released: 21 MiB), 129 MiB/134 MiB CGO alloc/total (259.9 CGO/sec), 99.5/0.2 %(u/s)time, 0.0 %gc (178x), 416 B/416 B (r/w)net W240610 07:09:41.705019 3147 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:39797,class=system,rpc] 1689 latency jump (prev avg 69.32ms, current 200.01ms) I240610 07:09:45.364859 1756 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1690 ={"Timestamp":1718003385364855503,"EventType":"runtime_stats","MemRSSBytes":1048539136,"GoroutineCount":295,"MemStackSysBytes":3473408,"GoAllocBytes":661464512,"GoTotalBytes":800538056,"HeapFragmentBytes":48183872,"HeapReservedBytes":61251584,"HeapReleasedBytes":22429696,"CGoAllocBytes":134986752,"CGoTotalBytes":140791808,"CGoCallRate":348.10187,"CPUUserPercent":99.80054,"CPUSysPercent":0.10000054,"GCPausePercent":0.00028928154,"GCRunCount":179,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240610 07:09:45.365005 1756 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1691 runtime stats: 1000 MiB RSS, 295 goroutines (stacks: 3.3 MiB), 631 MiB/764 MiB Go alloc/total (heap fragmentation: 46 MiB, heap reserved: 58 MiB, heap released: 21 MiB), 129 MiB/134 MiB CGO alloc/total (348.1 CGO/sec), 99.8/0.1 %(u/s)time, 0.0 %gc (179x), 416 B/416 B (r/w)net W240610 07:09:49.036772 3134 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:39797,class=default,rpc] 1692 latency jump (prev avg 66.15ms, current 163.99ms) I240610 07:09:55.364893 1756 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1693 ={"Timestamp":1718003395364889877,"EventType":"runtime_stats","MemRSSBytes":1048535040,"GoroutineCount":295,"MemStackSysBytes":3440640,"GoAllocBytes":663127760,"GoTotalBytes":800538056,"HeapFragmentBytes":52140336,"HeapReservedBytes":55664640,"HeapReleasedBytes":22429696,"CGoAllocBytes":134976512,"CGoTotalBytes":140791808,"CGoCallRate":223.29924,"CPUUserPercent":99.89966,"CPUSysPercent":0.09999966,"GCPausePercent":0.00028927904,"GCRunCount":180} I240610 07:09:55.365039 1756 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1694 runtime stats: 1000 MiB RSS, 295 goroutines (stacks: 3.3 MiB), 632 MiB/764 MiB Go alloc/total (heap fragmentation: 50 MiB, heap reserved: 53 MiB, heap released: 21 MiB), 129 MiB/134 MiB CGO alloc/total (223.3 CGO/sec), 99.9/0.1 %(u/s)time, 0.0 %gc (180x), 0 B/0 B (r/w)net W240610 07:09:58.910771 3147 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:39797,class=system,rpc] 1695 latency jump (prev avg 61.22ms, current 107.64ms) I240610 07:10:05.364901 1756 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1696 ={"Timestamp":1718003405364898075,"EventType":"runtime_stats","MemRSSBytes":1048489984,"GoroutineCount":295,"MemStackSysBytes":3538944,"GoAllocBytes":662308216,"GoTotalBytes":800538056,"HeapFragmentBytes":51165832,"HeapReservedBytes":57360384,"HeapReleasedBytes":22429696,"CGoAllocBytes":135001088,"CGoTotalBytes":140738560,"CGoCallRate":298.09976,"CPUUserPercent":99.79992,"CPUSysPercent":0.19999982,"GCPausePercent":0.00029439974,"GCRunCount":181,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240610 07:10:05.365048 1756 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1697 runtime stats: 1000 MiB RSS, 295 goroutines (stacks: 3.4 MiB), 632 MiB/764 MiB Go alloc/total (heap fragmentation: 49 MiB, heap reserved: 55 MiB, heap released: 21 MiB), 129 MiB/134 MiB CGO alloc/total (298.1 CGO/sec), 99.8/0.2 %(u/s)time, 0.0 %gc (181x), 416 B/416 B (r/w)net W240610 07:10:06.582643 3147 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:39797,class=system,rpc] 1698 latency jump (prev avg 74.97ms, current 126.86ms) W240610 07:10:08.454004 3134 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:39797,class=default,rpc] 1699 latency jump (prev avg 76.61ms, current 157.03ms) I240610 07:10:15.364903 1756 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1700 ={"Timestamp":1718003415364899667,"EventType":"runtime_stats","MemRSSBytes":1048489984,"GoroutineCount":296,"MemStackSysBytes":3473408,"GoAllocBytes":661476664,"GoTotalBytes":800538056,"HeapFragmentBytes":50375368,"HeapReservedBytes":59047936,"HeapReleasedBytes":22429696,"CGoAllocBytes":134976512,"CGoTotalBytes":140738560,"CGoCallRate":266.89993,"CPUUserPercent":99.69998,"CPUSysPercent":0.09999998,"GCPausePercent":0.00028927994,"GCRunCount":182,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240610 07:10:15.365055 1756 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1701 runtime stats: 1000 MiB RSS, 296 goroutines (stacks: 3.3 MiB), 631 MiB/764 MiB Go alloc/total (heap fragmentation: 48 MiB, heap reserved: 56 MiB, heap released: 21 MiB), 129 MiB/134 MiB CGO alloc/total (266.9 CGO/sec), 99.7/0.1 %(u/s)time, 0.0 %gc (182x), 416 B/416 B (r/w)net W240610 07:10:16.258444 3147 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:39797,class=system,rpc] 1702 latency jump (prev avg 72.64ms, current 156.37ms) W240610 07:10:17.378774 3147 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:39797,class=system,rpc] 1703 latency jump (prev avg 77.73ms, current 120.22ms) W240610 07:10:20.355763 3134 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:39797,class=default,rpc] 1704 latency jump (prev avg 78.78ms, current 122.90ms) W240610 07:10:24.661698 3134 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:39797,class=default,rpc] 1705 latency jump (prev avg 71.33ms, current 126.78ms) I240610 07:10:25.364931 1756 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1706 ={"Timestamp":1718003425364927425,"EventType":"runtime_stats","MemRSSBytes":1048498176,"GoroutineCount":296,"MemStackSysBytes":3506176,"GoAllocBytes":653063664,"GoTotalBytes":800538056,"HeapFragmentBytes":49646096,"HeapReservedBytes":68157440,"HeapReleasedBytes":22429696,"CGoAllocBytes":134986752,"CGoTotalBytes":140738560,"CGoCallRate":228.09937,"CPUUserPercent":99.79972,"CPUSysPercent":0.19999945,"GCPausePercent":0.0002943992,"GCRunCount":183} I240610 07:10:25.365076 1756 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1707 runtime stats: 1000 MiB RSS, 296 goroutines (stacks: 3.3 MiB), 623 MiB/764 MiB Go alloc/total (heap fragmentation: 47 MiB, heap reserved: 65 MiB, heap released: 21 MiB), 129 MiB/134 MiB CGO alloc/total (228.1 CGO/sec), 99.8/0.2 %(u/s)time, 0.0 %gc (183x), 0 B/0 B (r/w)net W240610 07:10:28.092769 3147 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:39797,class=system,rpc] 1708 latency jump (prev avg 72.88ms, current 122.01ms) ```

Parameters:

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

celiala commented 2 months ago

Closing this backport, since we don't need to backport upgrade-prs to older branches given that:

new upgrade policy is from 24.2 onward
all backports have already been first reviewed on the master branch
cockroach-teamcity commented 2 months ago

upgrade/upgrades.TestLeasingClusterVersionStarvation failed on master @ 9d3dd0530818b9c5c0f7dac04e8a8de4acf1bba4:

Fatal error:

panic: test timed out after 14m57s
running tests:
    TestLeasingClusterVersionStarvation (14m56s)

Stack:

goroutine 1740981 [running]:
testing.(*M).startAlarm.func1()
    GOROOT/src/testing/testing.go:2366 +0x385
created by time.goFunc
    GOROOT/src/time/sleep.go:177 +0x2d
Log preceding fatal error

``` I240613 07:11:11.449465 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r49/1:/Table/4{6-7}] 1569 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.449634 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r10/1:/Table/{6-7}] 1570 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.449806 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r51/1:/Table/{48-57}] 1571 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.449972 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r41/1:/Table/3{8-9}] 1572 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.450142 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r27/1:/Table/2{4-5}] 1573 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.450307 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r8/1:/Table/{4-5}] 1574 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.450473 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r38/1:/Table/3{5-6}] 1575 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.450637 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r43/1:/Table/4{0-1}] 1576 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.450802 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r47/1:/Table/4{4-5}] 1577 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.450971 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r23/1:/Table/2{0-1}] 1578 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.451148 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r48/1:/Table/4{5-6}] 1579 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.451335 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r32/1:/{Table/29-NamespaceTab…}] 1580 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.451506 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r40/1:/Table/3{7-8}] 1581 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.451673 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r60/1:/Table/{58-60}] 1582 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.451842 1912 13@kv/kvserver/replicate_queue.go:767 [T1,Vsystem,n1,replicate,s1,r9/1:/Table/{5-6}] 1583 error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster I240613 07:11:11.481458 1753 2@gossip/gossip.go:555 [T1,Vsystem,n1] 1584 gossip status (ok, 1 node) I240613 07:11:11.481458 1753 2@gossip/gossip.go:555 [T1,Vsystem,n1] 1584 +gossip client (0/3 cur/max conns) I240613 07:11:11.481458 1753 2@gossip/gossip.go:555 [T1,Vsystem,n1] 1584 +gossip server (0/3 cur/max conns, infos 0/0 sent/received, bytes 0B/0B sent/received) I240613 07:11:11.640734 1756 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1585 ={"Timestamp":1718262671640731205,"EventType":"runtime_stats","MemRSSBytes":1507545088,"GoroutineCount":323,"MemStackSysBytes":3932160,"GoAllocBytes":1064610288,"GoTotalBytes":1228925512,"HeapFragmentBytes":15037968,"HeapReservedBytes":120676352,"HeapReleasedBytes":112640000,"CGoAllocBytes":135044352,"CGoTotalBytes":140476416,"CGoCallRate":80906.81,"CPUUserPercent":89.2421,"CPUSysPercent":10.393257,"GCPausePercent":0.00047840964,"GCRunCount":185,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240613 07:11:11.640918 1756 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1586 runtime stats: 1.4 GiB RSS, 323 goroutines (stacks: 3.8 MiB), 1015 MiB/1.1 GiB Go alloc/total (heap fragmentation: 14 MiB, heap reserved: 115 MiB, heap released: 107 MiB), 129 MiB/134 MiB CGO alloc/total (80906.8 CGO/sec), 89.2/10.4 %(u/s)time, 0.0 %gc (185x), 416 B/416 B (r/w)net I240613 07:11:21.233026 2192 jobs/adopt.go:108 [T1,Vsystem,n1] 1587 claimed 1 jobs I240613 07:11:21.641001 1756 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1588 ={"Timestamp":1718262681640996949,"EventType":"runtime_stats","MemRSSBytes":1505083392,"GoroutineCount":316,"MemStackSysBytes":3932160,"GoAllocBytes":1072239512,"GoTotalBytes":1226541640,"HeapFragmentBytes":16641128,"HeapReservedBytes":108470272,"HeapReleasedBytes":115613696,"CGoAllocBytes":135044352,"CGoTotalBytes":140627968,"CGoCallRate":80133.58,"CPUUserPercent":90.19761,"CPUSysPercent":9.599745,"GCPausePercent":0.000793579,"GCRunCount":187,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240613 07:11:21.641207 1756 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1589 runtime stats: 1.4 GiB RSS, 316 goroutines (stacks: 3.8 MiB), 1023 MiB/1.1 GiB Go alloc/total (heap fragmentation: 16 MiB, heap reserved: 103 MiB, heap released: 110 MiB), 129 MiB/134 MiB CGO alloc/total (80133.6 CGO/sec), 90.2/9.6 %(u/s)time, 0.0 %gc (187x), 416 B/416 B (r/w)net I240613 07:11:31.641098 1756 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1590 ={"Timestamp":1718262691641093648,"EventType":"runtime_stats","MemRSSBytes":1505046528,"GoroutineCount":318,"MemStackSysBytes":4227072,"GoAllocBytes":1067378344,"GoTotalBytes":1226541640,"HeapFragmentBytes":15104344,"HeapReservedBytes":114573312,"HeapReleasedBytes":115613696,"CGoAllocBytes":135052544,"CGoTotalBytes":140554240,"CGoCallRate":79819.33,"CPUUserPercent":89.59914,"CPUSysPercent":10.2999,"GCPausePercent":0.00048383532,"GCRunCount":188} I240613 07:11:31.641305 1756 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1591 runtime stats: 1.4 GiB RSS, 318 goroutines (stacks: 4.0 MiB), 1018 MiB/1.1 GiB Go alloc/total (heap fragmentation: 14 MiB, heap reserved: 109 MiB, heap released: 110 MiB), 129 MiB/134 MiB CGO alloc/total (79819.3 CGO/sec), 89.6/10.3 %(u/s)time, 0.0 %gc (188x), 0 B/0 B (r/w)net I240613 07:11:41.641201 1756 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1592 ={"Timestamp":1718262701641197659,"EventType":"runtime_stats","MemRSSBytes":1500827648,"GoroutineCount":315,"MemStackSysBytes":3702784,"GoAllocBytes":1070893296,"GoTotalBytes":1222347336,"HeapFragmentBytes":16840464,"HeapReservedBytes":105652224,"HeapReleasedBytes":119808000,"CGoAllocBytes":135044352,"CGoTotalBytes":140525568,"CGoCallRate":78167.3,"CPUUserPercent":89.79908,"CPUSysPercent":10.099896,"GCPausePercent":0.000880631,"GCRunCount":190,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240613 07:11:41.641395 1756 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1593 runtime stats: 1.4 GiB RSS, 315 goroutines (stacks: 3.5 MiB), 1021 MiB/1.1 GiB Go alloc/total (heap fragmentation: 16 MiB, heap reserved: 101 MiB, heap released: 114 MiB), 129 MiB/134 MiB CGO alloc/total (78167.3 CGO/sec), 89.8/10.1 %(u/s)time, 0.0 %gc (190x), 416 B/416 B (r/w)net I240613 07:11:51.641255 1756 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1594 ={"Timestamp":1718262711641249972,"EventType":"runtime_stats","MemRSSBytes":1502212096,"GoroutineCount":319,"MemStackSysBytes":3833856,"GoAllocBytes":1065076784,"GoTotalBytes":1222347336,"HeapFragmentBytes":15046608,"HeapReservedBytes":113131520,"HeapReleasedBytes":119808000,"CGoAllocBytes":135027968,"CGoTotalBytes":140406784,"CGoCallRate":77950.49,"CPUUserPercent":89.29953,"CPUSysPercent":10.599944,"GCPausePercent":0.00048895745,"GCRunCount":191,"NetHostRecvBytes":416,"NetHostSendBytes":416} I240613 07:11:51.641500 1756 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1595 runtime stats: 1.4 GiB RSS, 319 goroutines (stacks: 3.7 MiB), 1016 MiB/1.1 GiB Go alloc/total (heap fragmentation: 14 MiB, heap reserved: 108 MiB, heap released: 114 MiB), 129 MiB/134 MiB CGO alloc/total (77950.5 CGO/sec), 89.3/10.6 %(u/s)time, 0.0 %gc (191x), 416 B/416 B (r/w)net I240613 07:11:52.276123 1722 3@pebble/event.go:828 [n1,s1,pebble] 1596 [JOB 54] WAL created 000059 (recycled 000026) I240613 07:11:52.276694 1709755 3@pebble/event.go:780 [n1,s1,pebble] 1597 [JOB 55] flushing 1 memtable (64MB) to L0 I240613 07:11:52.276826 1709755 3@pebble/event.go:808 [n1,s1,pebble] 1598 [JOB 55] flushing: sstable created 000060 W240613 07:11:52.512602 3139 2@rpc/clock_offset.go:291 [T1,Vsystem,n1,rnode=1,raddr=127.0.0.1:35343,class=system,rpc] 1599 latency jump (prev avg 54.21ms, current 190.58ms) I240613 07:11:52.672235 1709755 3@pebble/event.go:808 [n1,s1,pebble] 1600 [JOB 55] flushing: sstable created 000061 I240613 07:11:52.719997 1709755 3@pebble/event.go:784 [n1,s1,pebble] 1601 [JOB 55] flushed 1 memtable (64MB) to L0 [000060 000061] (2.8MB), in 0.4s (0.4s total), output rate 6.3MB/s I240613 07:11:52.722932 1710601 3@pebble/event.go:768 [n1,s1,pebble] 1602 [JOB 57] compacting(default) L0 [000060] (2.3MB) Score=7.81 + L6 [000056] (4.1MB) Score=0.13; OverlappingRatio: Single 1.83, Multi 0.00 I240613 07:11:52.723441 1710601 3@pebble/event.go:808 [n1,s1,pebble] 1603 [JOB 57] compacting: sstable created 000062 I240613 07:11:53.249995 1710601 3@pebble/event.go:808 [n1,s1,pebble] 1604 [JOB 57] compacting: sstable created 000063 I240613 07:11:53.259930 1710601 3@pebble/event.go:772 [n1,s1,pebble] 1605 [JOB 57] compacted(default) L0 [000060] (2.3MB) Score=7.81 + L6 [000056] (4.1MB) Score=0.13 -> L6 [000062 000063] (4.5MB), in 0.5s (0.5s total), output rate 8.4MB/s I240613 07:11:53.261655 1622 3@pebble/event.go:812 [n1,s1,pebble] 1606 [JOB 57] sstable deleted 000056 I240613 07:11:53.261780 1622 3@pebble/event.go:812 [n1,s1,pebble] 1607 [JOB 57] sstable deleted 000060 I240613 07:11:53.261861 1711215 3@pebble/event.go:768 [n1,s1,pebble] 1608 [JOB 58] compacting(default) L0 [000061] (564KB) Score=7.46 + L6 [000057 000058] (4.1MB) Score=0.13; OverlappingRatio: Single 7.39, Multi 0.00 I240613 07:11:53.262232 1711215 3@pebble/event.go:808 [n1,s1,pebble] 1609 [JOB 58] compacting: sstable created 000064 I240613 07:11:53.668334 1711215 3@pebble/event.go:808 [n1,s1,pebble] 1610 [JOB 58] compacting: sstable created 000065 I240613 07:11:53.684167 1711215 3@pebble/event.go:772 [n1,s1,pebble] 1611 [JOB 58] compacted(default) L0 [000061] (564KB) Score=7.46 + L6 [000057 000058] (4.1MB) Score=0.13 -> L6 [000064 000065] (4.5MB), in 0.4s (0.4s total), output rate 11MB/s I240613 07:11:53.684400 1622 3@pebble/event.go:812 [n1,s1,pebble] 1612 [JOB 58] sstable deleted 000057 I240613 07:11:53.684492 1622 3@pebble/event.go:812 [n1,s1,pebble] 1613 [JOB 58] sstable deleted 000058 I240613 07:11:53.684531 1622 3@pebble/event.go:812 [n1,s1,pebble] 1614 [JOB 58] sstable deleted 000061 I240613 07:12:01.646477 1756 2@util/log/event_log.go:32 [T1,Vsystem,n1] 1615 ={"Timestamp":1718262721646472857,"EventType":"runtime_stats","MemRSSBytes":1519538176,"GoroutineCount":319,"MemStackSysBytes":4227072,"GoAllocBytes":1075632768,"GoTotalBytes":1239345736,"HeapFragmentBytes":14386560,"HeapReservedBytes":119840768,"HeapReleasedBytes":102809600,"CGoAllocBytes":135032064,"CGoTotalBytes":140861440,"CGoCallRate":52458.895,"CPUUserPercent":81.25755,"CPUSysPercent":18.690235,"GCPausePercent":0.0009902028,"GCRunCount":193} I240613 07:12:01.646677 1756 2@server/status/runtime_log.go:47 [T1,Vsystem,n1] 1616 runtime stats: 1.4 GiB RSS, 319 goroutines (stacks: 4.0 MiB), 1.0 GiB/1.2 GiB Go alloc/total (heap fragmentation: 14 MiB, heap reserved: 114 MiB, heap released: 98 MiB), 129 MiB/134 MiB CGO alloc/total (52458.9 CGO/sec), 81.3/18.7 %(u/s)time, 0.0 %gc (193x), 0 B/0 B (r/w)net ```

Parameters:

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

fqazi commented 2 months ago

So, the failure here is linked with the system.lease migration, digging into this further:

E240613 06:57:12.321379 3832 sql/catalog/lease/lease.go:1625  [T1,Vsystem,n1] 281  refreshing descriptor: 11 lease failed: failed to insert lease {[128] 11 2 1 {{316886000 63853858632 <nil>}} [1 1 128 207 76 4 32 57 168 77 255 150 233 76 183 56 144 94 21]}: unexpected value: raw_bytes:"\273\223#\365\n3\002" timestamp:<wall_time:1718261832301161961 > 
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282  job 977113685260828673: running execution encountered retriable error: non-cancelable: count-leases: failed to insert lease {[128] 11 2 1 {{316886000 63853858632 <nil>}} [1 1 128 207 76 4 32 57 168 77 255 150 233 76 183 56 144 94 21]}: unexpected value: raw_bytes:"\273\223#\365\n3\002" timestamp:<wall_time:1718261832301161961 > 
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +(1) attached stack trace
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  -- stack trace:
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1670
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).runJob
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:451
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resumeJob.func1
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:285
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:485
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +Wraps: (2) non-cancelable
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +Wraps: (3) attached stack trace
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  -- stack trace:
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).execInternal.func1.1
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1162
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*rowsIterator).Next.func1
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:508
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*rowsIterator).Next
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:567
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).queryInternalBuffered
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:681
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).QueryRowExWithCols
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:733
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).QueryRowEx
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:719
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.countLeasesNonMultiRegion
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/count.go:173
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.CountLeases.func1.1
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/count.go:134
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).WithSyntheticDescriptors
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:162
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.CountLeases.func1
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/count.go:125
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).Txn.func1
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1830
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).txn.func4
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1917
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/kv/txn.go:1049
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/kv.runTxn
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/kv/db.go:1103
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).TxnWithAdmissionControl
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/kv/db.go:1066
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/kv/db.go:1041
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).txn
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1904
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).Txn
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1831
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.CountLeases
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/count.go:107
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql/catalog/descs.CheckTwoVersionInvariant
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/catalog/descs/txn.go:96
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).checkDescriptorTwoVersionInvariant
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1431
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).commitSQLTransactionInternal
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1606
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).commitTxn
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1363
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).newInternalExecutorWithTxn.func1
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1791
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).txn.func4
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1931
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/kv/txn.go:1049
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/kv.runTxn
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/kv/db.go:1103
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).TxnWithAdmissionControl
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/kv/db.go:1066
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/kv/db.go:1041
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).txn
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1904
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).DescsTxn
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1818
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +Wraps: (4) count-leases
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +Wraps: (5) attached stack trace
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  -- stack trace:
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.(*kvWriter).insertLease
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/kv_writer.go:95
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.storage.acquire.func1
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/storage.go:237
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/kv/txn.go:1049
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/kv.runTxn
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/kv/db.go:1103
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).TxnWithAdmissionControl
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/kv/db.go:1066
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/kv/db.go:1041
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.storage.acquire
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/storage.go:262
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.acquireNodeLease.func1
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:651
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.(*Group).doCall.func1
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:389
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTask
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:320
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.(*Group).doCall
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:388
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  | runtime.goexit
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +  |    src/runtime/asm_amd64.s:1695
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +Wraps: (6) failed to insert lease {[128] 11 2 1 {{316886000 63853858632 <nil>}} [1 1 128 207 76 4 32 57 168 77 255 150 233 76 183 56 144 94 21]}
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +Wraps: (7) unexpected value: raw_bytes:"\273\223#\365\n3\002" timestamp:<wall_time:1718261832301161961 > 
E240613 06:57:12.323020 3697 jobs/registry.go:1672  [T1,Vsystem,n1] 282 +Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *kvpb.ConditionFailedError
E240613 06:57:12.323639 3697 jobs/adopt.go:456  [T1,Vsystem,n1] 283  job 977113685260828673: adoption completed with error non-cancelable: count-leases: failed to insert lease {[128] 11 2 1 {{316886000 63853858632 <nil>}} [1 1 128 207 76 4 32 57 168 77 255 150 233 76 183 56 144 94 21]}: unexpected value: raw_bytes:"\273\223#\365\n3\002" timestamp:<wall_time:1718261832301161961 > 
I240613 06:57:13.619745 1694 1@gossip/gossip.go:1398  [T1,Vsystem,n1] 284  node has connected to cluster via gossip
I240613 06:57:13.619985 1694 kv/kvserver/stores.go:283  [T1,Vsystem,n1] 285  wrote 0 node addresses to persistent storage
rafiss commented 2 months ago

Hm I wonder why this is only failing on master though. 24.1 has that migration too.

fqazi commented 2 months ago

Yeah, its easy to repro on stress, adding logging now to figure out where the duplicate entry is generated.

fqazi commented 2 months ago

Digging into this further this is the range feed and on demand renewal racing against each other. Its made worse on this test because the lease expiry is set to 0. With some debug logging I see the rangefeed:

Accquiring from: goroutine 837 [running]:
runtime/debug.Stack()
        GOROOT/src/runtime/debug/stack.go:24 +0x64
github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.acquireNodeLease({0x107b2cd48, 0x140024bfad0}, 0x14004873200, 0xb, 0x1)
        github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:622 +0x60
github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.(*Manager).AcquireFreshestFromStore(0x14004873200, {0x107b2cd48, 0x140024bfad0}, 0xb)
        github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:594 +0x5c
github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.ensureVersion({0x107b2cd48, 0x140024bfad0}, 0xb, 0x2, 0x14004873200)
        github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:297 +0x6c
github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.purgeOldVersions({0x107b2cd48, 0x140024bfad0}, 0x14003aefc98?, 0xb, 0x0, 0x2, 0x14004873200)
        github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:807 +0xc4
github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.(*Manager).RefreshLeases.func1.2({0x107b2cd48, 0x140024bfad0})
        github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:1380 +0x8c
github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.(*Manager).RefreshLeases.func1({0x107b2cd48, 0x140024bfad0})
        github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:1406 +0x4b0
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
        github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:485 +0x128
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx in goroutine 13
        github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:476 +0x31c
I240613 20:46:16.225819 2333 sql/catalog/lease/kv_writer.go:84  [n1] 289  MARKED WRITING: 11 2

and the lease query during migration:

Accquiring from: goroutine 2189 [running]:
runtime/debug.Stack()
        GOROOT/src/runtime/debug/stack.go:24 +0x64
github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.acquireNodeLease({0x107b2cd48, 0x140068b9110}, 0x14004873200, 0xb, 0x0)
        github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:622 +0x60
github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.(*Manager).Acquire.func1(0x14006158420, {0x107b2cd48, 0x140068b9110}, 0x14004873200, 0xb)
        github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:1248 +0x88
github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.(*Manager).Acquire(0x14004873200, {0x107b2cd48, 0x140068b9110}, {0x1d00000001?, 0x40545a3?}, 0xb)
        github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:1250 +0xb4
github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.(*Manager).AcquireByName(0x14004873200, {0x107b2cd48, 0x140068b9110}, {0x10bbf85e0?, 0x5395d48?}, 0x1, 0x1d, {0x140040545a3, 0x5})
        github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:1095 +0x290
github.com/cockroachdb/cockroach/pkg/sql/catalog/descs.(*leasedDescriptors).getByName(0x1400513a028, {0x107b2cd48, 0x140068b9110}, {0x107af4cf0, 0x14005493360}, 0x1, 0x1d, {0x140040545a3, 0x5})
        github.com/cockroachdb/cockroach/pkg/sql/catalog/descs/leased_descriptors.go:106 +0x1ac
github.com/cockroachdb/cockroach/pkg/sql/catalog/descs.(*Collection).getNonVirtualDescriptorID.func6()
fqazi commented 2 months ago

Actually still need to dig more, since we should join the single flight started by the range feed or see a copy of the descriptor with a good session ID and skip a session based leasing insert.

fqazi commented 2 months ago

This is limited to only cases where the lease renewal time is artificially tiny, where in dual write mode the new version of the lease will instantly expire on the system.lease table while doing the migration. Have to think about how to handle this more gracefully during the transition.

fqazi commented 2 months ago

Actually the count-leases thing is a red herring, since on a retry the lease should eventually be removed. The scenario here is:

  1. New version of the lease descriptor is published
  2. Range feed picks up the new version, renews it and expires it instantly (lease duration is 0)
  3. The lease counting query attempts to lease this descriptor
  4. We attempt to write an entry into the system.lease table, while (2) is attempting to delete the entry. This leads to the CPut inside leasing failing

Normally if (4) happens on retry eventually (2) would complete cleaning up the lease. But, I see a bunch of stack stuck in releaseLease which makes me wonder if some has gone awry (i.e. we can't clean up the lease)

rafiss commented 2 months ago

Is the lease duration of 0 required for this test? I see why the lease renewal duration should be 0, but why the lease duration itself?

fqazi commented 2 months ago

@rafiss Good point, and I don't think the duration itself has to be zero either. Let me get a PR to change that, also the leasing code no longer watches the settings table so the original scenario isn't there, but probably this is a good sanity test for other bugs.