cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

stability: `(*Store).mu` permanently unavailable on a node #9749

Closed tbg closed 7 years ago

tbg commented 7 years ago
goroutine 25 [semacquire, 14 minutes]:
sync.runtime_Semacquire(0xc42013feac)
        /usr/local/go/src/runtime/sema.go:47 +0x30
sync.(*Mutex).Lock(0xc42013fea8)
        /usr/local/go/src/sync/mutex.go:85 +0xd0
github.com/cockroachdb/cockroach/util/syncutil.(*TimedMutex).Lock(0xc42031b150)
        /go/src/github.com/cockroachdb/cockroach/util/syncutil/timedmutex.go:63 +0x30
github.com/cockroachdb/cockroach/storage.(*Store).tryGetOrCreateReplica(0xc42031af00, 0x259e, 0xc400000001, 0xc42d142c88, 0x8, 0xc42002cc00, 0xc42033c000, 0x1960248)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2903 +0x54
github.com/cockroachdb/cockroach/storage.(*Store).getOrCreateReplica(0xc42031af00, 0x259e, 0x1, 0xc42d142c88, 0xc421180380, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2880 +0x51
github.com/cockroachdb/cockroach/storage.(*Store).processRaftRequest(0xc42031af00, 0x7f8acde4a0b8, 0xc42b4ae030, 0xc42d142c80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2346 +0x9a
github.com/cockroachdb/cockroach/storage.(*Store).processRequestQueue(0xc42031af00, 0x259e)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2763 +0x180
github.com/cockroachdb/cockroach/storage.(*raftScheduler).worker(0xc4204072d0, 0xc420456090)
        /go/src/github.com/cockroachdb/cockroach/storage/scheduler.go:204 +0x308
github.com/cockroachdb/cockroach/storage.(*raftScheduler).Start.func1()
        /go/src/github.com/cockroachdb/cockroach/storage/scheduler.go:160 +0x33
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc420456090, 0xc420652e80)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:191 +0x7d
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:192 +0x66

This is of course accompanied by thousands of other goroutines which have the same problem.

Will investigate further.

cc @BramGruneir @andreimatei

tbg commented 7 years ago

image

tbg commented 7 years ago

This one looks suspicious. The code holds the Store lock while doing things with individual replicas (which can easily be locked for a while when they're not available). This is also a lock ordering violation (cc @arjunravinarayan interestingly exactly the one I mentioned in #9380 😄 )

goroutine 218 [semacquire, 14 minutes]:
sync.runtime_Semacquire(0xc425d4ebcc)
        /usr/local/go/src/runtime/sema.go:47 +0x30
sync.(*Mutex).Lock(0xc425d4ebc8)
        /usr/local/go/src/sync/mutex.go:85 +0xd0
github.com/cockroachdb/cockroach/util/syncutil.(*TimedMutex).Lock(0xc4218a4b20)
        /go/src/github.com/cockroachdb/cockroach/util/syncutil/timedmutex.go:63 +0x30
github.com/cockroachdb/cockroach/storage.(*Replica).Desc(0xc4218a4a80, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:872 +0x46
github.com/cockroachdb/cockroach/storage.(*Replica).endKey(0xc4218a4a80, 0x1, 0x30, 0x2206ae0)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:3504 +0x2b
github.com/cockroachdb/cockroach/storage.(*Replica).Less(0xc4218a4a80, 0x220f9a0, 0xc445024e20, 0xc43063bd00)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:3509 +0x2f
github.com/google/btree.(*node).iterate(0xc42044e600, 0x1, 0x220f9a0, 0xc445024e20, 0x0, 0x0, 0x101, 0xc424145a48, 0x0)
        /go/src/github.com/google/btree/btree.go:441 +0x49a
github.com/google/btree.(*node).iterate(0xc4206479c0, 0x1, 0x220f9a0, 0xc445024e20, 0x0, 0x0, 0xc445020001, 0xc424145a48, 0x1681820)
        /go/src/github.com/google/btree/btree.go:445 +0x558
github.com/google/btree.(*BTree).AscendGreaterOrEqual(0xc42037b020, 0x220f9a0, 0xc445024e20, 0xc424145a48)
        /go/src/github.com/google/btree/btree.go:633 +0x74
github.com/cockroachdb/cockroach/storage.(*Store).visitReplicasLocked(0xc42031af00, 0x25b4900, 0x0, 0x0, 0x2206ae0, 0x2, 0x2, 0xc424145af0)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:1343 +0x11e
github.com/cockroachdb/cockroach/storage.(*storeRangeSet).Visit(0xc424145b58, 0xc424145b38)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:239 +0x124
github.com/cockroachdb/cockroach/storage.(*Store).deadReplicas(0xc42031af00, 0xc445024e00, 0xc42064b528, 0x1663f00, 0x800000000000030)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:1923 +0xb9
github.com/cockroachdb/cockroach/storage.(*Store).GossipDeadReplicas(0xc42031af00, 0x7f8acde4a0b8, 0xc4204354a0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:1204 +0x49
github.com/cockroachdb/cockroach/server.(*Node).gossipStores.func1(0xc42031af00, 0xc4203eb9e0, 0xc424145cb0)
        /go/src/github.com/cockroachdb/cockroach/server/node.go:623 +0x7c
github.com/cockroachdb/cockroach/storage.(*Stores).VisitStores(0xc4203df5e0, 0xc424145d58, 0xc424145d70, 0x62e03c)
        /go/src/github.com/cockroachdb/cockroach/storage/stores.go:130 +0x219
github.com/cockroachdb/cockroach/server.(*Node).gossipStores(0xc42018c000, 0x7f8acde4a0b8, 0xc4204354a0)
        /go/src/github.com/cockroachdb/cockroach/server/node.go:627 +0x72
github.com/cockroachdb/cockroach/server.(*Node).startGossip.func1()
        /go/src/github.com/cockroachdb/cockroach/server/node.go:605 +0x310
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc420456090, 0xc42084f140)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:191 +0x7d
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:192 +0x66
tbg commented 7 years ago

That Replica is being GC'ed, so this stack holds its raftMu and is blocked on (*Store).mu:

goroutine 214 [semacquire, 14 minutes]:
sync.runtime_Semacquire(0xc42013feac)
        /usr/local/go/src/runtime/sema.go:47 +0x30
sync.(*Mutex).Lock(0xc42013fea8)
        /usr/local/go/src/sync/mutex.go:85 +0xd0
github.com/cockroachdb/cockroach/util/syncutil.(*TimedMutex).Lock(0xc42031b150)
        /go/src/github.com/cockroachdb/cockroach/util/syncutil/timedmutex.go:63 +0x30
github.com/cockroachdb/cockroach/storage.(*Store).removeReplicaImpl(0xc42031af00, 0xc42b964a80, 0x1e55, 0xc541706900, 0x35, 0x40, 0xc541706940, 0x35, 0x40, 0xc4f6ddf5f0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:1750 +0x115
github.com/cockroachdb/cockroach/storage.(*Store).RemoveReplica(0xc42031af00, 0xc42b964a80, 0x1e55, 0xc541706900, 0x35, 0x40, 0xc541706940, 0x35, 0x40, 0xc4f6ddf5f0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:1737 +0xea
github.com/cockroachdb/cockroach/storage.(*replicaGCQueue).process(0xc42013ffa0, 0x7f8acddb60a0, 0xc4a67a3620, 0x147aaa6aa0c96b59, 0x0, 0xc42b964a80, 0xc45e33d800, 0x17, 0x20, 0x10, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/replica_gc_queue.go:192 +0x532
github.com/cockroachdb/cockroach/storage.(*baseQueue).processReplica(0xc420350460, 0xc42b964a80, 0xc42044e800, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/queue.go:531 +0x4a3
github.com/cockroachdb/cockroach/storage.(*baseQueue).processLoop.func1.2()
        /go/src/github.com/cockroachdb/cockroach/storage/queue.go:465 +0x58
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunTask(0xc420456090, 0xc549c8fe78, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:219 +0x100
github.com/cockroachdb/cockroach/storage.(*baseQueue).processLoop.func1()
        /go/src/github.com/cockroachdb/cockroach/storage/queue.go:469 +0x296
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc420456090, 0xc42084f080)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:191 +0x7d
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:192 +0x66

So the next question is: who is blocking on raftMu while holding mu? Nobody should be, given the prescribed lock ordering.
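
For illustration, here is a minimal, self-contained sketch of why that ordering matters, using stand-in mutexes rather than the real Store/Replica types (an assumption made purely for this example): two goroutines that acquire the same pair of locks in opposite orders end up parked in semacquire forever, exactly like the goroutines in the dumps above.

package main

import (
    "fmt"
    "sync"
    "time"
)

// storeMu and raftMu are stand-ins for Store.mu and Replica.raftMu.
var storeMu, raftMu sync.Mutex

func main() {
    var wg sync.WaitGroup
    wg.Add(2)

    // Goroutine 1 follows the prescribed order: raftMu, then storeMu.
    go func() {
        defer wg.Done()
        raftMu.Lock()
        time.Sleep(10 * time.Millisecond) // widen the race window
        storeMu.Lock()                    // blocks forever once goroutine 2 holds storeMu
        storeMu.Unlock()
        raftMu.Unlock()
    }()

    // Goroutine 2 violates the ordering: storeMu, then raftMu.
    go func() {
        defer wg.Done()
        storeMu.Lock()
        time.Sleep(10 * time.Millisecond)
        raftMu.Lock() // blocks forever once goroutine 1 holds raftMu
        raftMu.Unlock()
        storeMu.Unlock()
    }()

    // Never returns; the runtime eventually reports
    // "all goroutines are asleep - deadlock!".
    wg.Wait()
    fmt.Println("unreachable")
}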

tbg commented 7 years ago

Total lie, that's a different Replica being GCed (the memory address looked similar).

tbg commented 7 years ago

This one does though (and blocks on the replica's mu).

github.com/cockroachdb/cockroach/util/syncutil.(*TimedMutex).Lock(0xc4218a4b20)
        /go/src/github.com/cockroachdb/cockroach/util/syncutil/timedmutex.go:63 +0x30
github.com/cockroachdb/cockroach/storage.(*Replica).applyRaftCommand(0xc4218a4a80, 0xc4252e14f0, 0x8, 0x7f8acde4a0b8, 0xc42abd4d20, 0x16480, 0x1438c, 0x1500000015, 0x2, 0x147aa4c7b027418f, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:2837 +0x579
github.com/cockroachdb/cockroach/storage.(*Replica).processRaftCommand(0xc4218a4a80, 0xc4252e14f0, 0x8, 0x16480, 0x92d, 0x1500000015, 0x2, 0x147aa4c7b027418f, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:2630 +0x3d3
github.com/cockroachdb/cockroach/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc4218a4a80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
petermattis commented 7 years ago

Eh? I thought the newStoreRangeSet(s).Visit pattern was created to avoid holding Store.mu while iterating over replicas. The stack you posted seems fine. We're holding Store.mu while grabbing Replica.mu. That's kosher, right?
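
As a rough sketch of that pattern (illustrative types only, not the actual storeRangeSet implementation): copy the replica set while holding the store mutex, then release it before touching any per-replica mutex, so a slow replica can no longer stall everyone waiting on Store.mu.

package main

import (
    "fmt"
    "sync"
)

// Illustrative stand-ins; the real Store/Replica types differ.
type replica struct {
    mu   sync.Mutex
    name string
}

type store struct {
    mu       sync.Mutex
    replicas map[int]*replica
}

func (s *store) visitReplicas(visit func(*replica) bool) {
    // Snapshot the replica set under the store mutex...
    s.mu.Lock()
    snapshot := make([]*replica, 0, len(s.replicas))
    for _, r := range s.replicas {
        snapshot = append(snapshot, r)
    }
    s.mu.Unlock()

    // ...then visit each replica without holding the store mutex, so
    // grabbing r.mu here cannot block anyone waiting on s.mu.
    for _, r := range snapshot {
        r.mu.Lock()
        ok := visit(r)
        r.mu.Unlock()
        if !ok {
            return
        }
    }
}

func main() {
    s := &store{replicas: map[int]*replica{1: {name: "r1"}, 2: {name: "r2"}}}
    s.visitReplicas(func(r *replica) bool {
        fmt.Println("visiting", r.name)
        return true
    })
}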

tbg commented 7 years ago

I'm unable to find who holds the Replica mu. `grep -F '0xc4218a4a80' out | grep -vE 'applyRaft|IsInitialized|select|beginCmds' | tr '@' '\n' | less` only shows a handful of goroutines, and none of them do.

For posterity: goroutine dump is at gs://cockroach-test/omega-store-deadlock.txt.

tbg commented 7 years ago

@petermattis Ah, yes, it's allowed according to our lock order notes, but something smells there. I think we should avoid acquiring r.mu while holding s.mu, but you're right: this doesn't seem inherently incorrect.

tbg commented 7 years ago

Ok, new theory: This goroutine actually does hold r.mu (of 0xc4218a4a80), and it's runnable while everything has been locked up for >16min. Is it possible that this code is never terminating (or very slow, i.e. creating a huge heap for some reason)?

goroutine 29434001 [runnable]:
github.com/cockroachdb/cockroach/storage.(*overlapHeap).Less(0xc42bbde350, 0x28439, 0x1421c, 0xc4c0f00a01)
        <autogenerated>:61
container/heap.down(0x2220800, 0xc42bbde350, 0x1421c, 0x42273)
        /usr/local/go/src/container/heap/heap.go:111 +0x7a
container/heap.Init(0x2220800, 0xc42bbde350)
        /usr/local/go/src/container/heap/heap.go:45 +0x7f
github.com/cockroachdb/cockroach/storage.(*overlapHeap).Init(0xc42bbde350, 0xc440180000, 0x42273, 0x48800)
        /go/src/github.com/cockroachdb/cockroach/storage/command_queue.go:338 +0x5b
github.com/cockroachdb/cockroach/storage.(*CommandQueue).getWait(0xc42bbde310, 0x0, 0xc42bdb1500, 0x102, 0x102, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/command_queue.go:204 +0x2a4
github.com/cockroachdb/cockroach/storage.(*Replica).beginCmds(0xc4218a4a80, 0x7f8acde4a0b8, 0xc48626c570, 0xc4317cf260, 0xc45e336ad8, 0x638405, 0x40)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:1162 +0x2c8
github.com/cockroachdb/cockroach/storage.(*Replica).addWriteCmd(0xc4218a4a80, 0x7f8acde4a0b8, 0xc48626c570, 0x147aa61eb94d2d64, 0x0, 0xe0000000e, 0x4, 0x92d, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:1502 +0x6d2
github.com/cockroachdb/cockroach/storage.(*Replica).Send(0xc4218a4a80, 0x7f8acde4a0b8, 0xc48626c570, 0x147aa61eb94d2d64, 0x0, 0xe0000000e, 0x4, 0x92d, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:1074 +0x277
github.com/cockroachdb/cockroach/storage.(*Store).Send(0xc42031af00, 0x7f8acde4a0b8, 0xc431a73050, 0x147aa61eb94d2d64, 0x0, 0xe0000000e, 0x4, 0x92d, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2100 +0x752
github.com/cockroachdb/cockroach/storage.(*Stores).Send(0xc4203df5e0, 0x7f8acde4a0b8, 0xc431a72f90, 0x0, 0x0, 0xe0000000e, 0x4, 0x92d, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/stores.go:184 +0x202
github.com/cockroachdb/cockroach/server.(*Node).Batch.func3()
        /go/src/github.com/cockroachdb/cockroach/server/node.go:835 +0x2af
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunTask(0xc420456090, 0xc47402f8c8, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:219 +0x100
github.com/cockroachdb/cockroach/server.(*Node).Batch(0xc42018c000, 0x7f8acde4a0b8, 0xc431a72c30, 0xc4317d1ea0, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/server/node.go:847 +0x2bf
github.com/cockroachdb/cockroach/roachpb._Internal_Batch_Handler(0x179b640, 0xc42018c000, 0x7f8acde4a0b8, 0xc431a72810, 0xc42dd349b0, 0x0, 0x0, 0x0, 0x4, 0x7f8acde1c180)
        /go/src/github.com/cockroachdb/cockroach/roachpb/api.pb.go:1537 +0x27d
google.golang.org/grpc.(*Server).processUnaryRPC(0xc420354820, 0x2224520, 0xc47859c1b0, 0xc47957a3c0, 0xc4203eba70, 0x21ea2d0, 0x0, 0x0, 0x0)
        /go/src/google.golang.org/grpc/server.go:611 +0xc50
google.golang.org/grpc.(*Server).handleStream(0xc420354820, 0x2224520, 0xc47859c1b0, 0xc47957a3c0, 0x0)
        /go/src/google.golang.org/grpc/server.go:769 +0x6b0
google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc483db8df0, 0xc420354820, 0x2224520, 0xc47859c1b0, 0xc47957a3c0)
        /go/src/google.golang.org/grpc/server.go:422 +0xab
created by google.golang.org/grpc.(*Server).serveStreams.func1
        /go/src/google.golang.org/grpc/server.go:423 +0xa3
tbg commented 7 years ago

cc @nvanbenschoten

tbg commented 7 years ago

Node crashed but voila, here we go again:

goroutine 289 [semacquire, 6 minutes]:
sync.runtime_Semacquire(0xc420319024)
        /usr/local/go/src/runtime/sema.go:47 +0x30
sync.(*Mutex).Lock(0xc420319020)
        /usr/local/go/src/sync/mutex.go:85 +0xd0
github.com/cockroachdb/cockroach/util/syncutil.(*TimedMutex).Lock(0xc420158b20)
        /go/src/github.com/cockroachdb/cockroach/util/syncutil/timedmutex.go:63 +0x30
github.com/cockroachdb/cockroach/storage.(*Replica).Desc(0xc420158a80, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:872 +0x46
github.com/cockroachdb/cockroach/storage.(*Replica).endKey(0xc420158a80, 0xc61d3fcf40, 0xc420909b40, 0xc420909b60)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:3504 +0x2b
github.com/cockroachdb/cockroach/storage.(*Replica).Less(0xc420158a80, 0x220f9a0, 0xc6340c3100, 0xc42020e700)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:3509 +0x2f
github.com/google/btree.(*node).iterate(0xc42013cc40, 0x1, 0x220f9a0, 0xc6340c3100, 0x0, 0x0, 0x1, 0xc420909d08, 0x2)
        /go/src/github.com/google/btree/btree.go:441 +0x49a
github.com/google/btree.(*node).iterate(0xc4202dd080, 0x1, 0x220f9a0, 0xc6340c3100, 0x0, 0x0, 0xc6340c0001, 0xc420909d08, 0x1681820)
        /go/src/github.com/google/btree/btree.go:445 +0x558
github.com/google/btree.(*BTree).AscendGreaterOrEqual(0xc42040a100, 0x220f9a0, 0xc6340c3100, 0xc420909d08)
        /go/src/github.com/google/btree/btree.go:633 +0x74
github.com/cockroachdb/cockroach/storage.(*Store).visitReplicasLocked(0xc420296000, 0xc42020e700, 0x9, 0x10, 0x2206ae0, 0x2, 0x2, 0xc420909de0)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:1343 +0x11e
github.com/cockroachdb/cockroach/storage.(*Store).LookupReplica(0xc420296000, 0xc42020e700, 0x9, 0x10, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:1290 +0x11e
github.com/cockroachdb/cockroach/storage.(*Store).maybeGossipSystemConfig(0xc420296000, 0xc4204400c0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:1118 +0x6e
github.com/cockroachdb/cockroach/storage.(*Store).startGossip.func2()
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:1080 +0x20d
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc420414000, 0xc4208d75d0)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:191 +0x7d
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:192 +0x66
tbg commented 7 years ago

This time, no heap is involved. I'm looking.

nvanbenschoten commented 7 years ago

Do you mind sharing the SHA of your build? My line numbers aren't lining up on master's HEAD.

BramGruneir commented 7 years ago

96a61d731c50bccfffcbb1e55786600f346a281a

tbg commented 7 years ago

Ok, maybe the culprit is simpler:

W161005 17:52:44.777104 333998 storage/engine/rocksdb.go:1028  batch [49/28990/1] commit took 14.764413422s (>500ms):
goroutine 333998 [running]:
runtime/debug.Stack(0x16a7b60, 0x221e060, 0xc420010270)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x79
github.com/cockroachdb/cockroach/storage/engine.(*rocksDBBatch).Commit(0xc435620a80, 0xc42fad8e10, 0x22234a0)
        /go/src/github.com/cockroachdb/cockroach/storage/engine/rocksdb.go:1028 +0x212
github.com/cockroachdb/cockroach/storage.(*Replica).applySnapshot(0xc421437880, 0x7f383a322550, 0xc42fad8e10, 0xa74baeb47dd04075, 0xd444890a41ab3196, 0x476, 0xc439c6c140, 0x35, 0x40, 0xc439c6c180, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/replica_raftstorage.go:659 +0xccb
github.com/cockroachdb/cockroach/storage.(*Store).processRaftRequest(0xc420316f00, 0x7f383a322550, 0xc42fad8e10, 0xc42e9c2758, 0xa74baeb47dd04075, 0xd444890a41ab3196, 0x476, 0xc439c6c140, 0x35, 0x40, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2538 +0x7ac
github.com/cockroachdb/cockroach/storage.(*Store).HandleSnapshot(0xc420316f00, 0xc42e9c2700, 0x22242e0, 0xc440d9a210, 0xc4227ab801, 0x1ee2a10)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2303 +0x4df
github.com/cockroachdb/cockroach/storage.(*RaftTransport).RaftSnapshot.func1.1(0x22242e0, 0xc440d9a210, 0xc4203c2820, 0xc434696740, 0xc434696748)
        /go/src/github.com/cockroachdb/cockroach/storage/raft_transport.go:352 +0x14d
github.com/cockroachdb/cockroach/storage.(*RaftTransport).RaftSnapshot.func1(0x7f383a322550, 0xc42fad8cf0)
        /go/src/github.com/cockroachdb/cockroach/storage/raft_transport.go:353 +0x45
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunAsyncTask.func1(0xc420399c20, 0x1ee75a1, 0x19, 0x162, 0x1960278, 0xc42fad8d80, 0x7f383a322550, 0xc42fad8cf0)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:239 +0xc6
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunAsyncTask
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:240 +0x227
I161005 17:52:44.844811 333998 storage/replica_raftstorage.go:580  [n14,s14] [n14,s14,r1142:/Table/51/1/454{76160…-87362…}]: with replicaID [?], applied preemptive snapshot in 15.007s
W161005 17:52:44.844980 333998 util/syncutil/timedmutex.go:85  [n14,s14,r1142:/Table/51/1/454{76160…-87362…}] mutex held for 15.008085828s (>500ms):
goroutine 333998 [running]:
runtime/debug.Stack(0x2562ea0, 0x31e0f4aa, 0x2565c00)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x79
github.com/cockroachdb/cockroach/util/syncutil.(*TimedMutex).Unlock(0xc4214378e8)
        /go/src/github.com/cockroachdb/cockroach/util/syncutil/timedmutex.go:84 +0xf7
github.com/cockroachdb/cockroach/storage.(*Store).processRaftRequest(0xc420316f00, 0x7f383a322550, 0xc42fad8e10, 0xc42e9c2758, 0xa74baeb47dd04075, 0xd444890a41ab3196, 0x476, 0xc439c6c140, 0x35, 0x40, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2544 +0x7ff
github.com/cockroachdb/cockroach/storage.(*Store).HandleSnapshot(0xc420316f00, 0xc42e9c2700, 0x22242e0, 0xc440d9a210, 0xc4227ab801, 0x1ee2a10)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2303 +0x4df
github.com/cockroachdb/cockroach/storage.(*RaftTransport).RaftSnapshot.func1.1(0x22242e0, 0xc440d9a210, 0xc4203c2820, 0xc434696740, 0xc434696748)
        /go/src/github.com/cockroachdb/cockroach/storage/raft_transport.go:352 +0x14d
github.com/cockroachdb/cockroach/storage.(*RaftTransport).RaftSnapshot.func1(0x7f383a322550, 0xc42fad8cf0)
        /go/src/github.com/cockroachdb/cockroach/storage/raft_transport.go:353 +0x45
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunAsyncTask.func1(0xc420399c20, 0x1ee75a1, 0x19, 0x162, 0x1960278, 0xc42fad8d80, 0x7f383a322550, 0xc42fad8cf0)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:239 +0xc6
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunAsyncTask
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:240 +0x227

Many of these snapshots could keep RocksDB throttled almost permanently, grinding the system to a halt.

tbg commented 7 years ago

Here's some more on the frequency of that:

W161005 17:53:27.717191 1638 storage/engine/rocksdb.go:1028  batch [4/360/0] commit took 21.292276081s (>500ms):
W161005 17:53:27.717348 1637 storage/engine/rocksdb.go:1028  batch [4/382/0] commit took 21.29195922s (>500ms):
W161005 17:53:27.818226 2742 storage/engine/rocksdb.go:1028  batch [290177/19381688/0] commit took 20.842484783s (>500ms):
W161005 17:53:27.971950 336866 storage/engine/rocksdb.go:1028  batch [97/60538/1] commit took 17.630255446s (>500ms):
W161005 17:53:49.849022 1641 storage/engine/rocksdb.go:1028  batch [1/53/0] commit took 22.130650759s (>500ms):
W161005 17:53:49.849080 1637 storage/engine/rocksdb.go:1028  batch [4/432/0] commit took 22.131401541s (>500ms):
W161005 17:53:49.849766 1642 storage/engine/rocksdb.go:1028  batch [4/412/0] commit took 22.132728076s (>500ms):
W161005 17:53:49.850096 1643 storage/engine/rocksdb.go:1028  batch [4/406/0] commit took 22.133368285s (>500ms):
W161005 17:53:49.850424 1640 storage/engine/rocksdb.go:1028  batch [4/440/0] commit took 22.132987885s (>500ms):
W161005 17:53:49.850604 1644 storage/engine/rocksdb.go:1028  batch [4/343/0] commit took 22.133532083s (>500ms):
W161005 17:53:49.850966 1639 storage/engine/rocksdb.go:1028  batch [4/411/0] commit took 22.132276765s (>500ms):
W161005 17:53:49.850977 1638 storage/engine/rocksdb.go:1028  batch [4/402/0] commit took 22.13226374s (>500ms):
W161005 17:53:49.951415 2742 storage/engine/rocksdb.go:1028  batch [290216/19381111/0] commit took 21.609888885s (>500ms):
W161005 17:53:50.102063 338541 storage/engine/rocksdb.go:1028  batch [80/47824/1] commit took 16.479499041s (>500ms):
W161005 17:54:11.467651 1638 storage/engine/rocksdb.go:1028  batch [4/375/0] commit took 21.616242478s (>500ms):
W161005 17:54:11.467792 1640 storage/engine/rocksdb.go:1028  batch [4/381/0] commit took 21.617038774s (>500ms):
W161005 17:54:11.467861 1639 storage/engine/rocksdb.go:1028  batch [4/382/0] commit took 21.615985874s (>500ms):
W161005 17:54:11.468073 1641 storage/engine/rocksdb.go:1028  batch [36/20536/0] commit took 21.617858956s (>500ms):
W161005 17:54:11.468452 1644 storage/engine/rocksdb.go:1028  batch [4/407/0] commit took 21.616660864s (>500ms):
W161005 17:54:11.468949 1642 storage/engine/rocksdb.go:1028  batch [4/331/0] commit took 21.617676691s (>500ms):
W161005 17:54:11.469226 1637 storage/engine/rocksdb.go:1028  batch [4/373/0] commit took 21.617808481s (>500ms):
W161005 17:54:11.469256 1643 storage/engine/rocksdb.go:1028  batch [4/375/0] commit took 21.618672224s (>500ms):
W161005 17:54:11.571618 2742 storage/engine/rocksdb.go:1028  batch [282933/18893907/0] commit took 21.151448402s (>500ms):
W161005 17:54:11.739748 339962 storage/engine/rocksdb.go:1028  batch [25/12957/1] commit took 18.73995097s (>500ms):
W161005 17:54:33.550351 1637 storage/engine/rocksdb.go:1028  batch [4/432/0] commit took 22.080490497s (>500ms):
W161005 17:54:33.550455 1642 storage/engine/rocksdb.go:1028  batch [4/401/0] commit took 22.080923726s (>500ms):
W161005 17:54:33.550505 1644 storage/engine/rocksdb.go:1028  batch [4/412/0] commit took 22.081515587s (>500ms):
W161005 17:54:33.550594 1641 storage/engine/rocksdb.go:1028  batch [4/389/0] commit took 22.081514834s (>500ms):
W161005 17:54:33.550986 1638 storage/engine/rocksdb.go:1028  batch [4/361/0] commit took 22.081604474s (>500ms):
W161005 17:54:33.551087 1640 storage/engine/rocksdb.go:1028  batch [4/351/0] commit took 22.07966517s (>500ms):
W161005 17:54:33.551156 1643 storage/engine/rocksdb.go:1028  batch [4/332/2] commit took 22.080343462s (>500ms):
W161005 17:54:33.551770 1639 storage/engine/rocksdb.go:1028  batch [4/331/0] commit took 22.081280806s (>500ms):
W161005 17:54:33.645704 2742 storage/engine/rocksdb.go:1028  batch [295944/19762777/0] commit took 20.91186621s (>500ms):
W161005 17:54:33.802137 343861 storage/engine/rocksdb.go:1028  batch [39/20573/1] commit took 19.913277123s (>500ms):
W161005 17:54:56.122823 1637 storage/engine/rocksdb.go:1028  batch [4/376/0] commit took 22.569969381s (>500ms):
W161005 17:54:56.122913 1643 storage/engine/rocksdb.go:1028  batch [4/337/0] commit took 22.57111276s (>500ms):
W161005 17:54:56.123556 1638 storage/engine/rocksdb.go:1028  batch [4/432/0] commit took 22.572078618s (>500ms):
W161005 17:54:56.123859 1641 storage/engine/rocksdb.go:1028  batch [4/332/2] commit took 22.571618527s (>500ms):
W161005 17:54:56.124058 1639 storage/engine/rocksdb.go:1028  batch [4/369/0] commit took 22.570537481s (>500ms):
W161005 17:54:56.124096 1644 storage/engine/rocksdb.go:1028  batch [4/432/0] commit took 22.572803071s (>500ms):
W161005 17:54:56.124188 1642 storage/engine/rocksdb.go:1028  batch [4/368/0] commit took 22.570465303s (>500ms):
W161005 17:54:56.124380 1640 storage/engine/rocksdb.go:1028  batch [4/435/0] commit took 22.57285741s (>500ms):
W161005 17:54:56.215252 2742 storage/engine/rocksdb.go:1028  batch [298812/19960063/0] commit took 21.468060503s (>500ms):
W161005 17:54:56.383678 345504 storage/engine/rocksdb.go:1028  batch [56/32717/1] commit took 20.756832865s (>500ms):
W161005 17:55:18.496121 1641 storage/engine/rocksdb.go:1028  batch [4/338/0] commit took 22.370898741s (>500ms):
W161005 17:55:18.496211 1640 storage/engine/rocksdb.go:1028  batch [4/321/0] commit took 22.371495799s (>500ms):
W161005 17:55:18.496359 1639 storage/engine/rocksdb.go:1028  batch [4/400/0] commit took 22.371975868s (>500ms):
W161005 17:55:18.496646 1638 storage/engine/rocksdb.go:1028  batch [4/356/0] commit took 22.372805922s (>500ms):
W161005 17:55:18.497070 1643 storage/engine/rocksdb.go:1028  batch [4/359/0] commit took 22.372485433s (>500ms):
W161005 17:55:18.497131 1637 storage/engine/rocksdb.go:1028  batch [4/413/0] commit took 22.373402264s (>500ms):
W161005 17:55:18.497163 1642 storage/engine/rocksdb.go:1028  batch [4/396/0] commit took 22.371100286s (>500ms):
W161005 17:55:18.497560 1644 storage/engine/rocksdb.go:1028  batch [4/355/0] commit took 22.371155669s (>500ms):
W161005 17:55:18.598348 2742 storage/engine/rocksdb.go:1028  batch [307927/20565046/0] commit took 21.197645839s (>500ms):
W161005 17:55:18.739484 347080 storage/engine/rocksdb.go:1028  batch [18/7913/1] commit took 20.400394511s (>500ms):
tbg commented 7 years ago

Side note: Reserve calls deadReplicas every time, which is really expensive and helps lock up the store if some replicas hold their mutex for too long.

goroutine 347875 [running]:
runtime/debug.Stack(0x2562ea0, 0xc4254ca049, 0x2565c00)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x79
github.com/cockroachdb/cockroach/util/syncutil.(*TimedMutex).Unlock(0xc420317150)
        /go/src/github.com/cockroachdb/cockroach/util/syncutil/timedmutex.go:84 +0xf7
github.com/cockroachdb/cockroach/storage.(*storeRangeSet).Visit(0xc423272860, 0xc423272840)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:240 +0x141
github.com/cockroachdb/cockroach/storage.(*Store).deadReplicas(0xc420316f00, 0xc4203c3428, 0xc4ffffffff, 0x0, 0x6cba89)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:1923 +0xb9
github.com/cockroachdb/cockroach/storage.(*Store).Reserve(0xc420316f00, 0x7f383a322550, 0xc43afbcc90, 0xe0000000e, 0xb0000000b, 0x40b, 0x2a2803f, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:3240 +0x2f
github.com/cockroachdb/cockroach/storage.Server.Reserve.func1(0xc420316f00, 0xc40000000e, 0xc420316f00)
        /go/src/github.com/cockroachdb/cockroach/storage/stores_server.go:84 +0x68
github.com/cockroachdb/cockroach/storage.Server.execStoreCommand(0xc420446018, 0xc4203c3400, 0xe0000000e, 0xc4232729f8, 0xc4211523f0, 0xc42263ba40)
        /go/src/github.com/cockroachdb/cockroach/storage/stores_server.go:61 +0x15a
github.com/cockroachdb/cockroach/storage.Server.Reserve(0xc420446018, 0xc4203c3400, 0x7f383a322550, 0xc43afbcc90, 0xc43dd887c0, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/stores_server.go:86 +0xd9
github.com/cockroachdb/cockroach/storage.(*Server).Reserve(0xc420285740, 0x7f383a322550, 0xc43afbcc90, 0xc43dd887c0, 0x0, 0x12, 0x12)
        <autogenerated>:460 +0x79
github.com/cockroachdb/cockroach/storage._Reservation_Reserve_Handler(0x172eca0, 0xc420285740, 0x7f383a322550, 0xc43afbcc90, 0xc440f2f680, 0x0, 0x0, 0x0, 0x4, 0x7f383a323a60)
        /go/src/github.com/cockroachdb/cockroach/storage/api.pb.go:267 +0x27d
google.golang.org/grpc.(*Server).processUnaryRPC(0xc420485d60, 0x2224520, 0xc4211523f0, 0xc42263ba40, 0xc4201e1140, 0x21eaa50, 0x0, 0x0, 0x0)
        /go/src/google.golang.org/grpc/server.go:611 +0xc50
google.golang.org/grpc.(*Server).handleStream(0xc420485d60, 0x2224520, 0xc4211523f0, 0xc42263ba40, 0x0)
        /go/src/google.golang.org/grpc/server.go:769 +0x6b0
google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc421472340, 0xc420485d60, 0x2224520, 0xc4211523f0, 0xc42263ba40)
        /go/src/google.golang.org/grpc/server.go:422 +0xab
created by google.golang.org/grpc.(*Server).serveStreams.func1
        /go/src/google.golang.org/grpc/server.go:423 +0xa3
petermattis commented 7 years ago

Great. Was waiting to see those. We should merge my PR.


tbg commented 7 years ago

I agree. Care to do it or should I do it for you in case you're indisposed?

BramGruneir commented 7 years ago

> Reserve calls deadReplicas every time, which is really expensive and helps lock up the store if some replicas hold their mutex for too long.

Can you file an issue for that? I don't mind fixing it.

petermattis commented 7 years ago

I'm indisposed. Go ahead.


andreimatei commented 7 years ago

I took a look with Tobi at a node that died on omega (the 100-node cluster). It was spending extremely long times in CommandQueue operations; check out the beginCmds below (code snippet after the stack). @nvanbenschoten, maybe you could add some introspection into the queue? Total number of elements, number of overlaps after one of these queries?

W161005 21:52:13.746037 2450420 util/syncutil/timedmutex.go:85  [n91,s91,r3314:/System/tsd/cr.node.exec.…] mutex held for 12.002381712s (>500ms):
goroutine 2450420 [running]:
runtime/debug.Stack(0x2569550, 0x2c4fb2f4, 0x256c2a0)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x79
github.com/cockroachdb/cockroach/util/syncutil.(*TimedMutex).Unlock(0xc420164ea0)
        /go/src/github.com/cockroachdb/cockroach/util/syncutil/timedmutex.go:84 +0xf7
github.com/cockroachdb/cockroach/storage.(*Replica).beginCmds(0xc420164e00, 0x7f56fffb0028, 0xc6504bec90, 0xc65042bb90, 0xc48b342ad8, 0x6384a2, 0xc48b342b00)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:1167 +0x36a
github.com/cockroachdb/cockroach/storage.(*Replica).addWriteCmd(0xc420164e00, 0x7f56fffb0028, 0xc6504bec90, 0x147abeb07dcb54f9, 0x0, 0x5b0000005b, 0x1, 0xcf2, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:1511 +0x6d2
github.com/cockroachdb/cockroach/storage.(*Replica).Send(0xc420164e00, 0x7f56fffb0028, 0xc6504bec90, 0x147abeb07dcb54f9, 0x0, 0x5b0000005b, 0x1, 0xcf2, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:1075 +0x277
github.com/cockroachdb/cockroach/storage.(*Store).Send(0xc4201c4c00, 0x7f56fffb0028, 0xc493a7fd10, 0x147abeb07dcb54f9, 0x0, 0x5b0000005b, 0x1, 0xcf2, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2108 +0x752
github.com/cockroachdb/cockroach/storage.(*Stores).Send(0xc4203a2550, 0x7f56fffb0028, 0xc493a7fcb0, 0x0, 0x0, 0x5b0000005b, 0x1, 0xcf2, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/stores.go:186 +0x202
github.com/cockroachdb/cockroach/server.(*Node).Batch.func3()
        /go/src/github.com/cockroachdb/cockroach/server/node.go:837 +0x2af
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunTask(0xc42042c900, 0xc64cc018c8, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:219 +0x100
github.com/cockroachdb/cockroach/server.(*Node).Batch(0xc4201c2240, 0x7f56fffb0028, 0xc493a7fc50, 0xc4924bd2d0, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/server/node.go:849 +0x2bf
github.com/cockroachdb/cockroach/roachpb._Internal_Batch_Handler(0x179e1e0, 0xc4201c2240, 0x7f56fffb0028, 0xc493cdd410, 0xc493e910e0, 0x0, 0x0, 0x0, 0x7f56fffb9f28, 0x756ea12f8641d0)

        r.mu.Lock()
        chans := r.mu.cmdQ.getWait(readOnly, spans...)
        cmd = r.mu.cmdQ.add(readOnly, spans...)
        r.mu.Unlock()

@petermattis Here's a long commit for you. There were many of these right next to each other, and they all took ~8.5s.

W161004 20:02:34.968481 298 storage/engine/rocksdb.go:1028  batch [2/727/0] commit took 8.52711979s (>500ms):
goroutine 298 [running]:
runtime/debug.Stack(0x16a7b60, 0x221e060, 0xc420010270)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x79
github.com/cockroachdb/cockroach/storage/engine.(*rocksDBBatch).Commit(0xc43b5ada40, 0x7f07372e2840, 0xc4ed18d2c0)
        /go/src/github.com/cockroachdb/cockroach/storage/engine/rocksdb.go:1028 +0x212
github.com/cockroachdb/cockroach/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc446789c00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:1901 +0x4b1
github.com/cockroachdb/cockroach/storage.(*Replica).handleRaftReady(0xc446789c00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:1785 +0xcd
github.com/cockroachdb/cockroach/storage.(*Store).processReady(0xc4201cec00, 0x1ce7)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2783 +0x13b
github.com/cockroachdb/cockroach/storage.(*raftScheduler).worker(0xc4203d2620, 0xc420420090)
        /go/src/github.com/cockroachdb/cockroach/storage/scheduler.go:193 +0x331
github.com/cockroachdb/cockroach/storage.(*raftScheduler).Start.func1()
        /go/src/github.com/cockroachdb/cockroach/storage/scheduler.go:160 +0x33
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc420420090, 0xc42075e5c0)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:191 +0x7d
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:192 +0x66
tamird commented 7 years ago

And this is with #9668?

tbg commented 7 years ago

Yes. Note how it's only ~8s, not ~21. Kind of circumstantial, but I'd read this as "problem has gotten better, but not disappeared".

tbg commented 7 years ago

@nvanbenschoten the above shows that we're spending a lot of time around the command queue. @andreimatei is going to look at some heap profiles, but unless that slow pace is just a side effect of things going horribly wrong elsewhere, there's something worth investigating here (at the very least, we're going to want to have metrics that can tell us about the happiness of the command queue in general). Any ideas what might be going on?

andreimatei commented 7 years ago

I looked at the heap of a node that was about to OOM (it looks like there's a point where a node's memory usage starts climbing and climbing, and 15 minutes later the node is dead). The memory accounted for by this heap profile is only ~4G vs ~14G in the process's RSS, so the blowup is probably in whatever RocksDB is doing. Is that expected, considering the below?

80% of the 4G is allocated in raftpb.Entry.Unmarshal. Profile (GitHub doesn't let me attach an svg): https://www.dropbox.com/s/ztynqnhu7767wei/pprof002.svg?dl=0

nvanbenschoten commented 7 years ago

@tschottdorf the time it takes the CommandQueue to determine the overlaps for a given command will grow roughly linearly with the number of overlapping commands*, so if command execution is taking ~9 seconds and new overlapping commands are coming in at a faster rate, I'd expect CommandQueue accesses to slow down in turn. That said, 12 seconds seems like a lot, so adding tracing into the CommandQueue would be helpful to get some better clarity of what's going on and where exactly this time is being spent.

It also might be the case that once the CommandQueue slows down, its long locking phases might slow down other entities that need access to replica.mu. To alleviate this, we might want to look into giving the command queue its own Mutex, which I believe would be safe.

(*) We currently have an optimization to stop looking at overlapping commands once it's impossible for any more of them to introduce dependencies. However, this optimization is only performed after all overlapping commands have been pulled out of the queue's interval tree and sorted by ID. Because my hunch is that the tree traversal is what's taking a while, it would be nice to extend this optimization. If we could somehow visit the overlapping commands in the CommandQueue's interval tree in order of ID, we could perform this optimization much earlier, potentially allowing us to skip a large portion of the tree traversal in cases of overlap blowup and avoid the linear complexity in the common case. That all said, I'm not sure this is possible with an interval tree, or at least not without some serious modification to its traversal approach.
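
To make the "own mutex for the command queue" idea above concrete, here is a hedged sketch with stand-in types (commandQueue, span, and the cmdQMu field are inventions of this example, not the actual Replica layout): beginCmds serializes only on the command-queue mutex, so a slow overlap scan no longer blocks goroutines that merely need the general replica mutex.

package main

import (
    "fmt"
    "sync"
)

// Stand-in types; not the real CockroachDB CommandQueue or Replica.
type span struct{ start, end string }

type commandQueue struct{ pending []span }

// getWait would return wait channels for overlapping commands; elided here.
func (cq *commandQueue) getWait(spans ...span) []chan struct{} { return nil }

func (cq *commandQueue) add(spans ...span) { cq.pending = append(cq.pending, spans...) }

type replica struct {
    mu     sync.Mutex // guards general replica state
    cmdQMu sync.Mutex // guards only the command queue (the proposed split)
    cmdQ   commandQueue
}

// beginCmds serializes only on cmdQMu, so a slow command-queue scan no
// longer blocks goroutines that merely need r.mu.
func (r *replica) beginCmds(spans ...span) []chan struct{} {
    r.cmdQMu.Lock()
    defer r.cmdQMu.Unlock()
    chans := r.cmdQ.getWait(spans...)
    r.cmdQ.add(spans...)
    return chans
}

func main() {
    var r replica
    _ = r.beginCmds(span{start: "a", end: "b"})
    fmt.Println("queued commands:", len(r.cmdQ.pending))
}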

tbg commented 7 years ago

@nvanbenschoten tracing isn't available for that large cluster (too big a firehose), at least not immediately. Do you think you can come up with metrics instead that would give us (or at least you) a good idea of how the command queue is faring? A slight difficulty is that we have that object on every *Replica, and we want to expose the worst Replica rather than some average (in the worst case, we could periodically poll all the command queues and keep the metric as a gauge). If you could bite off that chunk of the investigation, that would be great, as there are various other things going wrong.
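
A minimal sketch of that poll-and-gauge idea, with made-up types and a made-up queueSize field (the real CommandQueue does not expose exactly this): snapshot the replica list under the store mutex, inspect each command queue without holding the store mutex, and export the maximum rather than an average.

package main

import (
    "fmt"
    "sync"
)

type replica struct {
    mu        sync.Mutex
    queueSize int // pretend: current size of this replica's CommandQueue
}

type store struct {
    mu       sync.Mutex
    replicas []*replica
}

// worstCommandQueueSize snapshots the replica list under the store mutex,
// then inspects each replica individually and returns the maximum. A real
// implementation would run this on a ticker and set a metrics gauge.
func (s *store) worstCommandQueueSize() int {
    s.mu.Lock()
    snapshot := append([]*replica(nil), s.replicas...)
    s.mu.Unlock()

    worst := 0
    for _, r := range snapshot {
        r.mu.Lock()
        if r.queueSize > worst {
            worst = r.queueSize
        }
        r.mu.Unlock()
    }
    return worst
}

func main() {
    s := &store{replicas: []*replica{{queueSize: 3}, {queueSize: 17}, {queueSize: 5}}}
    fmt.Println("worst command queue size:", s.worstCommandQueueSize())
}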

tbg commented 7 years ago

I just watched the live log stream for a minute or so. There's a steady trickle of "slow mutex" warnings (most <1s, some a few seconds), but no "commit took X" ones. That tells me that the RocksDB issue may not have disappeared, but at least it isn't happening right now. Then again, the cluster currently seems relatively happy (given the circumstances), averaging around 40k ops/sec. However, we're consistently doing a bad job of holding mutexes for only a short period of time (though many of those are likely applications of a snapshot under raftMu). I'll add better instrumentation here to distinguish the various mutexes.
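
For reference, the warnings quoted throughout this thread come from a held-too-long mutex wrapper; a minimal sketch of that kind of instrumentation (the real util/syncutil.TimedMutex differs in detail) looks roughly like this:

package main

import (
    "log"
    "runtime/debug"
    "sync"
    "time"
)

// timedMutex logs a warning, with the unlocking goroutine's stack, whenever
// the lock was held longer than a threshold.
type timedMutex struct {
    mu        sync.Mutex
    threshold time.Duration
    lockedAt  time.Time
}

func (tm *timedMutex) Lock() {
    tm.mu.Lock()
    tm.lockedAt = time.Now()
}

func (tm *timedMutex) Unlock() {
    held := time.Since(tm.lockedAt)
    tm.mu.Unlock()
    if held > tm.threshold {
        log.Printf("mutex held for %s (>%s):\n%s", held, tm.threshold, debug.Stack())
    }
}

func main() {
    tm := &timedMutex{threshold: 500 * time.Millisecond}
    tm.Lock()
    time.Sleep(600 * time.Millisecond) // simulate slow work under the lock
    tm.Unlock()                        // triggers the warning
}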

tbg commented 7 years ago

I'm not sure how to read the svg, @andreimatei. I gather that we're allocating a lot while unmarshaling Raft entries (so we're not disposing of them quickly enough, holding on to them somewhere)? What I don't understand is the "call stack" that leads to that box, especially the hop where the number grows from hundreds of MBs to 4G. Perhaps someone has a link/docs/answer.

tbg commented 7 years ago

More items for tomorrow:

tbg commented 7 years ago

Picked up another recently crashed node's last memory profile, and it looks roughly the same as @andreimatei's earlier one, but this time it's on the incoming Raft message path:

image

heap.zip

These nodes die pretty rapidly when they do. We must be doing something pretty silly.

image

I also looked at a heap profile just before the allocation explosion and Entry unmarshaling was still there, but much subdued:

image

tbg commented 7 years ago

This is happening right now as 104.198.241.17:8080 dies, with high growth in both memory and goroutine count (it doesn't really manage to report to Grafana regularly any more).

W161006 07:17:51.968891 354023 util/syncutil/timedmutex.go:85  [n35] mutex held for 1m24.906100676s (>100ms):
goroutine 354023 [running]:
runtime/debug.Stack(0x2569550, 0xc403bc9605, 0x256c2a0)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x79
github.com/cockroachdb/cockroach/util/syncutil.(*TimedMutex).Unlock(0xc4202e9150)
        /go/src/github.com/cockroachdb/cockroach/util/syncutil/timedmutex.go:84 +0xf7
github.com/cockroachdb/cockroach/storage.(*storeRangeSet).Visit(0xc5df865848, 0xc5df865828)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:244 +0x14a
github.com/cockroachdb/cockroach/storage.(*Store).deadReplicas(0xc4202e8f00, 0xc420411568, 0xc4ffffffff, 0x0, 0x6cba69)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:1929 +0xae
github.com/cockroachdb/cockroach/storage.(*Store).Reserve(0xc4202e8f00, 0x7fb8c8b9f800, 0xc4fd5c3d10, 0x2300000023, 0x200000002, 0x2c87, 0x2399895, 0x0)

I prepared https://github.com/cockroachdb/cockroach/pull/9771 which should remove that particular badness. We'll see what sticks.

If we're essentially never letting go of the store mutex, perhaps we're soaking up more and more Raft traffic but not actually delegating it to the Replicas? I'll have to check the code.

tbg commented 7 years ago

@nvanbenschoten take a look at this svg: cmdq.zip

image

It's still a bit hard to reason about cause and effect here, but I suspect the following causality:

Note that I didn't see Batch.Commit take long in this last observation. It still happens occasionally though, but doesn't seem to be at the heart of these issues (?).

nvanbenschoten commented 7 years ago

@tschottdorf that matches my expectation for where time would be spent in CommandQueue if it grew to a very large size. Notice the min-heap init taking 30% of the time. This operation's overhead will scale linearly with the number of overlapping commands, so there are clearly a lot of these. Do you happen to have a corresponding heap dump so we can try to better gauge the total size of the queue?

EDIT: never mind about the heap profile, found yours from above.
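
As a rough illustration of that linear cost (using container/heap directly on a plain int slice, not the CommandQueue's overlapHeap):

package main

import (
    "container/heap"
    "fmt"
    "time"
)

// intHeap is a minimal heap.Interface implementation for the measurement.
type intHeap []int

func (h intHeap) Len() int            { return len(h) }
func (h intHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h intHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *intHeap) Push(x interface{}) { *h = append(*h, x.(int)) }
func (h *intHeap) Pop() interface{} {
    old := *h
    x := old[len(old)-1]
    *h = old[:len(old)-1]
    return x
}

func main() {
    for _, n := range []int{10000, 100000, 1000000} {
        h := make(intHeap, n)
        for i := range h {
            h[i] = n - i
        }
        start := time.Now()
        heap.Init(&h) // O(n): cost grows with the number of (overlapping) elements
        fmt.Printf("heap.Init over %d elements took %s\n", n, time.Since(start))
    }
}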

tbg commented 7 years ago

I have a super half baked theory about why we're seeing these issues here. This is a large cluster, and so it does have one obvious hotspot: time series. If we don't split and rebalance away the time series replicas extremely efficiently (we don't), there would be ranges that get completely overwhelmed by 100 nodes trying to shove a bunch of data into them every 10 seconds.

That means holding raftMu for long periods of time, it means the command queue overflowing, etc., and we have (or at least had, until #9771, though I'm sure there are more such paths) let that spill over to the Store mutex, which is extremely central to smooth operation.

I got to thinking about this while looking at a local node with strict mutex warnings and noticing that the periodic time series query takes >1ms to apply on my local single-node cluster without load.

andreimatei commented 7 years ago

We've seen r.mu held for a long time on the timeseries range... even the stack we saw above is about that range: https://github.com/cockroachdb/cockroach/issues/9749#issuecomment-251813340

tbg commented 7 years ago

Let's run with `COCKROACH_METRICS_SAMPLE_INTERVAL=2h`. Can you roll that out ASAP? We'll still get Grafana.

petermattis commented 7 years ago

Heh, I was just adding a note that we should turn down the metrics collection time.

spencerkimball commented 7 years ago

Good hypothesis. Maybe try setting stats interval higher and push a binary that staggers reporting times to verify.


andreimatei commented 7 years ago

I've added COCKROACH_METRICS_SAMPLE_INTERVAL=2h and restarted everything. Let's see how it goes.

We've discussed this a bit offline: if the problem was these timeseries ranges, then even without increasing the sampling interval, every time we restart the cluster the leases of those ranges get scattered around, so the problem might not be easy to repro without a tool that lets us colocate leases.

petermattis commented 7 years ago

Do we know how many timeseries ranges there are on omega? We currently track 282 metrics per node, so we're talking about a burst of ~28k keys being written every 10 sec. I could see that overwhelming a single range or a small number of ranges. We could experiment with using `./cockroach debug range split` to create more timeseries ranges (that will happen naturally as timeseries data in the cluster grows).

Timeseries data (i.e. the 282 metrics) is currently written in a single batch. It seems feasible to break those writes into smaller pieces.
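
A hedged sketch of that chunking idea (sendBatch and the key names are stand-ins, not the actual timeseries write path): split the single 282-key write into several smaller batches.

package main

import "fmt"

// sendBatch stands in for whatever actually commits a batch of keys.
func sendBatch(keys []string) {
    fmt.Printf("writing batch of %d keys\n", len(keys))
}

// writeInChunks sends the keys in fixed-size pieces instead of one big batch.
func writeInChunks(keys []string, chunkSize int) {
    for len(keys) > 0 {
        n := chunkSize
        if n > len(keys) {
            n = len(keys)
        }
        sendBatch(keys[:n])
        keys = keys[n:]
    }
}

func main() {
    metrics := make([]string, 282) // one key per tracked metric
    for i := range metrics {
        metrics[i] = fmt.Sprintf("cr.node.metric.%d", i)
    }
    writeInChunks(metrics, 50) // six smaller batches instead of one
}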

tbg commented 7 years ago

I'm taking a look via `./cockroach debug range ls --host localhost --ca-cert=certs/ca.crt --cert=certs/node.crt --key=certs/node.key --max-results 99999`.

tbg commented 7 years ago
$ grep tsd ranges
[...]
/System/tsd/cr.store.queue.replicagc.processingnanos/33/10s/2016-10-03T22:00:00Z-/System/tsd/cr.store.queue.replicate.purgatory/70/10s/2016-10-04T10:00:00Z [1786]
/System/tsd/cr.store.queue.replicate.purgatory/70/10s/2016-10-04T10:00:00Z-/System/tsd/cr.store.queue.split.process.failure/98/10s/2016-10-03T21:00:00Z [19645]
/System/tsd/cr.store.queue.split.process.failure/98/10s/2016-10-03T21:00:00Z-/System/tsd/cr.store.queue.split.processingnanos/57/10s/2016-10-04T10:00:00Z [11472]
/System/tsd/cr.store.queue.split.processingnanos/57/10s/2016-10-04T10:00:00Z-/System/tsd/cr.store.raft.process.workingnanos/17/10s/2016-10-03T21:00:00Z [7797]
/System/tsd/cr.store.raft.process.workingnanos/17/10s/2016-10-03T21:00:00Z-/System/tsd/cr.store.raft.process.workingnanos/18/10s/2016-10-03T21:00:00Z [4832]
/System/tsd/cr.store.raft.process.workingnanos/18/10s/2016-10-03T21:00:00Z-/System/tsd/cr.store.raft.rcvd.heartbeatresp/57/10s/2016-10-04T13:00:00Z [1781]
/System/tsd/cr.store.raft.rcvd.heartbeatresp/57/10s/2016-10-04T13:00:00Z-/System/tsd/cr.store.raft.rcvd.snap/59/10s/2016-10-04T03:00:00Z [20391]
/System/tsd/cr.store.raft.rcvd.snap/59/10s/2016-10-04T03:00:00Z-/System/tsd/cr.store.raft.rcvd.transferleader/91/10s/2016-10-04T16:00:00Z [11371]
/System/tsd/cr.store.raft.rcvd.transferleader/91/10s/2016-10-04T16:00:00Z-/System/tsd/cr.store.raft.ticks/69/10s/2016-10-03T21:00:00Z [7190]
/System/tsd/cr.store.raft.ticks/69/10s/2016-10-03T21:00:00Z-/System/tsd/cr.store.raft.ticks/7/10s/2016-10-03T21:00:00Z [3629]
/System/tsd/cr.store.raft.ticks/7/10s/2016-10-03T21:00:00Z-/System/tsd/cr.store.raft.ticks/70/10s/2016-10-03T21:00:00Z [1214]
/System/tsd/cr.store.raft.ticks/70/10s/2016-10-03T21:00:00Z-/System/tsd/cr.store.range.snapshots.preemptive-applied/13/10s/2016-10-03T23:00:00Z [24]
/System/tsd/cr.store.range.snapshots.preemptive-applied/13/10s/2016-10-03T23:00:00Z-/System/tsd/cr.store.range.splits/20/10s/2016-10-06T01:00:00Z [19150]
/System/tsd/cr.store.range.splits/20/10s/2016-10-06T01:00:00Z-/System/tsd/cr.store.ranges.allocator.remove/90/10s/2016-10-04T08:00:00Z [11483]
/System/tsd/cr.store.ranges.allocator.remove/90/10s/2016-10-04T08:00:00Z-/System/tsd/cr.store.replicas/57/10s/2016-10-04T12:00:00Z [8273]
/System/tsd/cr.store.replicas/57/10s/2016-10-04T12:00:00Z-/System/tsd/cr.store.replicas.reserved/1/10s/2016-10-03T20:00:00Z [2349]
/System/tsd/cr.store.replicas.reserved/1/10s/2016-10-03T20:00:00Z-/System/tsd/cr.store.replicas.reserved/1/10s/2016-10-03T21:00:00Z [19560]
/System/tsd/cr.store.replicas.reserved/1/10s/2016-10-03T21:00:00Z-/System/tsd/cr.store.rocksdb.bloom.filter.prefix.checked/81/10s/2016-10-04T04:00:00Z [2348]
/System/tsd/cr.store.rocksdb.bloom.filter.prefix.checked/81/10s/2016-10-04T04:00:00Z-/System/tsd/cr.store.rocksdb.bloom.filter.prefix.checked/82/10s/2016-10-04T04:00:00Z [19740]
/System/tsd/cr.store.rocksdb.bloom.filter.prefix.checked/82/10s/2016-10-04T04:00:00Z-/System/tsd/cr.store.rocksdb.memtable.total-size/3/10s/2016-10-04T13:00:00Z [2347]
/System/tsd/cr.store.rocksdb.memtable.total-size/3/10s/2016-10-04T13:00:00Z-/System/tsd/cr.store.rocksdb.read-amplification/39/10s/2016-10-03T22:00:00Z [19504]
/System/tsd/cr.store.rocksdb.read-amplification/39/10s/2016-10-03T22:00:00Z-/System/tsd/cr.store.rocksdb.read-amplification/40/10s/2016-10-03T22:00:00Z [9713]
/System/tsd/cr.store.rocksdb.read-amplification/40/10s/2016-10-03T22:00:00Z-/System/tsd/cr.store.valcount/94/10s/2016-10-04T01:00:00Z [4845]
/System/tsd/cr.store.valcount/94/10s/2016-10-04T01:00:00Z-/System/tsd/cr.store.valcount/96/10s/2016-10-04T01:00:00Z [20341]
/System/tsd/cr.store.valcount/96/10s/2016-10-04T01:00:00Z-/System/"update-cluster" [2193]
cockroach@cockroach-omega-090:~$ cat ranges | grep tsd | wc -l
98
tbg commented 7 years ago

I'm still seeing the occasional command queue stall on omega, though the waters are currently relatively quiet and I don't blame the command queue (or command application) for taking a while given the hefty batches we're stuffing into them.

petermattis commented 7 years ago

Well, that seems like a sufficient number of timeseries ranges: 28k / 98 ≈ 287 writes/range.

tbg commented 7 years ago

Additionally, that's with the cluster actually doing better now. Looking at last night, I would expect that number to have been smaller (we could find out if we really cared).

Btw, gossip forwarding is somewhat unhappy. A 100-node cluster should reach a steady state where connections aren't thrashed this much, but they are. I don't think it interferes with operations much, but it's awkward; a rough sketch of the refuse-and-forward behavior is included after the log excerpt below.

104.197.7.85: I161006 21:40:32.818083 1505518 gossip/server.go:254  [n29] node 29: refusing gossip from node 40 (max 4 conns); forwarding to 72 ({tcp cockroach-omega-072:26257})
104.197.7.85: I161006 21:40:32.822056 1505518 gossip/server.go:254  [n29] node 29: refusing gossip from node 40 (max 4 conns); forwarding to 93 ({tcp cockroach-omega-093:26257})
104.197.7.85: I161006 21:40:32.822703 1505518 gossip/server.go:254  [n29] node 29: refusing gossip from node 40 (max 4 conns); forwarding to 72 ({tcp cockroach-omega-072:26257})
104.197.140.88: I161006 21:40:32.885913 1743625 gossip/server.go:254  [n12] node 12: refusing gossip from node 40 (max 4 conns); forwarding to 42 ({tcp cockroach-omega-042:26257})
104.197.140.88: I161006 21:40:32.903174 1743625 gossip/server.go:254  [n12] node 12: refusing gossip from node 40 (max 4 conns); forwarding to 7 ({tcp cockroach-omega-007:26257})
104.198.242.162: I161006 21:40:32.970412 1022050 gossip/server.go:254  [n5] node 5: refusing gossip from node 40 (max 4 conns); forwarding to 24 ({tcp cockroach-omega-024:26257})
104.198.242.162: I161006 21:40:32.973392 1022050 gossip/server.go:254  [n5] node 5: refusing gossip from node 40 (max 4 conns); forwarding to 24 ({tcp cockroach-omega-024:26257})
104.198.242.162: I161006 21:40:32.974201 1022050 gossip/server.go:254  [n5] node 5: refusing gossip from node 40 (max 4 conns); forwarding to 24 ({tcp cockroach-omega-024:26257})
130.211.189.253: I161006 21:40:32.977045 1474501 gossip/server.go:254  [n24] node 24: refusing gossip from node 40 (max 4 conns); forwarding to 51 ({tcp cockroach-omega-051:26257})
130.211.189.253: I161006 21:40:32.979703 1474501 gossip/server.go:254  [n24] node 24: refusing gossip from node 40 (max 4 conns); forwarding to 34 ({tcp cockroach-omega-034:26257})
130.211.189.253: I161006 21:40:32.981100 1474501 gossip/server.go:254  [n24] node 24: refusing gossip from node 40 (max 4 conns); forwarding to 33 ({tcp cockroach-omega-033:26257})
104.155.137.78: I161006 21:40:33.861200 1894980 gossip/server.go:254  [n15] node 15: refusing gossip from node 69 (max 4 conns); forwarding to 65 ({tcp cockroach-omega-065:26257})
104.155.137.78: I161006 21:40:33.868766 1894980 gossip/server.go:254  [n15] node 15: refusing gossip from node 69 (max 4 conns); forwarding to 12 ({tcp cockroach-omega-012:26257})
104.198.223.39: I161006 21:40:34.595144 1645601 gossip/server.go:254  [n23] node 23: refusing gossip from node 69 (max 4 conns); forwarding to 85 ({tcp cockroach-omega-085:26257})
104.198.223.39: I161006 21:40:34.599871 1645601 gossip/server.go:254  [n23] node 23: refusing gossip from node 69 (max 4 conns); forwarding to 22 ({tcp cockroach-omega-022:26257})
104.198.223.39: I161006 21:40:34.631990 1645601 gossip/server.go:254  [n23] node 23: refusing gossip from node 69 (max 4 conns); forwarding to 89 ({tcp cockroach-omega-089:26257})
104.198.223.39: I161006 21:40:34.633451 1645601 gossip/server.go:254  [n23] node 23: refusing gossip from node 69 (max 4 conns); forwarding to 89 ({tcp cockroach-omega-089:26257})
104.198.223.39: I161006 21:40:34.643484 1645601 gossip/server.go:254  [n23] node 23: refusing gossip from node 69 (max 4 conns); forwarding to 89 ({tcp cockroach-omega-089:26257})
104.155.137.78: I161006 21:40:34.653518 1895834 gossip/server.go:254  [n15] node 15: refusing gossip from node 69 (max 4 conns); forwarding to 37 ({tcp cockroach-omega-037:26257})
104.198.223.39: I161006 21:40:34.656862 1645601 gossip/server.go:254  [n23] node 23: refusing gossip from node 69 (max 4 conns); forwarding to 89 ({tcp cockroach-omega-089:26257})
104.155.137.78: I161006 21:40:34.664758 1895834 gossip/server.go:254  [n15] node 15: refusing gossip from node 69 (max 4 conns); forwarding to 12 ({tcp cockroach-omega-012:26257})
104.155.137.78: I161006 21:40:34.667654 1895834 gossip/server.go:254  [n15] node 15: refusing gossip from node 69 (max 4 conns); forwarding to 12 ({tcp cockroach-omega-012:26257})
104.155.137.78: I161006 21:40:34.722797 1895834 gossip/server.go:254  [n15] node 15: refusing gossip from node 69 (max 4 conns); forwarding to 65 ({tcp cockroach-omega-065:26257})
104.198.255.99: I161006 21:40:34.912909 1384896 gossip/server.go:254  [n26] node 26: refusing gossip from node 69 (max 4 conns); forwarding to 79 ({tcp cockroach-omega-079:26257})
104.198.33.57: I161006 21:40:34.986429 1378355 gossip/server.go:254  [n9] node 9: refusing gossip from node 69 (max 4 conns); forwarding to 66 ({tcp cockroach-omega-066:26257})
104.198.33.57: I161006 21:40:35.001922 1378355 gossip/server.go:254  [n9] node 9: refusing gossip from node 69 (max 4 conns); forwarding to 66 ({tcp cockroach-omega-066:26257})
104.198.33.57: I161006 21:40:35.010396 1378355 gossip/server.go:254  [n9] node 9: refusing gossip from node 69 (max 4 conns); forwarding to 61 ({tcp cockroach-omega-061:26257})
104.198.40.234: I161006 21:40:35.174897 1202176 gossip/server.go:254  [n30] node 30: refusing gossip from node 69 (max 4 conns); forwarding to 61 ({tcp cockroach-omega-061:26257})
104.198.40.234: I161006 21:40:35.184023 1202176 gossip/server.go:254  [n30] node 30: refusing gossip from node 69 (max 4 conns); forwarding to 88 ({tcp cockroach-omega-088:26257})
104.197.143.153: I161006 21:40:36.075797 2002615 gossip/server.go:254  [n11] node 11: refusing gossip from node 75 (max 4 conns); forwarding to 3 ({tcp cockroach-omega-003:26257})
104.198.241.226: I161006 21:40:36.131057 1318331 gossip/server.go:254  [n4] node 4: refusing gossip from node 75 (max 4 conns); forwarding to 10 ({tcp cockroach-omega-010:26257})
104.198.242.162: I161006 21:40:36.149840 1024407 gossip/server.go:254  [n5] node 5: refusing gossip from node 75 (max 4 conns); forwarding to 24 ({tcp cockroach-omega-024:26257})
130.211.189.253: I161006 21:40:36.182970 1478389 gossip/server.go:254  [n24] node 24: refusing gossip from node 75 (max 4 conns); forwarding to 51 ({tcp cockroach-omega-051:26257})
104.197.143.153: I161006 21:40:36.191283 2002615 gossip/server.go:254  [n11] node 11: refusing gossip from node 75 (max 4 conns); forwarding to 44 ({tcp cockroach-omega-044:26257})
104.197.143.153: I161006 21:40:36.200922 2002615 gossip/server.go:254  [n11] node 11: refusing gossip from node 75 (max 4 conns); forwarding to 52 ({tcp cockroach-omega-052:26257})
104.154.87.20: I161006 21:40:36.268781 1458432 gossip/server.go:254  [n14] node 14: refusing gossip from node 75 (max 4 conns); forwarding to 99 ({tcp cockroach-omega-099:26257})
104.198.57.154: I161006 21:40:36.306849 1662653 gossip/server.go:254  [n2] node 2: refusing gossip from node 75 (max 4 conns); forwarding to 36 ({tcp cockroach-omega-036:26257})
104.198.63.222: I161006 21:40:36.318390 1798429 gossip/server.go:254  [n3] node 3: refusing gossip from node 75 (max 4 conns); forwarding to 63 ({tcp cockro
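For orientation, here is a minimal sketch of the refuse-and-forward behavior visible in these log lines: a gossip server that already has its maximum number of incoming connections refuses a new sender and points it at another node it knows about. All names (`gossipServer`, `maybeRefuse`, `maxIncomingConns`, etc.) are made up for illustration and are not the actual `gossip/server.go` API; the random target selection is also an assumption, included only to show why a sender can get bounced around repeatedly.

```go
package main

import (
	"fmt"
	"math/rand"
)

// gossipServer is an illustrative stand-in for the real gossip server state.
type gossipServer struct {
	nodeID           int
	maxIncomingConns int            // e.g. 4, matching the "max 4 conns" log message
	incoming         map[int]bool   // node IDs currently connected to us
	knownAddrs       map[int]string // node ID -> advertised address
}

// maybeRefuse decides whether to accept a new inbound gossip connection. If the
// server is already at capacity, it picks some other known node and tells the
// sender to try there instead, which is what the log lines above record.
func (s *gossipServer) maybeRefuse(fromNodeID int) (refused bool, forwardTo int, addr string) {
	if len(s.incoming) < s.maxIncomingConns || s.incoming[fromNodeID] {
		return false, 0, "" // accept: we have room, or the sender is already connected
	}
	candidates := make([]int, 0, len(s.knownAddrs))
	for id := range s.knownAddrs {
		if id != fromNodeID && id != s.nodeID {
			candidates = append(candidates, id)
		}
	}
	forwardTo = candidates[rand.Intn(len(candidates))]
	return true, forwardTo, s.knownAddrs[forwardTo]
}

func main() {
	s := &gossipServer{
		nodeID:           29,
		maxIncomingConns: 4,
		incoming:         map[int]bool{3: true, 7: true, 12: true, 24: true},
		knownAddrs: map[int]string{
			72: "cockroach-omega-072:26257",
			93: "cockroach-omega-093:26257",
		},
	}
	if refused, to, addr := s.maybeRefuse(40); refused {
		fmt.Printf("node %d: refusing gossip from node %d (max %d conns); forwarding to %d (%s)\n",
			s.nodeID, 40, s.maxIncomingConns, to, addr)
	}
}
```

The thrashing in the logs is consistent with this shape: if every server a sender is forwarded to is itself full, the sender keeps getting redirected instead of settling into a stable connection.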
tbg commented 7 years ago

We're running very smoothly at 51k inserts/sec.

221 Raft applications/sec. I wonder why that isn't closer to 51k; we're probably counting more in there than I think. A fair share of those are definitely lease renewals, but surely not 75%? image

andreimatei commented 7 years ago

Maybe this is obvious to most people here, but the timeseries are almost append-only. The keys are [system key prefix]tsd[series name][resolution][time slot][source key]. So splits (for the purposes of keeping the command queue small) only help when they happen to split into different series, and rebalancing (for the purposes of node load, if we're able to do it) only helps up to the number of series.
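To make the append-only point concrete, here is a small sketch of the key layout described above. The encodings (plain concatenation, big-endian integers) are purely illustrative assumptions; the real keys use CockroachDB's ordered key encoding, and the prefix byte and helper name here are invented for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// tsKey sketches the layout described above:
//   [system key prefix] "tsd" [series name] [resolution] [time slot] [source key]
// Not the real encoding; only the ordering of the components matters here.
func tsKey(systemPrefix []byte, series string, resolutionSecs, timeSlot int64, source string) []byte {
	key := append([]byte{}, systemPrefix...)
	key = append(key, []byte("tsd")...)
	key = append(key, []byte(series)...)
	key = binary.BigEndian.AppendUint64(key, uint64(resolutionSecs))
	key = binary.BigEndian.AppendUint64(key, uint64(timeSlot))
	key = append(key, []byte(source)...)
	return key
}

func main() {
	// Because the time slot sorts after the series name and resolution, new
	// datapoints for a given series always land at the tail of that series'
	// keyspace. Writes are effectively append-only per series, which is why a
	// split only spreads load when its split point falls between two series.
	k := tsKey([]byte{0x04}, "cr.store.raft.ticks", 10, 1475787600, "70")
	fmt.Printf("%q\n", k)
}
```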

spencerkimball commented 7 years ago

We should probably just pre-split on every time series name.
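A minimal sketch of that suggestion, under the same illustrative key layout as above: compute one split point per series name so each series starts on its own range. The `splitAt` callback is a placeholder for whatever mechanism would actually issue the split (an AdminSplit-style request); it is not a real CockroachDB API.

```go
package main

import "fmt"

// presplitTimeseries computes one split key per timeseries name and hands it to
// splitAt. The prefix byte and key construction are illustrative only.
func presplitTimeseries(systemPrefix []byte, seriesNames []string, splitAt func(key []byte) error) error {
	for _, name := range seriesNames {
		key := append(append(append([]byte{}, systemPrefix...), []byte("tsd")...), []byte(name)...)
		if err := splitAt(key); err != nil {
			return fmt.Errorf("pre-split at %q: %v", key, err)
		}
	}
	return nil
}

func main() {
	series := []string{"cr.store.raft.ticks", "cr.store.replicas", "cr.store.valcount"}
	_ = presplitTimeseries([]byte{0x04}, series, func(key []byte) error {
		fmt.Printf("would split at %q\n", key)
		return nil
	})
}
```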
