camunda / camunda

Process Orchestration Framework
https://camunda.com/platform/

SIGSEGV on Exporter clearState #3256

Closed Zelldon closed 5 years ago

Zelldon commented 5 years ago

Describe the bug

When a new partition is installed and no exporters are configured, the ExporterState is cleaned up. Concurrently, the partition can be closed again because another node has become leader. If this happens, the database is closed while ServiceController -> ExporterManagerService -> clearExporterState may still try to access it, and we get a segmentation fault.
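To illustrate the race described above, here is a minimal sketch of one possible guard: readers check a closed flag under a read lock, and close takes the write lock so it waits for in-flight readers. `GuardedDb`, `visitIfOpen`, and the sample entries are hypothetical names for illustration only; this is not Zeebe's API and not necessarily the fix that was adopted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

/** Hypothetical stand-in for a database wrapper; not Zeebe's actual API. */
final class GuardedDb implements AutoCloseable {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Stands in for state that lives behind a native handle.
  private final List<String> entries = new ArrayList<>(List.of("exporter-a", "exporter-b"));
  private boolean closed; // guarded by lock

  /** Visits all entries if the database is still open; returns false if it was already closed. */
  boolean visitIfOpen(Consumer<String> visitor) {
    lock.readLock().lock();
    try {
      if (closed) {
        return false; // skip instead of touching released resources
      }
      entries.forEach(visitor);
      return true;
    } finally {
      lock.readLock().unlock();
    }
  }

  @Override
  public void close() {
    lock.writeLock().lock(); // blocks until in-flight readers have finished
    try {
      closed = true;
      entries.clear(); // stands in for freeing the native handle
    } finally {
      lock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    GuardedDb db = new GuardedDb();
    List<String> seen = new ArrayList<>();
    System.out.println(db.visitIfOpen(seen::add)); // visited while open
    db.close();
    System.out.println(db.visitIfOpen(seen::add)); // skipped after close
  }
}
```

With a guard like this, a late clearExporterState call would become a no-op after close rather than a native crash. The visitor must not call `close()` itself, since a read lock cannot be upgraded to a write lock.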

```
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.rocksdb.Transaction.getIterator(JJJ)J+0
j  org.rocksdb.Transaction.getIterator(Lorg/rocksdb/ReadOptions;Lorg/rocksdb/ColumnFamilyHandle;)Lorg/rocksdb/RocksIterator;+42
j  io.zeebe.db.impl.rocksdb.transaction.ZeebeTransaction.newIterator(Lorg/rocksdb/ReadOptions;Lorg/rocksdb/ColumnFamilyHandle;)Lorg/rocksdb/RocksIterator;+6
j  io.zeebe.db.impl.rocksdb.transaction.DefaultDbContext.newIterator(Lorg/rocksdb/ReadOptions;Lorg/rocksdb/ColumnFamilyHandle;)Lorg/rocksdb/RocksIterator;+6
j  io.zeebe.db.impl.rocksdb.transaction.ZeebeTransactionDb.newIterator(JLio/zeebe/db/DbContext;Lorg/rocksdb/ReadOptions;)Lorg/rocksdb/RocksIterator;+18
j  io.zeebe.db.impl.rocksdb.transaction.ZeebeTransactionDb.lambda$foreach$7(JLio/zeebe/db/DbContext;Ljava/util/function/BiConsumer;Lio/zeebe/db/impl/rocksdb/transaction/ZeebeTransaction;)V+7
j  io.zeebe.db.impl.rocksdb.transaction.ZeebeTransactionDb$$Lambda$897.run(Lio/zeebe/db/impl/rocksdb/transaction/ZeebeTransaction;)V+17
j  io.zeebe.db.impl.rocksdb.transaction.ZeebeTransactionDb.lambda$ensureInOpenTransaction$1(Lio/zeebe/db/impl/rocksdb/transaction/ZeebeTransactionDb$TransactionConsumer;Lio/zeebe/db/DbContext;)V+10
j  io.zeebe.db.impl.rocksdb.transaction.ZeebeTransactionDb$$Lambda$899.run()V+8
j  io.zeebe.db.impl.rocksdb.transaction.DefaultDbContext.runInNewTransaction(Lio/zeebe/db/TransactionOperation;)V+8
j  io.zeebe.db.impl.rocksdb.transaction.DefaultDbContext.runInTransaction(Lio/zeebe/db/TransactionOperation;)V+21
j  io.zeebe.db.impl.rocksdb.transaction.ZeebeTransactionDb.ensureInOpenTransaction(Lio/zeebe/db/DbContext;Lio/zeebe/db/impl/rocksdb/transaction/ZeebeTransactionDb$TransactionConsumer;)V+8
j  io.zeebe.db.impl.rocksdb.transaction.ZeebeTransactionDb.foreach(JLio/zeebe/db/DbContext;Ljava/util/function/BiConsumer;)V+12
j  io.zeebe.db.impl.rocksdb.transaction.ZeebeTransactionDb.foreach(JLio/zeebe/db/DbContext;Lio/zeebe/db/DbKey;Lio/zeebe/db/DbValue;Ljava/util/function/BiConsumer;)V+14
j  io.zeebe.db.impl.rocksdb.transaction.TransactionalColumnFamily.forEach(Lio/zeebe/db/DbContext;Ljava/util/function/BiConsumer;)V+18
j  io.zeebe.db.impl.rocksdb.transaction.TransactionalColumnFamily.forEach(Ljava/util/function/BiConsumer;)V+6
j  io.zeebe.broker.exporter.stream.ExportersState.visitPositions(Ljava/util/function/BiConsumer;)V+10
j  io.zeebe.broker.exporter.ExporterManagerService.clearExporterState(Lio/zeebe/db/ZeebeDb;)V+22
j  io.zeebe.broker.exporter.ExporterManagerService.startExporter(Lio/zeebe/servicecontainer/ServiceName;Lio/zeebe/broker/clustering/base/partitions/Partition;)V+25
j  io.zeebe.broker.exporter.ExporterManagerService$$Lambda$186.accept(Ljava/lang/Object;Ljava/lang/Object;)V+12
j  io.zeebe.servicecontainer.impl.ServiceController.invoke(Ljava/util/function/BiConsumer;Lio/zeebe/servicecontainer/ServiceName;Ljava/lang/Object;)V+3
j  io.zeebe.servicecontainer.impl.ServiceController.lambda$addReferencedValue$1(Lio/zeebe/servicecontainer/ServiceGroupReference;Lio/zeebe/servicecontainer/ServiceName;Ljava/lang/Object;)V+6
j  io.zeebe.servicecontainer.impl.ServiceController$$Lambda$788.run()V+12
j  io.zeebe.util.sched.ActorControl.lambda$call$0(Ljava/lang/Runnable;)Ljava/lang/Void;+1
j  io.zeebe.util.sched.ActorControl$$Lambda$344.call()Ljava/lang/Object;+4
J 7332 c2 io.zeebe.util.sched.ActorJob.execute(Lio/zeebe/util/sched/ActorThread;)V (203 bytes) @ 0x00007f48b6a2f2a0 [0x00007f48b6a2ed60+0x0000000000000540]
J 3810 c1 io.zeebe.util.sched.ActorTask.execute(Lio/zeebe/util/sched/ActorThread;)Z (191 bytes) @ 0x00007f48af4067dc [0x00007f48af4065c0+0x000000000000021c]
J 7268 c1 io.zeebe.util.sched.ActorThread.executeCurrentTask()V (101 bytes) @ 0x00007f48afbffe74 [0x00007f48afbffc80+0x00000000000001f4]
j  io.zeebe.util.sched.ActorThread.doWork()V+57
J 2876% c2 io.zeebe.util.sched.ActorThread.run()V (49 bytes) @ 0x00007f48b699b3c8 [0x00007f48b699b340+0x0000000000000088]
v  ~StubRoutines::call_stub
```
[zeebe-2.log](https://github.com/zeebe-io/zeebe/files/3770759/zeebe-2.log)
[zeebe-2.log.txt](https://github.com/zeebe-io/zeebe/files/3770761/zeebe-2.log.txt)
```
I 2019-10-24T18:12:06.650025060Z zeebe-2 2019-10-24 18:12:06.649 [] [raft-server-raft-atomix-partition-2] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{raft-atomix-partition-2} - Found leader 0
  zeebe-2
I 2019-10-24T18:12:07.495265871Z zeebe-1 2019-10-24 18:12:07.494 [] [raft-server-raft-atomix-partition-2] INFO  io.atomix.protocols.raft.roles.LeaderRole - RaftServer{raft-atomix-partition-2}{role=LEADER} - Received greater term from 0
  zeebe-1
I 2019-10-24T18:12:07.495395850Z zeebe-1 2019-10-24 18:12:07.495 [] [raft-server-raft-atomix-partition-2] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{raft-atomix-partition-2} - Transitioning to FOLLOWER
  zeebe-1
I 2019-10-24T18:12:07.499085668Z zeebe-1 2019-10-24 18:12:07.498 [io.zeebe.broker.clustering.base.partitions.PartitionLeaderElection] [zeebe-1.zeebe.zell.svc.cluster.local:26501-zb-actors-3] DEBUG io.zeebe.broker.clustering - Partition role transitioning from LEADER to FOLLOWER
  zeebe-1
I 2019-10-24T18:12:07.499143559Z zeebe-1 2019-10-24 18:12:07.498 [] [raft-server-raft-atomix-partition-2] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{raft-atomix-partition-2} - Found leader 0
  zeebe-1
I 2019-10-24T18:12:11.652917383Z zeebe-0 2019-10-24 18:12:11.652 [] [raft-server-raft-atomix-partition-2] INFO  io.atomix.protocols.raft.roles.LeaderAppender - RaftServer{raft-atomix-partition-2} - ConfigureRequest{term=7, leader=0, index=0, timestamp=1571939855667, members=[DefaultRaftMember{id=2, type=ACTIVE, updated=2019-10-24T17:57:35.667833Z}, DefaultRaftMember{id=1, type=ACTIVE, updated=2019-10-24T17:57:35.667833Z}, DefaultRaftMember{id=0, type=ACTIVE, updated=2019-10-24T17:57:35.667833Z}]} to 1 failed: java.util.concurrent.TimeoutException: Request type raft-atomix-partition-2-configure timed out in 5000 milliseconds
  zeebe-0
I 2019-10-24T18:12:11.652973409Z zeebe-0 2019-10-24 18:12:11.652 [] [raft-server-raft-atomix-partition-2] INFO  io.atomix.protocols.raft.roles.LeaderAppender - RaftServer{raft-atomix-partition-2} - AppendRequest{term=7, leader=0, prevLogIndex=1462576, prevLogTerm=6, entries=0, commitIndex=1462574} to 1 failed: java.util.concurrent.TimeoutException: Request type raft-atomix-partition-2-append timed out in 5000 milliseconds
  zeebe-0
I 2019-10-24T18:12:16.280717577Z zeebe-0 2019-10-24 18:12:16.280 [] [raft-server-raft-atomix-partition-2-state] INFO  io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-2 - Backup log raft-atomix-partition-2 at position 249114714592
  zeebe-0
I 2019-10-24T18:12:17.509678348Z zeebe-1 2019-10-24 18:12:17.509 [] [raft-server-raft-atomix-partition-2-state] INFO  io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-2 - Backup log raft-atomix-partition-2 at position 249114714592
  zeebe-1
I 2019-10-24T18:12:22.072434649Z zeebe-1 2019-10-24 18:12:22.071 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.roles.FollowerRole - RaftServer{raft-atomix-partition-1}{role=FOLLOWER} - No heartbeat from null in the last PT3.171S (calculated from last 3581 ms)
  zeebe-1
I 2019-10-24T18:12:22.074084706Z zeebe-1 2019-10-24 18:12:22.073 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{raft-atomix-partition-1} - Transitioning to CANDIDATE
  zeebe-1
I 2019-10-24T18:12:22.074520037Z zeebe-1 2019-10-24 18:12:22.074 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.roles.CandidateRole - RaftServer{raft-atomix-partition-1}{role=CANDIDATE} - Starting election
  zeebe-1
I 2019-10-24T18:12:22.083206296Z zeebe-1 2019-10-24 18:12:22.082 [io.zeebe.broker.clustering.base.partitions.PartitionLeaderElection] [zeebe-1.zeebe.zell.svc.cluster.local:26501-zb-actors-5] DEBUG io.zeebe.broker.clustering - Partition role transitioning from FOLLOWER to CANDIDATE
  zeebe-1
I 2019-10-24T18:12:22.092658917Z zeebe-2 2019-10-24 18:12:22.092 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.roles.FollowerRole - RaftServer{raft-atomix-partition-1}{role=FOLLOWER} - Accepted PollRequest{term=5, candidate=1, lastLogIndex=1471509, lastLogTerm=5}: candidate's log is up-to-date
  zeebe-2
I 2019-10-24T18:12:22.093424193Z zeebe-1 2019-10-24 18:12:22.093 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{raft-atomix-partition-1} - Transitioning to LEADER
  zeebe-1
I 2019-10-24T18:12:22.094036487Z zeebe-1 2019-10-24 18:12:22.093 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{raft-atomix-partition-1} - Found leader 1
  zeebe-1
I 2019-10-24T18:12:22.108071635Z zeebe-2 2019-10-24 18:12:22.107 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.roles.FollowerRole - RaftServer{raft-atomix-partition-1}{role=FOLLOWER} - Accepted VoteRequest{term=6, candidate=1, lastLogIndex=1471509, lastLogTerm=5}: candidate's log is up-to-date
  zeebe-2
I 2019-10-24T18:12:22.463749334Z zeebe-0 2019-10-24 18:12:22.463 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.roles.LeaderRole - RaftServer{raft-atomix-partition-1}{role=LEADER} - Received greater term from 1
  zeebe-0
I 2019-10-24T18:12:22.463778001Z zeebe-0 2019-10-24 18:12:22.463 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{raft-atomix-partition-1} - Transitioning to FOLLOWER
  zeebe-0
I 2019-10-24T18:12:22.483427464Z zeebe-1 2019-10-24 18:12:22.482 [io.zeebe.broker.clustering.base.partitions.PartitionLeaderElection] [zeebe-1.zeebe.zell.svc.cluster.local:26501-zb-actors-5] DEBUG io.zeebe.broker.clustering - Partition role transitioning from CANDIDATE to LEADER
  zeebe-1
I 2019-10-24T18:12:22.483569859Z zeebe-1 2019-10-24 18:12:22.483 [io.zeebe.broker.clustering.base.partitions.PartitionInstallService] [zeebe-1.zeebe.zell.svc.cluster.local:26501-zb-actors-7] DEBUG io.zeebe.broker.clustering - Removing follower partition service for partition 1
  zeebe-1
I 2019-10-24T18:12:22.485095520Z zeebe-1 2019-10-24 18:12:22.484 [service-controller] [zeebe-1.zeebe.zell.svc.cluster.local:26501-zb-actors-7] DEBUG io.zeebe.broker.logstreams.restore.BrokerRestoreServer - Closed restore server for topics: log-replication-1, restore-info-1, snapshot-request-1, snapshot-info-request-1
  zeebe-1
I 2019-10-24T18:12:22.485427797Z zeebe-1 2019-10-24 18:12:22.485 [io.zeebe.broker.clustering.base.partitions.PartitionInstallService] [zeebe-1.zeebe.zell.svc.cluster.local:26501-zb-actors-7] DEBUG io.zeebe.broker.clustering - Installing leader partition service for partition 1
  zeebe-1
I 2019-10-24T18:12:22.502494627Z zeebe-2 2019-10-24 18:12:22.502 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{raft-atomix-partition-1} - Found leader 1
  zeebe-2
I 2019-10-24T18:12:22.547224380Z zeebe-0 2019-10-24 18:12:22.547 [io.zeebe.broker.clustering.base.partitions.PartitionLeaderElection] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-0] DEBUG io.zeebe.broker.clustering - Partition role transitioning from LEADER to FOLLOWER
  zeebe-0
I 2019-10-24T18:12:22.548854612Z zeebe-0 2019-10-24 18:12:22.548 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{raft-atomix-partition-1} - Found leader 1
  zeebe-0
I 2019-10-24T18:12:23.158435800Z zeebe-1 2019-10-24 18:12:23.157 [] [raft-server-raft-atomix-partition-2-state] DEBUG io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-2 - Node 0 claiming leadership for LogStream partition 2 at term 3.
  zeebe-1
I 2019-10-24T18:12:23.174181765Z zeebe-1 2019-10-24 18:12:23.173 [] [raft-server-raft-atomix-partition-1-state] DEBUG io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Node 0 claiming leadership for LogStream partition 1 at term 3.
  zeebe-1
I 2019-10-24T18:12:23.174461523Z zeebe-1 2019-10-24 18:12:23.174 [] [raft-server-raft-atomix-partition-1-state] DEBUG io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Node 1 claiming leadership for LogStream partition 1 at term 6.
  zeebe-1
I 2019-10-24T18:12:23.180808290Z zeebe-1 2019-10-24 18:12:23.179 [] [raft-partition-group-raft-atomix-4] DEBUG io.zeebe.distributedlog.impl.DistributedLogstreamPartition - Partition 1 for node 1 claimed leadership
  zeebe-1
I 2019-10-24T18:12:23.186798193Z zeebe-0 2019-10-24 18:12:23.186 [] [raft-server-raft-atomix-partition-2-state] DEBUG io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-2 - Node 0 claiming leadership for LogStream partition 2 at term 3.
  zeebe-0
I 2019-10-24T18:12:23.188017625Z zeebe-0 2019-10-24 18:12:23.187 [] [raft-partition-group-raft-atomix-7] DEBUG io.zeebe.distributedlog.impl.DistributedLogstreamPartition - Partition 2 for node 0 claimed leadership
  zeebe-0
I 2019-10-24T18:12:23.190305366Z zeebe-0 2019-10-24 18:12:23.190 [] [raft-server-raft-atomix-partition-2-state] DEBUG io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-2 - Node 2 claiming leadership for LogStream partition 2 at term 2.
  zeebe-0
I 2019-10-24T18:12:23.191368124Z zeebe-2 2019-10-24 18:12:23.191 [] [raft-partition-group-raft-atomix-0] DEBUG io.zeebe.distributedlog.impl.DistributedLogstreamPartition - Partition 2 for node 2 claimed leadership
  zeebe-2
I 2019-10-24T18:12:23.194143733Z zeebe-0 2019-10-24 18:12:23.193 [] [raft-partition-group-raft-atomix-9] DEBUG io.zeebe.distributedlog.impl.DistributedLogstreamPartition - Partition 1 for node 0 claimed leadership
  zeebe-0
I 2019-10-24T18:12:23.262448037Z zeebe-0 2019-10-24 18:12:23.262 [] [raft-server-raft-atomix-partition-1-state] DEBUG io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Node 0 claiming leadership for LogStream partition 1 at term 3.
  zeebe-0
I 2019-10-24T18:12:23.262529978Z zeebe-0 2019-10-24 18:12:23.262 [] [raft-server-raft-atomix-partition-1-state] DEBUG io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Node 1 claiming leadership for LogStream partition 1 at term 6.
  zeebe-0
I 2019-10-24T18:12:23.385156903Z zeebe-1 2019-10-24 18:12:23.384 [] [raft-server-raft-atomix-partition-2-state] DEBUG io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-2 - Node 2 claiming leadership for LogStream partition 2 at term 2.
  zeebe-1
I 2019-10-24T18:12:23.410787901Z zeebe-1 2019-10-24 18:12:23.410 [service-controller] [zeebe-1.zeebe.zell.svc.cluster.local:26501-zb-actors-3] DEBUG io.zeebe.logstreams.snapshot - Available snapshots: []
  zeebe-1
I 2019-10-24T18:12:23.435506092Z zeebe-2 2019-10-24 18:12:23.435 [service-controller] [zeebe-2.zeebe.zell.svc.cluster.local:26501-zb-actors-6] DEBUG io.zeebe.logstreams.snapshot - Available snapshots: [/usr/local/zeebe/data/partition-2/state/snapshots/201866752032, /usr/local/zeebe/data/partition-2/state/snapshots/103081167504]
  zeebe-2
I 2019-10-24T18:12:23.467785037Z zeebe-0 2019-10-24 18:12:23.467 [service-controller] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-6] DEBUG io.zeebe.logstreams.snapshot - Available snapshots: [/usr/local/zeebe/data/partition-2/state/snapshots/201866752032, /usr/local/zeebe/data/partition-2/state/snapshots/103081167504]
  zeebe-0
I 2019-10-24T18:12:23.470893273Z zeebe-1 2019-10-24 18:12:23.470 [] [raft-server-raft-atomix-partition-1-state] DEBUG io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Node 2 claiming leadership for LogStream partition 1 at term 2.
  zeebe-1
I 2019-10-24T18:12:23.476200959Z zeebe-0 2019-10-24 18:12:23.476 [service-controller] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-5] DEBUG io.zeebe.logstreams.snapshot - Available snapshots: [/usr/local/zeebe/data/partition-1/state/snapshots/201870135920, /usr/local/zeebe/data/partition-1/state/snapshots/103081957576]
  zeebe-0
I 2019-10-24T18:12:23.492049714Z zeebe-2 2019-10-24 18:12:23.491 [] [raft-partition-group-raft-atomix-2] DEBUG io.zeebe.distributedlog.impl.DistributedLogstreamPartition - Partition 1 for node 2 claimed leadership
  zeebe-2
I 2019-10-24T18:12:23.503374302Z zeebe-0 2019-10-24 18:12:23.503 [] [raft-server-raft-atomix-partition-1-state] DEBUG io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Node 2 claiming leadership for LogStream partition 1 at term 2.
  zeebe-0
I 2019-10-24T18:12:23.702540471Z zeebe-2 2019-10-24 18:12:23.702 [service-controller] [zeebe-2.zeebe.zell.svc.cluster.local:26501-zb-actors-1] DEBUG io.zeebe.logstreams.snapshot - Available snapshots: [/usr/local/zeebe/data/partition-1/state/snapshots/201870135920, /usr/local/zeebe/data/partition-1/state/snapshots/103081957576]
  zeebe-2
I 2019-10-24T18:12:23.775583655Z zeebe-2 2019-10-24 18:12:23.775 [service-controller] [zeebe-2.zeebe.zell.svc.cluster.local:26501-zb-actors-6] DEBUG io.zeebe.logstreams.snapshot - Opened database from '/usr/local/zeebe/data/partition-2/state/runtime'.
  zeebe-2
I 2019-10-24T18:12:23.775616839Z zeebe-2 2019-10-24 18:12:23.775 [service-controller] [zeebe-2.zeebe.zell.svc.cluster.local:26501-zb-actors-6] DEBUG io.zeebe.logstreams.snapshot - Recovered state from snapshot '/usr/local/zeebe/data/partition-2/state/snapshots/201866752032'
  zeebe-2
I 2019-10-24T18:12:23.786687937Z zeebe-2 2019-10-24 18:12:23.786 [service-controller] [zeebe-2.zeebe.zell.svc.cluster.local:26501-zb-actors-6] DEBUG io.zeebe.broker.logstreams.restore.BrokerRestoreServer - Started restore server for topics: log-replication-2, restore-info-2, snapshot-request-2, snapshot-info-request-2
  zeebe-2
I 2019-10-24T18:12:23.787042900Z zeebe-2 2019-10-24 18:12:23.786 [io.zeebe.broker.clustering.base.partitions.PartitionInstallService] [zeebe-2.zeebe.zell.svc.cluster.local:26501-zb-actors-7] DEBUG io.zeebe.broker.clustering - Removing leader partition services for partition 2
  zeebe-2
I 2019-10-24T18:12:23.787729807Z zeebe-2 2019-10-24 18:12:23.787 [topology] [zeebe-2.zeebe.zell.svc.cluster.local:26501-zb-actors-3] DEBUG io.zeebe.broker.clustering - Updating partition information for partition 2 on Node{nodeId=2, commandApi=zeebe-2.zeebe.zell.svc.cluster.local:26501} with state LEADER
  zeebe-2
I 2019-10-24T18:12:23.788747561Z zeebe-2 2019-10-24 18:12:23.787 [service-controller] [zeebe-2.zeebe.zell.svc.cluster.local:26501-zb-actors-2] DEBUG io.zeebe.broker.logstreams.restore.BrokerRestoreServer - Closed restore server for topics: log-replication-2, restore-info-2, snapshot-request-2, snapshot-info-request-2
  zeebe-2
I 2019-10-24T18:12:23.809702829Z zeebe-0 2019-10-24 18:12:23.809 [service-controller] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-6] DEBUG io.zeebe.logstreams.snapshot - Opened database from '/usr/local/zeebe/data/partition-2/state/runtime'.
  zeebe-0
I 2019-10-24T18:12:23.809755074Z zeebe-0 2019-10-24 18:12:23.809 [service-controller] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-6] DEBUG io.zeebe.logstreams.snapshot - Recovered state from snapshot '/usr/local/zeebe/data/partition-2/state/snapshots/201866752032'
  zeebe-0
I 2019-10-24T18:12:23.812303776Z zeebe-0 2019-10-24 18:12:23.812 [service-controller] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-5] DEBUG io.zeebe.logstreams.snapshot - Opened database from '/usr/local/zeebe/data/partition-1/state/runtime'.
  zeebe-0
I 2019-10-24T18:12:23.812329189Z zeebe-0 2019-10-24 18:12:23.812 [service-controller] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-5] DEBUG io.zeebe.logstreams.snapshot - Recovered state from snapshot '/usr/local/zeebe/data/partition-1/state/snapshots/201870135920'
  zeebe-0
I 2019-10-24T18:12:23.816979726Z zeebe-2 #
  zeebe-2
I 2019-10-24T18:12:23.817029524Z zeebe-2 # A fatal error has been detected by the Java Runtime Environment:
  zeebe-2
I 2019-10-24T18:12:23.817035142Z zeebe-2 #
  zeebe-2
I 2019-10-24T18:12:23.817039497Z zeebe-2 #  SIGSEGV (0xb) at pc=0x00007f483c0b5410, pid=15, tid=42
  zeebe-2
I 2019-10-24T18:12:23.817044612Z zeebe-2 #
  zeebe-2
I 2019-10-24T18:12:23.817049032Z zeebe-2 # JRE version: OpenJDK Runtime Environment (11.0.4+11) (build 11.0.4+11)
  zeebe-2
I 2019-10-24T18:12:23.817053877Z zeebe-2 # Java VM: OpenJDK 64-Bit Server VM (11.0.4+11, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
  zeebe-2
I 2019-10-24T18:12:23.817058624Z zeebe-2 # Problematic frame:
  zeebe-2
I 2019-10-24T18:12:23.817065839Z zeebe-2 # C  0x00007f483c0b5410
  zeebe-2
I 2019-10-24T18:12:23.817070374Z zeebe-2 #
  zeebe-2
I 2019-10-24T18:12:23.817075123Z zeebe-2 # Core dump will be written. Default location: /usr/local/zeebe/core.%e.%p.%t
  zeebe-2
I 2019-10-24T18:12:23.817079970Z zeebe-2 #
  zeebe-2
I 2019-10-24T18:12:23.819286764Z zeebe-2 # An error report file with more information is saved as:
  zeebe-2
I 2019-10-24T18:12:23.819318866Z zeebe-2 # /usr/local/zeebe/data/zeebe_error15.log
  zeebe-2
I 2019-10-24T18:12:23.820860206Z zeebe-0 2019-10-24 18:12:23.820 [service-controller] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-6] DEBUG io.zeebe.broker.logstreams.restore.BrokerRestoreServer - Started restore server for topics: log-replication-2, restore-info-2, snapshot-request-2, snapshot-info-request-2
  zeebe-0
I 2019-10-24T18:12:23.821242769Z zeebe-2 2019-10-24 18:12:23.792 [service-controller] [zeebe-2.zeebe.zell.svc.cluster.local:26501-zb-actors-2] ERROR io.zeebe.broker.clustering - Unexpected error occurred while closing the state snapshot controller for partition 2.
  zeebe-2
I 2019-10-24T18:12:23.821267860Z zeebe-2 java.util.ConcurrentModificationException: null
    at java.util.ArrayList.forEach(Unknown Source) ~[?:?]
    at io.zeebe.db.impl.rocksdb.transaction.ZeebeTransactionDb.close(ZeebeTransactionDb.java:406) ~[zeebe-db-0.22.0-SNAPSHOT.jar:0.22.0-SNAPSHOT]
    at io.zeebe.logstreams.state.StateSnapshotController.close(StateSnapshotController.java:319) ~[zeebe-logstreams-0.22.0-SNAPSHOT.jar:0.22.0-SNAPSHOT]
    at io.zeebe.broker.clustering.base.partitions.Partition.stop(Partition.java:126) ~[zeebe-broker-0.22.0-SNAPSHOT.jar:0.22.0-SNAPSHOT]
    at io.zeebe.servicecontainer.impl.ServiceController.invokeStop(ServiceController.java:126) ~[zeebe-service-container-0.22.0-SNAPSHOT.jar:0.22.0-SNAPSHOT]
    at io.zeebe.servicecontainer.impl.ServiceController$AwaitDependentsStopped.accept(ServiceController.java:359) ~[zeebe-service-container-0.22.0-SNAPSHOT.jar:0.22.0-SNAPSHOT]
    at io.zeebe.servicecontainer.impl.ServiceController$AwaitDependentsStopped.accept(ServiceController.java:355) ~[zeebe-service-container-0.22.0-SNAPSHOT.jar:0.22.0-SNAPSHOT]
    at io.zeebe.servicecontainer.impl.ServiceController.onServiceEvent(ServiceController.java:105) ~[zeebe-service-container-0.22.0-SNAPSHOT.jar:0.22.0-SNAPSHOT]
    at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:76) [zeebe-util-0.22.0-SNAPSHOT.jar:0.22.0-SNAPSHOT]
    at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-0.22.0-SNAPSHOT.jar:0.22.0-SNAPSHOT]
    at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:127) [zeebe-util-0.22.0-SNAPSHOT.jar:0.22.0-SNAPSHOT]
    at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:107) [zeebe-util-0.22.0-SNAPSHOT.jar:0.22.0-SNAPSHOT]
    at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:91) [zeebe-util-0.22.0-SNAPSHOT.jar:0.22.0-SNAPSHOT]
    at io.zeebe.util.sched.ActorThread.run(ActorThread.java:195) [zeebe-util-0.22.0-SNAPSHOT.jar:0.22.0-SNAPSHOT]
  zeebe-2
I 2019-10-24T18:12:23.822021423Z zeebe-0 2019-10-24 18:12:23.821 [io.zeebe.broker.clustering.base.partitions.PartitionInstallService] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-3] DEBUG io.zeebe.broker.clustering - Removing leader partition services for partition 2
  zeebe-0
I 2019-10-24T18:12:23.822221297Z zeebe-0 2019-10-24 18:12:23.821 [topology] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-6] DEBUG io.zeebe.broker.clustering - Updating partition information for partition 2 on Node{nodeId=0, commandApi=zeebe-0.zeebe.zell.svc.cluster.local:26501} with state LEADER
  zeebe-0
I 2019-10-24T18:12:23.822753673Z zeebe-0 2019-10-24 18:12:23.822 [service-controller] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-5] DEBUG io.zeebe.broker.logstreams.restore.BrokerRestoreServer - Started restore server for topics: log-replication-1, restore-info-1, snapshot-request-1, snapshot-info-request-1
  zeebe-0
I 2019-10-24T18:12:23.822913720Z zeebe-0 2019-10-24 18:12:23.822 [service-controller] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-3] DEBUG io.zeebe.broker.logstreams.restore.BrokerRestoreServer - Closed restore server for topics: log-replication-2, restore-info-2, snapshot-request-2, snapshot-info-request-2
  zeebe-0
I 2019-10-24T18:12:23.822932646Z zeebe-0 2019-10-24 18:12:23.822 [io.zeebe.broker.clustering.base.partitions.PartitionInstallService] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-5] DEBUG io.zeebe.broker.clustering - Removing leader partition services for partition 1
  zeebe-0
I 2019-10-24T18:12:23.823246019Z zeebe-0 2019-10-24 18:12:23.823 [topology] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-4] DEBUG io.zeebe.broker.clustering - Updating partition information for partition 1 on Node{nodeId=0, commandApi=zeebe-0.zeebe.zell.svc.cluster.local:26501} with state LEADER
  zeebe-0
I 2019-10-24T18:12:23.823306896Z zeebe-0 2019-10-24 18:12:23.823 [service-controller] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-6] DEBUG io.zeebe.broker.logstreams.restore.BrokerRestoreServer - Closed restore server for topics: log-replication-1, restore-info-1, snapshot-request-1, snapshot-info-request-1
  zeebe-0
I 2019-10-24T18:12:23.825423975Z zeebe-0 2019-10-24 18:12:23.825 [service-controller] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-3] DEBUG io.zeebe.logstreams.snapshot - Closed database from '/usr/local/zeebe/data/partition-2/state/runtime'.
  zeebe-0
I 2019-10-24T18:12:23.826798734Z zeebe-0 2019-10-24 18:12:23.826 [service-controller] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-6] DEBUG io.zeebe.logstreams.snapshot - Closed database from '/usr/local/zeebe/data/partition-1/state/runtime'.
  zeebe-0
I 2019-10-24T18:12:23.828346114Z zeebe-2 [thread 44 also had an error]
  zeebe-2
I 2019-10-24T18:12:23.829763068Z zeebe-0 2019-10-24 18:12:23.829 [io.zeebe.broker.clustering.base.partitions.PartitionInstallService] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-5] DEBUG io.zeebe.broker.clustering - Installing follower partition service for partition 2
  zeebe-0
I 2019-10-24T18:12:23.829788543Z zeebe-0 2019-10-24 18:12:23.829 [io.zeebe.broker.clustering.base.partitions.PartitionInstallService] [zeebe-0.zeebe.zell.svc.cluster.local:26501-zb-actors-1] DEBUG io.zeebe.broker.clustering - Installing follower partition service for partition 1
  zeebe-0
I 2019-10-24T18:12:23.890396223Z zeebe-2 #
  zeebe-2
I 2019-10-24T18:12:23.890432497Z zeebe-2 # If you would like to submit a bug report, please visit:
  zeebe-2
I 2019-10-24T18:12:23.890437838Z zeebe-2 #   http://bugreport.java.com/bugreport/crash.jsp
  zeebe-2
I 2019-10-24T18:12:23.890442038Z zeebe-2 # The crash happened outside the Java Virtual Machine in native code.
  zeebe-2
I 2019-10-24T18:12:23.890446530Z zeebe-2 # See problematic frame for where to report the bug.
  zeebe-2
I 2019-10-24T18:12:23.890450853Z zeebe-2 #
  zeebe-2
```
Zelldon commented 5 years ago

Might be related to #3274