camunda / camunda

Process Orchestration Framework
https://camunda.com/platform/

Raft members that leave the cluster are still polled for votes #9648

Open lenaschoenburg opened 2 years ago

lenaschoenburg commented 2 years ago

Describe the bug

When raft members leave the cluster, they are still polled for votes, even though we should know they are no longer active. This is wasteful and adds logging noise.

In the following logs, the leader was removed, which immediately triggered a leader election that needlessly polled the former leader again.

io.atomix.raft.roles.FollowerRole - RaftServer{raft-partition-partition-2}{role=FOLLOWER} - Known leader 3 was removed from cluster, sending poll requests
io.atomix.raft.roles.FollowerRole - RaftServer{raft-partition-partition-2}{role=FOLLOWER} - Sending poll requests to all active members: [DefaultRaftMember{id=3, type=ACTIVE, updated=2022-06-17T10:16:47.857Z}, DefaultRaftMember{id=2, type=ACTIVE, updated=2022-06-17T10:16:47.857Z}, DefaultRaftMember{id=0, type=ACTIVE, updated=2022-06-17T10:16:47.857Z}]
io.atomix.raft.roles.FollowerRole - RaftServer{raft-partition-partition-2}{role=FOLLOWER} - Poll request to 3 failed: java.net.ConnectException: Expected to send a message with subject 'raft-partition-partition-2-poll' to member '3', but member is not known.

I'm not 100% sure that this is something we could implement safely, so I'm opening this mostly as a tracking issue.

To Reproduce

Not entirely clear; probably just by removing member 3 from the cluster.

Expected behavior

Judging by the first log line, it sounds to me like the raft layer should know that member 3 is currently not active and shouldn't poll it for votes.
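A minimal sketch of the expected behavior, assuming the follower keeps a set of member ids it has seen removed from the configuration. `PollTargets`, `Member`, `pollTargets` and `quorumSize` are hypothetical names, not Atomix APIs. The sketch shows the filtering step and how the majority would then be computed over the remaining voters. Whether that is safe depends on the removal already being part of the configuration the node votes under, which is the open question raised in this issue.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: pick the members a follower polls when it starts an
 * election, skipping members removed from the configuration.
 * Not the Atomix implementation; all names are illustrative.
 */
public class PollTargets {

  // Illustrative stand-in for a raft member entry (id plus membership type).
  record Member(int id, String type) {}

  // Remote ACTIVE members that have not been removed from the configuration.
  static List<Member> pollTargets(List<Member> members, Set<Integer> removedIds, int localId) {
    return members.stream()
        .filter(m -> m.id() != localId)
        .filter(m -> "ACTIVE".equals(m.type()))
        .filter(m -> !removedIds.contains(m.id()))
        .collect(Collectors.toList());
  }

  // Majority of voting members (the local node counts as a voter).
  static int quorumSize(int votingMembers) {
    return votingMembers / 2 + 1;
  }

  public static void main(String[] args) {
    List<Member> members = List.of(
        new Member(3, "ACTIVE"), new Member(2, "ACTIVE"),
        new Member(0, "ACTIVE"), new Member(1, "ACTIVE"));
    // Assumed scenario: former leader 3 was removed, local node is 1.
    List<Member> targets = pollTargets(members, Set.of(3), 1);
    System.out.println(targets); // [Member[id=2, type=ACTIVE], Member[id=0, type=ACTIVE]]
    System.out.println(quorumSize(targets.size() + 1)); // 2
  }
}
```

Filtering by an explicit set of removed ids, rather than by connection failures, keeps the decision tied to the configuration instead of to transient network errors such as the `ConnectException` in the logs.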

Environment:

lenaschoenburg commented 2 years ago

Additionally, the TopologyPartitionListener does not get notified when leaders leave the cluster.
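To illustrate the missing notification, here is a hypothetical observer sketch in which removing a member always notifies listeners and flags whether the departed member was the known leader. `TopologyView`, `MemberListener` and `removeMember` are illustrative names only and do not reflect the real `TopologyPartitionListener` interface.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: notify listeners when a member leaves a partition,
 * including when that member was the leader. Not the real
 * TopologyPartitionListener API; all names are illustrative.
 */
public class TopologyView {

  interface MemberListener {
    // wasLeader tells listeners whether the partition just lost its leader.
    void onMemberRemoved(int partitionId, int memberId, boolean wasLeader);
  }

  private final List<MemberListener> listeners = new ArrayList<>();
  private Integer leaderId; // null when no leader is known

  void addListener(MemberListener listener) {
    listeners.add(listener);
  }

  void setLeader(int memberId) {
    leaderId = memberId;
  }

  Integer leader() {
    return leaderId;
  }

  void removeMember(int partitionId, int memberId) {
    boolean wasLeader = leaderId != null && leaderId == memberId;
    if (wasLeader) {
      leaderId = null; // forget the departed leader before notifying
    }
    for (MemberListener l : listeners) {
      l.onMemberRemoved(partitionId, memberId, wasLeader);
    }
  }

  public static void main(String[] args) {
    TopologyView view = new TopologyView();
    view.addListener((p, m, wasLeader) ->
        System.out.println("partition " + p + ": member " + m + " removed, wasLeader=" + wasLeader));
    view.setLeader(3);
    view.removeMember(2, 3); // partition 2: member 3 removed, wasLeader=true
  }
}
```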

megglos commented 2 years ago

Triage: not a bug, but a flaw that causes log noise; potentially low-hanging fruit if someone finds time.