Short version: there is an edge case that was not handled well when a node very aggressively rejoins the cluster from the same host/port.
This can happen on Kubernetes when a pod gets restarted aggressively, or with command line apps where someone joins a cluster, "sends just one request and kills the app": in both cases the previous incarnation may still (correctly) be present in SWIM gossip as dead or suspect, and the cluster may misinterpret that information as being about "itself" when the new node joins.
More analysis:
steps:
- Node 7 joins node 8.
- 7 becomes the leader:
  - we require a minimum of 2 nodes before we elect one,
  - the lower address wins.
- 7 dies.
- 8 cannot declare 7 as down:
  - only the leader can do this,
  - this is ok, as designed; such systems are expected to get back to their node count and then recover.
- 7 reboots -- let's call it 77:
  - same host/port,
  - new UID.
- 77 handshakes with 8:
  - 8 accepts,
  - 77 gets the accept,
  - 8 declares the "previous 7" as down, since 77 is its replacement.
- 8 declaring 7 down is correct:
  - it means we now have a down 7 in the membership, which is also correct,
  - other nodes may not yet know about this, so we want to spread the down information that 8 first noticed,
  - so gossip still includes the old node 7 (okay).
- Node 77 receives gossip through SWIM, and it includes 7:
  - in other words, SWIM spreads information about both nodes since 7 is not confirmDead yet -- THIS IS OK. But continuing to act on the removed node's information is NOT ok.

Long story short: SWIM tells us that a node on our own address is dead, but we know we are not dead. Declaring ourselves down should only happen through the high-level gossip, when we see a .down about exactly us (including our UID), so we can safely ignore this information at the SWIM level; see the sketch below.

This should also get fixed in SWIM itself though, I'll follow up there.
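
To make the intended behavior concrete, here is a minimal sketch of the two checks involved, using simplified stand-in types rather than the actual library API (`Node`, `Member`, and the function names below are illustrative only): the replacement detection node 8 performs on handshake, and the fix on the rejoined node's side, ignoring SWIM-level dead/suspect information that matches our host/port but not our UID.

```swift
// Simplified stand-in for the cluster's node identity; the real type also carries
// a system name, but host/port/UID are what matter for this edge case.
struct Node: Hashable {
    var host: String
    var port: Int
    var uid: UInt64 // regenerated on every process start, so a restart yields a new UID
}

enum MemberStatus { case alive, suspect, dead, down }

struct Member {
    var node: Node
    var status: MemberStatus
}

/// Sketch of the check node 8 performs on handshake: if the joining node has the
/// same host/port as a known member but a different UID, that member is a previous
/// incarnation, and it is correct to mark it .down and accept the newcomer as its
/// replacement.
func previousIncarnation(of joining: Node, in members: [Member]) -> Member? {
    members.first { member in
        member.node.host == joining.host
            && member.node.port == joining.port
            && member.node.uid != joining.uid
    }
}

/// Sketch of the fix on the rejoined node's (77's) side: SWIM gossip may still carry
/// the old, suspect/dead member on our own host/port. That information is about the
/// previous incarnation, not about us, so we must not act on it at the SWIM level.
/// Only a high-level cluster .down about exactly our node (UID included) means
/// "we are down".
func shouldIgnoreAtSWIMLevel(gossiped member: Member, myself: Node) -> Bool {
    member.node.host == myself.host
        && member.node.port == myself.port
        && member.node.uid != myself.uid
}
```

The UID comparison is the important part: equal host/port alone cannot distinguish "information about a prior incarnation on our address" from "information about us".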
Resolves: https://github.com/apple/swift-distributed-actors/issues/1082
For reference, the trace log from the reproduction, in which the replacement node 77 receives a ping whose gossip payload still contains the old node 7:

    2022-11-01T13:08:56+0900 trace Client : actor/id=/user/swim actor/path=/user/swim cluster/node=sact://REPLACEMENT_77@127.0.0.1:7337 swim/incarnation=0 swim/members/all=["SWIM.Member(SWIMActor(id:sact://RemoteCluster:7602583950674506995@127.0.0.1:8337/user/swim, node:sact://RemoteCluster:7602583950674506995@127.0.0.1:8337, alive(incarnation: 0), protocolPeriod: 1)", "SWIM.Member(SWIMActor(id:/user/swim, node:sact://REPLACEMENT_77@127.0.0.1:7337, alive(incarnation: 0), protocolPeriod: 0)"] swim/members/count=2 swim/ping/origin=sact://RemoteCluster:7602583950674506995@127.0.0.1:8337/user/swim swim/ping/payload=membership([SWIM.Member(SWIMActor(id:sact://OLD_NODE_7@127.0.0.1:7337/user/swim, node:sact://OLD_NODE_7@127.0.0.1:7337, suspect(incarnation: 0, suspectedBy: Set([sact://sact@127.0.0.1:8337#7602583950674506995])), protocolPeriod: 56), SWIM.Member(SWIMActor(id:sact://RemoteCluster:7602583950674506995@127.0.0.1:8337/user/swim, node:sact://RemoteCluster:7602583950674506995@127.0.0.1:8337, alive(incarnation: 0), protocolPeriod: 0), SWIM.Member(SWIMActor(id:/user/swim, node:sact://REPLACEMENT_77@127.0.0.1:7337, alive(incarnation: 0), protocolPeriod: 56)]) swim/ping/seqNr=4 swim/protocolPeriod=1 swim/suspects/count=0 swim/timeoutSuspectsBeforePeriodMax=11 swim/timeoutSuspectsBeforePeriodMin=4 [DistributedCluster] Received ping@4

The ping payload, broken out for readability:

    swim/ping/payload=membership([
      SWIM.Member(SWIMActor(id:sact://OLD_NODE_7@127.0.0.1:7337/user/swim, node:sact://OLD_NODE_7@127.0.0.1:7337, suspect(incarnation: 0, suspectedBy: Set([sact://sact@127.0.0.1:8337#7602583950674506995])), protocolPeriod: 56),
      SWIM.Member(SWIMActor(id:sact://REPLACEMENT_77@127.0.0.1:7337/user/swim, node:sact://REPLACEMENT_77@127.0.0.1:7337, alive(incarnation: 0), protocolPeriod: 56),
      SWIM.Member(SWIMActor(id:sact://RemoteCluster:7602583950674506995@127.0.0.1:8337/user/swim, node:sact://RemoteCluster:7602583950674506995@127.0.0.1:8337, alive(incarnation: 0), protocolPeriod: 0)
    ])
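
Applying the check from the sketch above to this payload, the replacement node keeps acting on its own entry and on node 8, and ignores only the stale entry for the old incarnation on its own address. This is a hypothetical walk-through reusing the simplified types and `shouldIgnoreAtSWIMLevel(gossiped:myself:)` from the earlier sketch; the UID values are made-up placeholders, since the log shows redacted node names rather than raw UIDs.

```swift
let myself = Node(host: "127.0.0.1", port: 7337, uid: 2) // REPLACEMENT_77, i.e. "us"

let gossiped: [Member] = [
    Member(node: Node(host: "127.0.0.1", port: 7337, uid: 1), status: .suspect), // OLD_NODE_7, previous incarnation
    Member(node: Node(host: "127.0.0.1", port: 7337, uid: 2), status: .alive),   // REPLACEMENT_77, ourselves
    Member(node: Node(host: "127.0.0.1", port: 8337, uid: 3), status: .alive),   // node 8
]

let actedUpon = gossiped.filter { !shouldIgnoreAtSWIMLevel(gossiped: $0, myself: myself) }
// actedUpon keeps our own entry and node 8; the stale OLD_NODE_7 entry
// (same host/port as us, different UID) is ignored at the SWIM level.
```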