Closed DavidePrincipi closed 3 days ago
Test case 0
Test case 1
Check that the join works after fixing the VPN endpoint with this command (assuming 1 is the NODE_ID of the leader):
redis-cli hset node/1/vpn endpoint rl1.dp.nethserver.net:55820
The bug is fixed if the worker node is still capable of joining the cluster after a failed attempt.
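The endpoint fix from Test case 1 is easy to mistype, so here is a minimal sketch of a dry-run helper that checks the arguments before printing the same redis-cli command. The function name vpn_endpoint_cmd and its checks are hypothetical and not part of NS8. It only assumes the key layout shown above (node/NODE_ID/vpn, field endpoint) and a host:port endpoint.

```shell
#!/bin/sh
# Hypothetical helper (not part of NS8): print the redis-cli command that
# sets the VPN endpoint of a node, after a basic host:port sanity check.
vpn_endpoint_cmd() {
    node_id="$1"
    endpoint="$2"
    port="${endpoint##*:}"   # text after the last colon
    host="${endpoint%:*}"    # text before the last colon

    # NODE_ID must be a non-empty string of digits.
    case "$node_id" in
        ''|*[!0-9]*) echo "invalid NODE_ID: $node_id" >&2; return 1 ;;
    esac
    # The port must be a non-empty string of digits.
    case "$port" in
        ''|*[!0-9]*) echo "invalid port in endpoint: $endpoint" >&2; return 1 ;;
    esac
    # The host must be present and separated from the port by a colon.
    if [ -z "$host" ] || [ "$host" = "$endpoint" ]; then
        echo "endpoint must be host:port: $endpoint" >&2
        return 1
    fi

    echo "redis-cli hset node/${node_id}/vpn endpoint ${endpoint}"
}

# Dry run: print the command for the leader (NODE_ID 1 assumed, as above).
vpn_endpoint_cmd 1 rl1.dp.nethserver.net:55820
```

After running the printed command, the stored value can be read back with redis-cli hget node/1/vpn endpoint before retrying the join.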
test case 0: VERIFIED
If the leader has an invalid domain, the join attempt generates a clear error:
test case 1: VERIFIED
Once the new, correct FQDN for the leader is set and the VPN endpoint is fixed in Redis, the join works flawlessly.
After a failed join attempt, the node RL2 is left in an invalid state: it cannot rejoin the cluster or become the first node of a new cluster.
Steps to reproduce
dp.test
rl1.dp.nethserver.net
Expected behavior
I expect the join to work, or to be able to recover from the error by some means.
Actual behavior
Despite the error message, the RL2 UI shows a link to the leader node, giving me the impression that the join was successful in the end.
If I reload the page, RL2 again shows the initial screen, with the choice among create cluster, join node, and restore from backup.
If I choose create-cluster, the create-cluster procedure configures RL2 as leader of a new cluster, but a conflict on the ns-wireguard firewall service occurs.
In RL2 journal, the original join failure
The action create-cluster on RL2 fails with
Components
See also
Discussion (PVT) https://mattermost.nethesis.it/nethesis/pl/rqr3abki53rr9ngsrxpeow835h
Thanks to @nrauso