Closed DavidePrincipi closed 3 months ago
In testing, in 7.9.2009/testing:
Test case 1
With core 2.8.2-dev.1, the add-node action rejects any call that passes a public_key already in use. For example, you can run the action manually:
api-cli run add-node --data <PAYLOAD_HERE>
Test case 2
The bug must not be reproducible with nethserver-ns8-migration from testing, both with and without core 2.8.2-dev.1 (which is just a safety-net validator for the cluster).
With the testing release, you're not allowed to reuse an active public key:
~]# api-cli run add-node --data - <<EOF
> {
"endpoint": "",
"node_pwh": "8f01f499f7dfdf55a083515e3c7706917b6b67ddebb13ca555552636a32000ae",
"public_key": "3VdMc/oIhm5vysZVDkHZ+Vlzzryl3R6YFgT/9Dro7RA="
}
> EOF
Warning: using user "cluster" credentials from the environment
<4>The public key 3VdMc/oIhm5vysZVDkHZ+Vlzzryl3R6YFgT/9Dro7RA= is already used by node 2
[{"field": "public_key", "parameter": "public_key", "error": "public_key_matches_existing_node", "value": "2"}]
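The rejection above can be illustrated with a minimal sketch. This is not the actual core validator: the function name `check_public_key` and the `existing_nodes` mapping are hypothetical, and only the shape of the error record is taken from the output shown above.

```python
def check_public_key(public_key, existing_nodes):
    """Return a list of validation errors, empty if the key is unused.

    existing_nodes is a hypothetical mapping of node ID (str) to the
    public key already registered for that node.
    """
    errors = []
    # Visit nodes in numeric order so the reported node is deterministic.
    for node_id in sorted(existing_nodes, key=int):
        if existing_nodes[node_id] == public_key:
            errors.append({
                "field": "public_key",
                "parameter": "public_key",
                "error": "public_key_matches_existing_node",
                "value": node_id,
            })
            break  # one match is enough to reject the request
    return errors


# Placeholder keys, for illustration only.
nodes = {"1": "LEADER_KEY", "2": "NS7_KEY"}
rejected = check_public_key("NS7_KEY", nodes)
accepted = check_public_key("NEW_KEY", nodes)
```

An empty list means the request may proceed; a non-empty list mirrors the error array printed by api-cli.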
When a join succeeds after previous failed attempts, the WireGuard configuration is consistent on the NS7 side:
~]# config show ns8
ns8=configuration
Host=rl11.nr.nethserver.net
LeaderIpAddress=10.5.4.1
Password=MyTestPAss
TLSVerify=disabled
User=admin
~]# wg
interface: ns8
public key: kN1yyzDbnAhFhw2m4dcY/nVOjgcl7M0QquKn4ZNs9i0=
private key: (hidden)
listening port: 44916
peer: oYouWUkvqlcYB13KmXQe67SN5dQ3AsTkttKONO3AjWg=
endpoint: 165.232.65.11:55820
allowed ips: 10.5.4.0/24
latest handshake: 13 seconds ago
transfer: 10.25 KiB received, 11.52 KiB sent
The NS8 leader and NS7 talk to each other correctly.
On the NS8 side you need to clean up the bogus WireGuard configs:
~]# redis-cli keys *vpn*
1) "node/1/vpn"
2) "node/4/vpn"
3) "node/3/vpn"
4) "node/2/vpn"
~]# wg
interface: wg0
public key: oYouWUkvqlcYB13KmXQe67SN5dQ3AsTkttKONO3AjWg=
private key: (hidden)
listening port: 55820
peer: kN1yyzDbnAhFhw2m4dcY/nVOjgcl7M0QquKn4ZNs9i0=
endpoint: 164.92.229.123:44916
allowed ips: 10.5.4.4/32
latest handshake: 34 seconds ago
transfer: 11.55 KiB received, 10.25 KiB sent
persistent keepalive: every 25 seconds
peer: /Q9I0ILStidtyyo/IdGjVsveBrs3NDjAzGNJB+s7XAI=
allowed ips: 10.5.4.2/32
persistent keepalive: every 25 seconds
peer: 8+oya7v8BSMLjSiRQ2FVwMRUU3XkpO60JLDqa6ydVSs=
allowed ips: 10.5.4.3/32
persistent keepalive: every 25 seconds
in 7.9.2009/testing:
in 7.9.2009/updates:
If the ns8-join command of the migration tool fails, a duplicate Redis key is generated for each failed attempt. After many failed attempts, the WireGuard peer table is polluted by duplicates and the wg0 configuration breaks.
Steps to reproduce
myhost.dom.test. As a consequence, the leader FQDN is not in DNS: this is a condition that, despite the docs, is often forgotten.
ns8-join --no-tlsverify <LEADER_IP> admin Nethesis,1234
ns8-leave
Expected behavior
Join fails. Only the last join attempt is left in the Redis DB, with the highest NODE_ID.
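The expected state can be expressed as a small rule: for every public key, only the record with the highest node ID survives. The sketch below assumes a hypothetical `node_keys` mapping of node ID to public key (for example, read from the node/N/vpn Redis keys); it only computes which IDs would be stale and does not touch Redis.

```python
def stale_node_ids(node_keys):
    """Return the sorted node IDs that are not the newest for their key.

    node_keys is a hypothetical mapping of int node ID -> public key.
    For each public key, the highest node ID is considered current.
    """
    newest = {}
    for node_id, public_key in node_keys.items():
        if public_key not in newest or node_id > newest[public_key]:
            newest[public_key] = node_id
    return sorted(
        node_id
        for node_id, public_key in node_keys.items()
        if newest[public_key] != node_id
    )


# Three join attempts from the same NS7 host (placeholder keys).
attempts = {1: "LEADER_KEY", 2: "NS7_KEY", 3: "NS7_KEY", 4: "NS7_KEY"}
to_remove = stale_node_ids(attempts)
```

In this example nodes 2 and 3 are leftovers of failed attempts, and node 4 is the only record expected to remain for the NS7 key.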
Actual behavior
After the last join attempt on NS7:
Node keys from the first join attempt are still in place:
They overwrite the WireGuard "allowed ips" field, breaking the VPN configuration:
:warning: note IP 10.5.4.5, from a stale Redis node key.
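Why duplicates break the VPN: WireGuard identifies a peer by its public key, so two records with the same key cannot coexist and the one applied last replaces the allowed IPs of the other. The sketch below simulates this with a plain dictionary. The function name, the record order, and the address 10.5.4.6 are assumptions made for illustration; only 10.5.4.5 comes from the report above.

```python
def build_peer_table(records):
    """Simulate applying (public_key, allowed_ip) records in order.

    A peer is keyed by its public key, so a later record with the
    same key replaces the allowed IP set by an earlier one.
    """
    peers = {}
    for public_key, allowed_ip in records:
        peers[public_key] = allowed_ip
    return peers


# Hypothetical order: the current record is applied first,
# then a stale record left behind by a failed join attempt.
records = [
    ("NS7_KEY", "10.5.4.6/32"),  # assumed address of the last join attempt
    ("NS7_KEY", "10.5.4.5/32"),  # stale Redis node key
]
table = build_peer_table(records)
```

Here the single NS7 peer ends up with the stale address, which matches the symptom described above. Which record wins in practice depends on the order in which the records are applied.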
Components
See also
https://community.nethserver.org/t/migration-tool-duplicates-redis-keys-of-node/23789
Thanks to @mrmarkuz