NethServer / dev

NethServer issue tracker
https://github.com/NethServer/dev/issues

Migration tool duplicates Redis keys of node #6940

Closed DavidePrincipi closed 3 months ago

DavidePrincipi commented 3 months ago

If the ns8-join command of the migration tool fails, a duplicate Redis key is generated for each failed attempt. After many failed attempts, the WireGuard peer table is polluted by duplicates and the wg0 configuration breaks.

Steps to reproduce

Expected behavior

The join fails, but only the keys of the last join attempt, with the highest NODE_ID, are left in the Redis DB.
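The expected behavior boils down to a simple rule: for each public key, keep the entry with the highest NODE_ID and treat the others as stale. A minimal sketch of that rule, assuming (as the wg output in this report suggests) that the stale entries carry the same WireGuard public key as the latest one. The input format, one `NODE_ID PUBLIC_KEY` pair per line, and the function name are hypothetical; extracting those pairs from the `node/*/vpn` keys is not shown.

```shell
#!/bin/sh
# Hypothetical helper (not part of ns8-core or the migration tool).
# Input on stdin: one "NODE_ID PUBLIC_KEY" pair per line (assumed format).
# Output: the NODE_IDs to discard, keeping only the highest NODE_ID
# (the most recent join attempt) for each public key.
stale_node_ids() {
    # Highest NODE_ID first, so the first line seen for a key is the one to keep.
    sort -k1,1nr |
        # seen[$2]++ is 0 (false) on the first occurrence of a key and
        # non-zero afterwards, so only the older duplicates are printed.
        awk 'seen[$2]++ { print $1 }'
}

# Example:
#   printf '%s\n' '5 KEY_A' '7 KEY_A' '1 KEY_B' '6 KEY_A' | stale_node_ids
# prints 6 and then 5 (node 7 is kept for KEY_A, node 1 for KEY_B).
```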

Actual behavior

After the last join attempt, on NS7:

[root@nscom2 ~]# config show wg-quick@ns8 
wg-quick@ns8=service
    Address=10.5.4.7
    RemoteEndpoint=rl1.dom.test:55820
    RemoteKey=XXXXXXXX
    RemoteNetwork=10.5.4.0/24
    status=enabled

Node keys from the first join attempt are still in place:

[root@rl1 ~]# redis-cli keys node/*/vpn
1) "node/7/vpn"
2) "node/5/vpn"
3) "node/4/vpn"
4) "node/6/vpn"
5) "node/3/vpn"
6) "node/2/vpn"
7) "node/1/vpn"

They overwrite the WireGuard "allowed ips" field, breaking the VPN configuration:

[root@rl1 ~]# wg
interface: wg0
  public key: pfd5Bm8HnII6ZC18Ojuhrn02sBen1fvDX29KroKARxs=
  private key: (hidden)
  listening port: 55820

peer: RKUWF/SLwotQJq5OfDxUFSoHhSZ0D7kwGMAocwX9FSI=
  allowed ips: 10.5.4.5/32
  persistent keepalive: every 25 seconds

⚠️ Note the allowed IP 10.5.4.5: it comes from a stale Redis node key, while the NS7 address after the last join attempt is 10.5.4.7.
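The overwrite can be reproduced with a simplified model. This is a sketch under stated assumptions, not the ns8-core code: each node entry is reduced to a hypothetical `NODE_ID PUBLIC_KEY IP_ADDRESS` line, the entries are applied to wg0 in the order given, and the only real-world fact relied on is that WireGuard identifies a peer by its public key, so a later entry for the same key replaces the allowed IP set by an earlier one.

```shell
#!/bin/sh
# Simplified model (an assumption, not the ns8-core code): each line on stdin
# is a hypothetical "NODE_ID PUBLIC_KEY IP_ADDRESS" entry, in the order the
# entries are applied to wg0.
effective_allowed_ips() {
    # WireGuard identifies a peer by its public key, so an entry for a key
    # that was already seen replaces its allowed IP: the last entry wins.
    awk '{ ip[$2] = $3 } END { for (k in ip) print k, ip[k] }'
}

# Example, with two join attempts sharing the same NS7 public key:
#   printf '%s\n' '7 KEY_NS7 10.5.4.7' '5 KEY_NS7 10.5.4.5' | effective_allowed_ips
# prints "KEY_NS7 10.5.4.5": the stale address replaces the current one.
```

Since Redis does not guarantee any order for the keys returned by `KEYS`, if the entries were applied in listing order, which stale address ends up in wg0 would be hard to predict.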

Components

See also

https://community.nethserver.org/t/migration-tool-duplicates-redis-keys-of-node/23789

Thanks to @mrmarkuz

DavidePrincipi commented 3 months ago

In testing:

nethbot commented 3 months ago

in 7.9.2009/testing:

DavidePrincipi commented 3 months ago

Test case 1

With core 2.8.2-dev.1, the add-node action rejects calls that reuse a public_key already assigned to an existing node. For example, you can execute the action manually:

api-cli run add-node --data <PAYLOAD_HERE>

Test case 2

The bug must not be reproducible with nethserver-ns8-migration from testing, with or without core 2.8.2-dev.1 (which only adds a safety-net validator to the cluster).

With the testing release,

nrauso commented 3 months ago

test case 1: VERIFIED

You're not allowed to reuse an active public key:

~]# api-cli run add-node --data - <<EOF
{
  "endpoint": "",
  "node_pwh": "8f01f499f7dfdf55a083515e3c7706917b6b67ddebb13ca555552636a32000ae",
  "public_key": "3VdMc/oIhm5vysZVDkHZ+Vlzzryl3R6YFgT/9Dro7RA="
}
EOF
Warning: using user "cluster" credentials from the environment
<4>The public key 3VdMc/oIhm5vysZVDkHZ+Vlzzryl3R6YFgT/9Dro7RA= is already used by node 2
[{"field": "public_key", "parameter": "public_key", "error": "public_key_matches_existing_node", "value": "2"}]
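The check verified here can be approximated in a few lines. This is a hypothetical sketch of the idea, not the add-node implementation in ns8-core: the function name and its input format (the candidate key as the first argument, existing nodes as `NODE_ID PUBLIC_KEY` lines on stdin) are assumptions. When the key is already in use, it prints a warning and a validation error shaped like the add-node output above.

```shell
#!/bin/sh
# Hypothetical sketch of a public-key uniqueness check; this is NOT the
# ns8-core add-node code. $1 is the candidate public key; stdin holds the
# existing nodes as "NODE_ID PUBLIC_KEY" lines (assumed input format).
check_public_key() {
    candidate=$1
    # First node already registered with the same key, if any.
    owner=$(awk -v k="$candidate" '$2 == k { print $1; exit }')
    if [ -n "$owner" ]; then
        # Human-readable warning on stderr, machine-readable error on stdout,
        # mirroring the shape of the add-node output shown above.
        echo "The public key $candidate is already used by node $owner" >&2
        printf '[{"field": "public_key", "parameter": "public_key", "error": "public_key_matches_existing_node", "value": "%s"}]\n' "$owner"
        return 1
    fi
    return 0
}
```

Rejecting the duplicate at registration time is what makes the validator a safety net: even if a client retries a failed join with the same key, no second node entry is created.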

test case 2: VERIFIED

When a join succeeds after previous failed attempts, the WireGuard configuration on the NS7 side is consistent:

~]# config show ns8
ns8=configuration
    Host=rl11.nr.nethserver.net
    LeaderIpAddress=10.5.4.1
    Password=MyTestPAss
    TLSVerify=disabled
    User=admin

~]# wg
interface: ns8
  public key: kN1yyzDbnAhFhw2m4dcY/nVOjgcl7M0QquKn4ZNs9i0=
  private key: (hidden)
  listening port: 44916

peer: oYouWUkvqlcYB13KmXQe67SN5dQ3AsTkttKONO3AjWg=
  endpoint: 165.232.65.11:55820
  allowed ips: 10.5.4.0/24
  latest handshake: 13 seconds ago
  transfer: 10.25 KiB received, 11.52 KiB sent

The NS8 leader and NS7 communicate correctly with each other. On the NS8 side, you still need to clean up the bogus WireGuard configurations:

~]# redis-cli keys *vpn*
1) "node/1/vpn"
2) "node/4/vpn"
3) "node/3/vpn"
4) "node/2/vpn"

~]# wg
interface: wg0
  public key: oYouWUkvqlcYB13KmXQe67SN5dQ3AsTkttKONO3AjWg=
  private key: (hidden)
  listening port: 55820

peer: kN1yyzDbnAhFhw2m4dcY/nVOjgcl7M0QquKn4ZNs9i0=
  endpoint: 164.92.229.123:44916
  allowed ips: 10.5.4.4/32
  latest handshake: 34 seconds ago
  transfer: 11.55 KiB received, 10.25 KiB sent
  persistent keepalive: every 25 seconds

peer: /Q9I0ILStidtyyo/IdGjVsveBrs3NDjAzGNJB+s7XAI=
  allowed ips: 10.5.4.2/32
  persistent keepalive: every 25 seconds

peer: 8+oya7v8BSMLjSiRQ2FVwMRUU3XkpO60JLDqa6ydVSs=
  allowed ips: 10.5.4.3/32
  persistent keepalive: every 25 seconds
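In this output, the two bogus peers (10.5.4.2 and 10.5.4.3) have no endpoint and no latest handshake. One way to list such candidates is `wg show wg0 latest-handshakes`, which prints one public key and UNIX timestamp per peer, with 0 for peers that never completed a handshake. The filter below is a hypothetical helper; a peer without a handshake may also be a legitimate node that is simply offline, so each result should be cross-checked against the node/*/vpn keys before cleaning anything up.

```shell
#!/bin/sh
# Hypothetical helper. Input: the output of `wg show wg0 latest-handshakes`,
# i.e. one "PUBLIC_KEY<TAB>UNIX_TIMESTAMP" line per peer, where 0 means the
# peer has never completed a handshake.
peers_without_handshake() {
    awk -F '\t' '$2 == 0 { print $1 }'
}

# Typical use on the NS8 leader (as root):
#   wg show wg0 latest-handshakes | peers_without_handshake
```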

nethbot commented 3 months ago

in 7.9.2009/testing:

nethbot commented 3 months ago

in 7.9.2009/updates:

DavidePrincipi commented 3 months ago

Released https://github.com/NethServer/ns8-core/releases/tag/2.8.2