canonical / k8s-snap

Canonical Kubernetes is an opinionated and CNCF conformant Kubernetes operated by Snaps and Charms, which come together to bring simplified operations and an enhanced security posture on any infrastructure.

Join token for worker is invalid #367

Closed: stevensbkang closed this issue 6 months ago

stevensbkang commented 6 months ago

Summary

After successfully setting up a cluster of 3 manager nodes, I started adding worker nodes but encountered an "invalid token" error.

What Should Happen Instead?

k8s get-join-token ${NODE_NAME} --worker should return a valid token so that the subsequent sudo k8s join-cluster command works.

Reproduction Steps

  1. Execute k8s get-join-token ${NODE_NAME} --worker to grab the token
  2. Then execute sudo k8s join-cluster ${TOKEN} from a worker node (the full sequence is sketched below)
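
A rough sketch of that sequence (the node name and token are placeholders):

# on one of the manager (control plane) nodes
k8s get-join-token <worker-node-name> --worker
# copy the long base64 string it prints

# on the worker node
sudo k8s join-cluster <token-from-previous-step>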

Error messages below:

Joining the cluster. This may take a few seconds, please wait.
Cleaning up, error was failed to join k8sd cluster as worker: Failed to run post-bootstrap actions: HTTP request for worker node info failed: invalid token
Error: Failed to join the cluster using the provided token.

The error was: failed to POST /k8sd/cluster/join: failed to join k8sd cluster as worker: Failed to run post-bootstrap actions: HTTP request for worker node info failed: invalid token

For your reference, the worker token does not look valid when decoded:

echo "eyJ0b2tlbiI6IiIsInNlY3JldCI6Indvcmtlcjo6OTU0OTY1ZmY0Y2Y1NTM3MTBhNWRhMmQ5NWRjODJiN2RmZGI3YzMyNiIsImpvaW5fYWRkcmVzc2VzIjpbIjEwLjExMS4xNjAuMjMyOjY0MDAiLCIxMC4xMTEuMTYwLjIyNDo2NDAwIiwiMTAuMTExLjE2MC4yMjY6NjQwMCJdLCJmaW5nZXJwcmludCI6ImU0ZmY4ZGI0NmFjMWUyMGMxZDYwZWM0NGQwMzBkZGMzNWRkMjVjMDkwNzRmMTczYTBjMjdiMzU2YzYzZDk5OTYiLCJfIjoibSEhIn0" | base64 -d
{"token":"","secret":"worker::954965ff4cf553710a5da2d95dc82b7dfdb7c326","join_addresses":["10.111.160.232:6400","10.111.160.224:6400","10.111.160.226:6400"],"fingerprint":"e4ff8db46ac1e20c1d60ec44d030ddc35dd25c09074f173a0c27b356c63d9996","_":"m!!%

System information

root@k8s-perf-opt-m01:~# snap version
snap    2.61.3+22.04
snapd   2.61.3+22.04
series  16
ubuntu  22.04
kernel  5.15.0-105-generic
root@k8s-perf-opt-m01:~# uname -a
Linux k8s-perf-opt-m01 5.15.0-105-generic #115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
root@k8s-perf-opt-m01:~# snap list k8s
Name  Version  Rev  Tracking     Publisher   Notes
k8s   v1.30.0  269  latest/edge  canonical✓  classic
root@k8s-perf-opt-m01:~# snap services k8s
Service                      Startup   Current   Notes
k8s.containerd               enabled   active    -
k8s.k8s-apiserver-proxy      disabled  inactive  -
k8s.k8s-dqlite               enabled   active    -
k8s.k8sd                     enabled   active    -
k8s.kube-apiserver           enabled   active    -
k8s.kube-controller-manager  enabled   active    -
k8s.kube-proxy               enabled   active    -
k8s.kube-scheduler           enabled   active    -
k8s.kubelet                  enabled   active    -
root@k8s-perf-opt-m01:~# k8s status
status: ready
high-availability: yes
datastore:
  type: k8s-dqlite
  voter-nodes:
    - 10.111.160.224:6400
    - 10.111.160.226:6400
    - 10.111.160.232:6400
  standby-nodes: none
  spare-nodes: none
network:
  enabled: true
dns:
  enabled: true
  cluster-domain: cluster.local
  service-ip: 10.152.183.100
  upstream-nameservers:
  - /etc/resolv.conf
ingress:
  enabled: true
  default-tls-secret: ""
  enable-proxy-protocol: false
load-balancer:
  enabled: true
  cidrs: []
  l2-mode: false
  l2-interfaces: []
  bgp-mode: false
  bgp-local-asn: 0
  bgp-peer-address: ""
  bgp-peer-asn: 0
  bgp-peer-port: 0
local-storage:
  enabled: false
  local-path: /var/snap/k8s/common/rawfile-storage
  reclaim-policy: Delete
  default: true
gateway:
  enabled: true

Can you suggest a fix?

Not at the moment; I need to look at the logic first.

Are you interested in contributing with a fix?

Yes, I just need to know the guidelines :)

stevensbkang commented 6 months ago

Oh, and here are the k8sd logs:

root@k8s-perf-opt-m01:~# snap logs k8s.k8sd
2024-04-24T04:02:05Z k8s.k8sd[3446]: 2024/04/24 04:02:05 failed to watch configmap: watch closed
2024-04-24T04:30:35Z k8s.k8sd[3446]: No token exists yet. Creating a new token.
2024-04-24T04:33:16Z k8s.k8sd[3446]: 2024/04/24 04:33:16 failed to watch configmap: watch closed
2024-04-24T05:14:31Z k8s.k8sd[3446]: 2024/04/24 05:14:31 failed to watch configmap: watch closed
2024-04-24T06:05:16Z k8s.k8sd[3446]: 2024/04/24 06:05:16 failed to watch configmap: watch closed
2024-04-24T07:03:45Z k8s.k8sd[3446]: 2024/04/24 07:03:45 failed to watch configmap: watch closed
2024-04-24T07:34:58Z k8s.k8sd[3446]: 2024/04/24 07:34:58 failed to watch configmap: watch closed
2024-04-24T08:07:02Z k8s.k8sd[3446]: 2024/04/24 08:07:02 failed to watch configmap: watch closed
2024-04-24T08:48:46Z k8s.k8sd[3446]: 2024/04/24 08:48:46 failed to watch configmap: watch closed
2024-04-24T09:20:55Z k8s.k8sd[3446]: 2024/04/24 09:20:55 failed to watch configmap: watch closed

bschimke95 commented 6 months ago

Hi @stevensbkang, thanks for raising this. What kind of nodes are you running? VMs, LXD containers, ...?

Could you try to join the node under a different name?

# on cluster node
sudo k8s get-join-token myworkernode --worker
<token>
# on worker
sudo k8s join-cluster <token> --name myworkernode

By default, k8s join-cluster will use the machine's hostname as the node name when joining the cluster. I suspect there is a mismatch between the name passed to k8s get-join-token and the name used by k8s join-cluster.
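
A quick way to check this, roughly (names and token are placeholders):

# on the worker, see which name join-cluster will use by default
hostname

# on a cluster node, request the token for exactly that name
sudo k8s get-join-token <worker-hostname> --worker

# on the worker, either rely on the default hostname or pass the name explicitly
sudo k8s join-cluster <token> --name <worker-hostname>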

stevensbkang commented 6 months ago

Ahh, gotcha... I thought get-join-token expects the name of one of the manager nodes, but in fact it is the name of the node that is joining the cluster. It works as expected now, thanks so much!