canonical / k8s-snap

Canonical Kubernetes is an opinionated and CNCF conformant Kubernetes operated by Snaps and Charms, which come together to bring simplified operations and an enhanced security posture on any infrastructure.

Join token for worker is invalid #367

Closed: stevensbkang closed this issue 6 months ago

stevensbkang commented 6 months ago

Summary

After successfully setting up a cluster of 3 manager nodes, I started adding worker nodes but encountered an "invalid token" error.

What Should Happen Instead?

k8s get-join-token ${NODE_NAME} --worker should return a valid token so that the subsequent sudo k8s join-cluster command works.

Reproduction Steps

  1. Execute k8s get-join-token ${NODE_NAME} --worker to grab the token
  2. Then execute sudo k8s join-cluster ${TOKEN} from a worker node (the full sequence is sketched below)
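
A rough sketch of that sequence (the node name and token are placeholders):

# on one of the manager (control plane) nodes
k8s get-join-token <worker-node-name> --worker
# copy the long base64 string it prints

# on the worker node
sudo k8s join-cluster <token-from-previous-step>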

Error messages below:

Joining the cluster. This may take a few seconds, please wait.
Cleaning up, error was failed to join k8sd cluster as worker: Failed to run post-bootstrap actions: HTTP request for worker node info failed: invalid token
Error: Failed to join the cluster using the provided token.

The error was: failed to POST /k8sd/cluster/join: failed to join k8sd cluster as worker: Failed to run post-bootstrap actions: HTTP request for worker node info failed: invalid token

For your reference, the worker token does not look valid when decoded:

echo "eyJ0b2tlbiI6IiIsInNlY3JldCI6Indvcmtlcjo6OTU0OTY1ZmY0Y2Y1NTM3MTBhNWRhMmQ5NWRjODJiN2RmZGI3YzMyNiIsImpvaW5fYWRkcmVzc2VzIjpbIjEwLjExMS4xNjAuMjMyOjY0MDAiLCIxMC4xMTEuMTYwLjIyNDo2NDAwIiwiMTAuMTExLjE2MC4yMjY6NjQwMCJdLCJmaW5nZXJwcmludCI6ImU0ZmY4ZGI0NmFjMWUyMGMxZDYwZWM0NGQwMzBkZGMzNWRkMjVjMDkwNzRmMTczYTBjMjdiMzU2YzYzZDk5OTYiLCJfIjoibSEhIn0" | base64 -d
{"token":"","secret":"worker::954965ff4cf553710a5da2d95dc82b7dfdb7c326","join_addresses":["10.111.160.232:6400","10.111.160.224:6400","10.111.160.226:6400"],"fingerprint":"e4ff8db46ac1e20c1d60ec44d030ddc35dd25c09074f173a0c27b356c63d9996","_":"m!!%

System information

root@k8s-perf-opt-m01:~# snap version
snap    2.61.3+22.04
snapd   2.61.3+22.04
series  16
ubuntu  22.04
kernel  5.15.0-105-generic
root@k8s-perf-opt-m01:~# uname -a
Linux k8s-perf-opt-m01 5.15.0-105-generic #115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
root@k8s-perf-opt-m01:~# snap list k8s
Name  Version  Rev  Tracking     Publisher   Notes
k8s   v1.30.0  269  latest/edge  canonical✓  classic
root@k8s-perf-opt-m01:~# snap services k8s
Service                      Startup   Current   Notes
k8s.containerd               enabled   active    -
k8s.k8s-apiserver-proxy      disabled  inactive  -
k8s.k8s-dqlite               enabled   active    -
k8s.k8sd                     enabled   active    -
k8s.kube-apiserver           enabled   active    -
k8s.kube-controller-manager  enabled   active    -
k8s.kube-proxy               enabled   active    -
k8s.kube-scheduler           enabled   active    -
k8s.kubelet                  enabled   active    -
root@k8s-perf-opt-m01:~# k8s status
status: ready
high-availability: yes
datastore:
  type: k8s-dqlite
  voter-nodes:
    - 10.111.160.224:6400
    - 10.111.160.226:6400
    - 10.111.160.232:6400
  standby-nodes: none
  spare-nodes: none
network:
  enabled: true
dns:
  enabled: true
  cluster-domain: cluster.local
  service-ip: 10.152.183.100
  upstream-nameservers:
  - /etc/resolv.conf
ingress:
  enabled: true
  default-tls-secret: ""
  enable-proxy-protocol: false
load-balancer:
  enabled: true
  cidrs: []
  l2-mode: false
  l2-interfaces: []
  bgp-mode: false
  bgp-local-asn: 0
  bgp-peer-address: ""
  bgp-peer-asn: 0
  bgp-peer-port: 0
local-storage:
  enabled: false
  local-path: /var/snap/k8s/common/rawfile-storage
  reclaim-policy: Delete
  default: true
gateway:
  enabled: true

Can you suggest a fix?

Not at the moment; I need to look at the logic first.

Are you interested in contributing with a fix?

Yes, I just need to know the guidelines :)

stevensbkang commented 6 months ago

Oh, and here are the k8sd logs:

root@k8s-perf-opt-m01:~# snap logs k8s.k8sd
2024-04-24T04:02:05Z k8s.k8sd[3446]: 2024/04/24 04:02:05 failed to watch configmap: watch closed
2024-04-24T04:30:35Z k8s.k8sd[3446]: No token exists yet. Creating a new token.
2024-04-24T04:33:16Z k8s.k8sd[3446]: 2024/04/24 04:33:16 failed to watch configmap: watch closed
2024-04-24T05:14:31Z k8s.k8sd[3446]: 2024/04/24 05:14:31 failed to watch configmap: watch closed
2024-04-24T06:05:16Z k8s.k8sd[3446]: 2024/04/24 06:05:16 failed to watch configmap: watch closed
2024-04-24T07:03:45Z k8s.k8sd[3446]: 2024/04/24 07:03:45 failed to watch configmap: watch closed
2024-04-24T07:34:58Z k8s.k8sd[3446]: 2024/04/24 07:34:58 failed to watch configmap: watch closed
2024-04-24T08:07:02Z k8s.k8sd[3446]: 2024/04/24 08:07:02 failed to watch configmap: watch closed
2024-04-24T08:48:46Z k8s.k8sd[3446]: 2024/04/24 08:48:46 failed to watch configmap: watch closed
2024-04-24T09:20:55Z k8s.k8sd[3446]: 2024/04/24 09:20:55 failed to watch configmap: watch closed

bschimke95 commented 6 months ago

Hi @stevensbkang, thanks for raising this. What kind of nodes are you running? VMs, LXD containers, ...?

Could you try to join the node under a different name?

# on cluster node
sudo k8s get-join-token myworkernode --worker
<token>
# on worker
sudo k8s join-cluster <token> --name myworkernode

By default, k8s join-cluster will use the machine's hostname as the node name when joining the cluster. I suspect there is a mismatch between the name passed to k8s get-join-token and the name used by k8s join-cluster.
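
A quick way to check this, roughly (names and token are placeholders):

# on the worker, see which name join-cluster will use by default
hostname

# on a cluster node, request the token for exactly that name
sudo k8s get-join-token <worker-hostname> --worker

# on the worker, either rely on the default hostname or pass the name explicitly
sudo k8s join-cluster <token> --name <worker-hostname>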

stevensbkang commented 6 months ago

Ahh, gotcha... I thought get-join-token expects the name of one of the manager nodes, but in fact it is the name of the node that is joining the cluster. It works as expected now, thanks so much!