canonical / k8s-snap

Canonical Kubernetes is an opinionated and CNCF conformant Kubernetes operated by Snaps and Charms, which come together to bring simplified operations and an enhanced security posture on any infrastructure.

Getting started - context deadline exceeded #317

Closed. VariableDeclared closed this issue 6 months ago.

VariableDeclared commented 7 months ago

Please describe the question or issue you're facing with "Getting started - Canonical Kubernetes documentation".

Hello,

Following the Getting Started guide on an environment connected to the internet, bootstrap fails to bring up the node and the pods remain in Pending:

root@k8s-test:~# snap install k8s --edge --classic
k8s (edge) v1.29.3 from Canonical✓ installed
root@k8s-test:~# sudo k8s bootstrap
Bootstrapping the cluster. This may take a few seconds, please wait.
Bootstrapped a new Kubernetes cluster with node address "192.168.3.36:6400".
The node will be 'Ready' to host workloads after the CNI is deployed successfully.

root@k8s-test:~# sudo k8s status
status: not ready
high-availability: no
datastore:
  type: k8s-dqlite
  voter-nodes:
    - 192.168.3.36:6400
  standby-nodes: none
  spare-nodes: none
network:
  enabled: true
dns:
  enabled: true
  cluster-domain: cluster.local
  service-ip: 10.152.183.160
  upstream-nameservers:
  - /etc/resolv.conf
ingress:
  enabled: false
  default-tls-secret: ""
  enable-proxy-protocol: false
load-balancer:
  enabled: false
  cidrs: []
  l2-mode: false
  l2-interfaces: []
  bgp-mode: false
  bgp-local-asn: 0
  bgp-peer-address: ""
  bgp-peer-asn: 0
  bgp-peer-port: 0
local-storage:
  enabled: false
  local-path: /var/snap/k8s/common/rawfile-storage
  reclaim-policy: Delete
  set-default: true
gateway:
  enabled: true
metrics-server:
  enabled: true

root@k8s-test:~# sudo k8s kubectl get pods -n kube-system
NAME                               READY   STATUS    RESTARTS   AGE
cilium-operator-5f76fdbf9c-kbllv   0/1     Pending   0          6s
coredns-66579b5b88-mmxzh           0/1     Pending   0          4s
metrics-server-57db9dfb7b-r5mls    0/1     Pending   0          6s
root@k8s-test:~# sudo k8s kubectl get nodes
No resources found
root@k8s-test:~# sudo k8s kubectl get nodes -A
No resources found
root@k8s-test:~# sudo k8s kubectl get nodes -A
No resources found
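
As a starting point for digging into why the pods stay Pending and whether the kubelet ever registered the node, the cluster events and the snap's services can be checked with standard kubectl and snap commands. This is only a sketch; the pod name below is taken from the output above and will differ on other machines:

sudo k8s kubectl get events -n kube-system --sort-by='.lastTimestamp'
sudo k8s kubectl describe pod -n kube-system coredns-66579b5b88-mmxzh
sudo snap services k8s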

Could this be due to me running as root? Is this a known limitation?

Thank you, Peter


Reported from: https://documentation.ubuntu.com/canonical-kubernetes/latest/tutorial/getting-started/

VariableDeclared commented 7 months ago

Also getting context deadline exceeded with the ubuntu user on two different VMs, each with 8 GiB RAM, a 100 GB disk, and 4 vCPUs:

ubuntu@k8s-test:~$ sudo k8s bootstrap
Bootstrapping the cluster. This may take a few seconds, please wait.
Error: Failed to bootstrap the cluster.

The error was: failed to bootstrap new cluster using POST /k8sd/cluster: failed to bootstrap new cluster: Post "http://control.socket/cluster/control": context deadline exceeded


Removing lxc constraint
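
While the inspect script mentioned below is not available yet, one way to see what timed out is to look at the k8sd logs around the failure. snapd names the snap's units snap.k8s.<service>, so something along these lines should work (standard snap and journalctl options; adjust the time window as needed):

sudo snap logs k8s -n 200
sudo journalctl -u "snap.k8s.*" --since "15 min ago" --no-pager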

bschimke95 commented 6 months ago

Hey Peter,

Thanks for raising this. The VM specs should be fine. You could try to extend the timeout

sudo k8s bootstrap --timeout 10m

but I think the problem is on our side.

Could you add the output of

journalctl -f --lines 2000

(We are working on an inspect script this pulse, which will automate the collection of debug info.)
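
To attach that output here, it can be redirected into a file first (the file name is only an example):

journalctl --lines 2000 --no-pager > k8s-journal.log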

VariableDeclared commented 6 months ago

Hello @bschimke95! Indeed, after using the 10m timeout, bootstrap now passed on one of the nodes. Is there an NVMe requirement for the bootstrap? These VMs are backed by spinning disks.
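
As a rough check of whether slow disks are the bottleneck, a simple direct-I/O write test can be run on the VM (the path and size are only examples):

dd if=/dev/zero of=/var/tmp/k8s-ddtest bs=1M count=512 oflag=direct status=progress
rm /var/tmp/k8s-ddtest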

bschimke95 commented 6 months ago

Hey Peter,

Not in particular. Also, the suggestion I provided might just have worked by luck. We see this issue as well in #321 and #277. This basically happens because of an internal timeout in microcluster that we cannot work around yet. It is addressed in https://github.com/canonical/microcluster/pull/105.

On our side, we are also working towards reducing the overall time the commands need to finish, which eventually also "fixes" this issue. A first effort was made in #339, with a follow-up PR coming soon to make those commands even faster by moving the last pieces into an asynchronous approach.

I will close this issue in favour of #321.