CARV-ICS-FORTH / kubernetes-riscv64

Status of work on running Kubernetes on RISC-V

Question about strange behaviour #5

Open matsbror opened 5 days ago

matsbror commented 5 days ago

I am setting up a heterogeneous cluster with a number of ARM64 nodes, one RISCV64 node and one or more x86-64 nodes.

I first tried with one of the ARM-nodes as the controller. I could set up the other nodes, but I could not get the pod to start on the riscv node (I am using the hello-kubernetes example and using nodeSelector: kubernetes.io/arch: riscv64). The pod was assigned to the right node but I got the following error which I could see in journalctl:

kubelet Error: services have not yet been read at least once, cannot construct envvars

When I used arm64 as desired node, it worked fine. The ARM nodes were installed from the official k3s page.
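For reference, the kind of pod spec described above can be sketched as follows. This is only a minimal illustration, not the exact manifest used here: the pod name and the image reference are placeholders (assumptions), and the image would have to be built for riscv64 to start on that node. Only the `kubernetes.io/arch` label and its `riscv64`/`arm64` values come from the description above.

```python
import json


def pod_manifest(name, image, arch):
    """Build a minimal Pod manifest pinned to nodes of one CPU architecture."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # The scheduler only places the pod on nodes carrying this label.
            "nodeSelector": {"kubernetes.io/arch": arch},
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image; replace with a real riscv64 build.
    manifest = pod_manifest(
        "hello-kubernetes", "registry.example/hello-kubernetes:latest", "riscv64"
    )
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the output can be piped into `kubectl apply -f -`. Changing the last argument to `arm64` gives the variant that worked.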

I then switched roles: I put the controller on the riscv node and made the ARM nodes agents. With this configuration I could run hello-kubernetes fine on both the ARM nodes and the RISCV node.

I can live with this, but I'd rather have the RISCV node as an agent.

Any ideas on why I have experienced this behaviour?

chazapis commented 5 days ago

Hi! It seems that this error usually shows up when the underlying network has issues. Have you changed anything on the networking side when changing the controller architecture? What CNI are you using? Are all nodes running K3s?

matsbror commented 4 days ago

Nothing was changed on the network side and all nodes are running k3s. The ARM nodes were installed directly from the k3s.io site and the riscv node from this repo.

I am new to Kubernetes so I do not yet know what a CNI is and I certainly have not configured one. I just used the getting started instructions on k3s.io.

chazapis commented 1 day ago

OK. I have made a note to try to reproduce this on my side. I will do so and get back to you when I have made some progress.

matsbror commented 19 hours ago

I had my IT people make sure that traffic between my nodes is open on all ports.
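One way to double-check this from a node itself is a small TCP probe like the sketch below. It assumes the ports listed in the K3s networking requirements as I understand them: TCP 6443 for the API server and TCP 10250 for the kubelet. The default flannel VXLAN backend also uses UDP 8472, which a TCP connect cannot test, so a pass here does not prove that pod networking works. The function name and the command-line usage are my own choices for illustration.

```python
import socket
import sys

# TCP ports assumed from the K3s networking requirements (API server, kubelet).
TCP_PORTS = (6443, 10250)


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Hosts to probe are given on the command line; nothing is probed otherwise.
    for host in sys.argv[1:]:
        for port in TCP_PORTS:
            state = "open" if tcp_port_open(host, port) else "unreachable"
            print(f"{host}:{port} {state}")
```

Running it on the riscv node with the controller's address as the argument, and then the other way round, shows whether either direction is blocked for these two ports.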

I reproduced the issue, this time with an x86 machine as the controller. I think I will need to revert to having the riscv node as the controller.