crc-org / snc

Single Node Cluster creation scripts for OpenShift 4.x as used by CodeReady Containers
https://crc.dev
Apache License 2.0

kubelet and crio services not starting properly since 4.16.0 rcX #900

Closed adrianriobo closed 3 months ago

adrianriobo commented 3 months ago

When testing release candidates of OCP 4.16.0 (tried rc1 and rc3), the cluster does not start properly:

...
Location: https://quay.io:443/

INFO Check DNS query from host...                 
DEBU api.crc.testing resolved to [192.168.130.11] 
DEBU foo.apps-crc.testing resolved to [192.168.130.11] 
INFO Verifying validity of the kubelet certificates... 
DEBU Running SSH command: date --date="$(sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate | cut -d= -f 2)" --iso-8601=seconds 
DEBU SSH command results: err: <nil>, output: 2025-06-06T11:00:51+00:00 
DEBU Running SSH command: date --date="$(sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -noout -enddate | cut -d= -f 2)" --iso-8601=seconds 
DEBU SSH command results: err: <nil>, output: 2025-06-06T11:01:26+00:00 
DEBU Running SSH command: date --date="$(sudo openssl x509 -in /etc/kubernetes/static-pod-resources/kube-apiserver-certs/configmaps/aggregator-client-ca/ca-bundle.crt -noout -enddate | cut -d= -f 2)" --iso-8601=seconds 
DEBU SSH command results: err: <nil>, output: 2024-06-12T13:03:40+00:00 
INFO Starting kubelet service                     
DEBU Using root access: Executing systemctl daemon-reload command 
DEBU Running SSH command: sudo systemctl daemon-reload 
DEBU SSH command results: err: <nil>, output:     
DEBU Using root access: Executing systemctl start kubelet 
DEBU Running SSH command: sudo systemctl start kubelet 
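
The certificate checks in the log above reduce each certificate's end date to an ISO-8601 timestamp. As a minimal sketch of how such timestamps could be compared against a reference time (the function names `is_expired` and `days_left` are hypothetical and not part of crc):

```python
from datetime import datetime, timezone


def is_expired(not_after: str, now: datetime) -> bool:
    """Return True if the ISO-8601 end date is at or before `now`."""
    return datetime.fromisoformat(not_after) <= now


def days_left(not_after: str, now: datetime) -> int:
    """Whole days remaining until the end date (negative once expired)."""
    return (datetime.fromisoformat(not_after) - now).days


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Timestamps copied from the log output above.
    for label, ts in [
        ("kubelet-client", "2025-06-06T11:00:51+00:00"),
        ("kubelet-server", "2025-06-06T11:01:26+00:00"),
        ("aggregator-client-ca", "2024-06-12T13:03:40+00:00"),
    ]:
        print(label, "expired" if is_expired(ts, now) else "valid", days_left(ts, now))
```

Passing `now` explicitly keeps the comparison deterministic, which makes it easy to check against a fixed date.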

Tracking down the issue from inside the VM, we see that the crio and kubelet services are not starting properly:

[core@crc ~]$ sudo systemctl status kubelet
○ kubelet.service - Kubernetes Kubelet
     Loaded: loaded (/etc/systemd/system/kubelet.service; disabled; preset: disabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─01-kubens.conf, 10-mco-default-madv.conf, 20-logging.conf, 20-nodenet.conf, 80-nodeip.conf
     Active: inactive (dead)
[core@crc ~]$ sudo systemctl -l status crio
○ crio.service - Container Runtime Interface for OCI (CRI-O)
     Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; preset: disabled)
    Drop-In: /etc/systemd/system/crio.service.d
             └─01-kubens.conf, 05-mco-ordering.conf, 10-mco-default-madv.conf, 10-mco-profile-unix-socket.conf, 20-nodenet.conf
     Active: inactive (dead)
       Docs: https://github.com/cri-o/cri-o

Checking the content of the pull secret shows that it is empty:

[core@crc ~]$ sudo cat /var/lib/kubelet/config.json
{}

gbraad commented 3 months ago

If we apply the pull secret at a later stage, will it work as expected?

adrianriobo commented 3 months ago

Actually, the user pull secret is set at a later stage, so I may have given the issue the wrong title: the real problem is that the kubelet service is not starting.

I thought it was caused by the missing pull secret, but maybe it is not (at that stage the pull secret is, and should be, empty).

gbraad commented 3 months ago

What are the requirements for the kubelet to start correctly? There might be a conflict between what it expects and what it was given...

adrianriobo commented 3 months ago

Yeah, I tried to get the journal logs, but there are no entries, and I could not get more info from systemctl status, so I will wait until Monday and catch up with @praveenkumar.

praveenkumar commented 3 months ago

It is happening because of a change to the ovn-configure service (https://github.com/openshift/machine-config-operator/commit/2003da5d02f27f8dd0e5a91fb2c714a247c4e824). Because of that change, the network connections are now used from /run/NetworkManager/system-connections rather than /etc/NetworkManager/system-connections. So on the crc side, when we add the dns and search options to the network connection, the modified profile doesn't land in /run/NetworkManager/system-connections, and this service fails.
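
To make the /run versus /etc distinction concrete, here is a rough sketch that classifies connection profiles by the directory their file lives in. It assumes the input comes from `nmcli -t -f NAME,FILENAME connection show` (terse, colon-separated output); the helper names `profile_location` and `parse_terse` are hypothetical and are not taken from crc or the machine-config-operator.

```python
from __future__ import annotations

from pathlib import PurePosixPath

RUNTIME_DIR = PurePosixPath("/run/NetworkManager/system-connections")
PERSISTENT_DIR = PurePosixPath("/etc/NetworkManager/system-connections")


def profile_location(filename: str) -> str:
    """Classify a profile path as 'runtime', 'persistent', or 'other'."""
    parent = PurePosixPath(filename).parent
    if parent == RUNTIME_DIR:
        return "runtime"
    if parent == PERSISTENT_DIR:
        return "persistent"
    return "other"


def parse_terse(output: str) -> dict[str, list[str]]:
    """Map each connection name to the profile files that define it.

    Expects one NAME:FILENAME pair per line. Assumption: connection
    names contain no ':' (terse mode escapes those with a backslash,
    which this sketch does not handle).
    """
    profiles: dict[str, list[str]] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, filename = line.partition(":")
        profiles.setdefault(name, []).append(filename)
    return profiles
```

With this, `profile_location(parse_terse(output)["ovs-if-br-ex"][0])` would tell whether the ovs-if-br-ex profile is still the runtime one or has been written to the persistent directory.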

Before adding the dns/search option to ovs-if-br-ex

$ nmcli -f TYPE,FILENAME,NAME connection
TYPE           FILENAME                                                                                               NAME               
ovs-interface  /run/NetworkManager/system-connections/ovs-if-br-ex.nmconnection                                       ovs-if-br-ex       
dummy          /etc/NetworkManager/system-connections/internalEtcd.nmconnection                                       internalEtcd       
ovs-bridge     /run/NetworkManager/system-connections/br-ex.nmconnection                                              br-ex              
ethernet       /run/NetworkManager/system-connections/ovs-if-phys0.nmconnection                                       ovs-if-phys0       
ovs-port       /run/NetworkManager/system-connections/ovs-port-br-ex.nmconnection                                     ovs-port-br-ex     
ovs-port       /run/NetworkManager/system-connections/ovs-port-phys0.nmconnection                                     ovs-port-phys0     
loopback       /run/NetworkManager/system-connections/lo.nmconnection                                                 lo                 
ethernet       /run/NetworkManager/system-connections/Wired connection 1.nmconnection                                 Wired connection 1 
dummy          /etc/NetworkManager/system-connections/internalEtcd-ab5494ad-b1ed-4d39-ae63-c7e024f1dc2f.nmconnection  internalEtcd       
dummy          /etc/NetworkManager/system-connections/internalEtcd-d718218d-672e-41f5-84ab-43df7113bede.nmconnection  internalEtcd       
dummy          /etc/NetworkManager/system-connections/internalEtcd-d0ed1e52-6d5a-4b89-9765-76523578e7d9.nmconnection  internalEtcd       
dummy          /etc/NetworkManager/system-connections/internalEtcd-d2f1cd73-395c-465e-9f0a-67edc581ae37.nmconnection  internalEtcd       
dummy          /etc/NetworkManager/system-connections/internalEtcd-4d96fdb2-d185-4a48-8e3a-242e697e8fc8.nmconnection  internalEtcd       
dummy          /etc/NetworkManager/system-connections/internalEtcd-cce6de6c-fae4-4941-9407-cf4fb8ddf804.nmconnection  internalEtcd       

After adding the dns/search option to ovs-if-br-ex

$ sudo nmcli connection modify ovs-if-br-ex ipv4.dns 198.168.130.1 ipv4.dns-search crc.testing

$ nmcli -f TYPE,FILENAME,NAME connection
TYPE           FILENAME                                                                                               NAME               
ovs-interface  /etc/NetworkManager/system-connections/ovs-if-br-ex-20c7179a-202f-4dbb-adad-ac7034eac52d.nmconnection  ovs-if-br-ex       
dummy          /etc/NetworkManager/system-connections/internalEtcd.nmconnection                                       internalEtcd       
ovs-bridge     /run/NetworkManager/system-connections/br-ex.nmconnection                                              br-ex              
ethernet       /run/NetworkManager/system-connections/ovs-if-phys0.nmconnection                                       ovs-if-phys0       
ovs-port       /run/NetworkManager/system-connections/ovs-port-br-ex.nmconnection                                     ovs-port-br-ex     
ovs-port       /run/NetworkManager/system-connections/ovs-port-phys0.nmconnection                                     ovs-port-phys0     
loopback       /run/NetworkManager/system-connections/lo.nmconnection                                                 lo                 
ethernet       /run/NetworkManager/system-connections/Wired connection 1.nmconnection                                 Wired connection 1 
dummy          /etc/NetworkManager/system-connections/internalEtcd-ab5494ad-b1ed-4d39-ae63-c7e024f1dc2f.nmconnection  internalEtcd       
dummy          /etc/NetworkManager/system-connections/internalEtcd-d718218d-672e-41f5-84ab-43df7113bede.nmconnection  internalEtcd       
dummy          /etc/NetworkManager/system-connections/internalEtcd-d0ed1e52-6d5a-4b89-9765-76523578e7d9.nmconnection  internalEtcd       
dummy          /etc/NetworkManager/system-connections/internalEtcd-d2f1cd73-395c-465e-9f0a-67edc581ae37.nmconnection  internalEtcd       
dummy          /etc/NetworkManager/system-connections/internalEtcd-4d96fdb2-d185-4a48-8e3a-242e697e8fc8.nmconnection  internalEtcd       
dummy          /etc/NetworkManager/system-connections/internalEtcd-769c77a9-f65c-4858-806f-fd59e9e82748.nmconnection  internalEtcd       
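
The difference between the two listings can also be found mechanically. The sketch below parses the tabular `nmcli -f TYPE,FILENAME,NAME connection` output by the column offsets of its header row (so file names containing spaces survive) and reports connections that had a profile under /run before and have one under /etc afterwards. The function names are hypothetical, and the parser assumes the columns are padded with plain ASCII spaces as in the output above.

```python
from __future__ import annotations


def parse_table(output: str) -> list[tuple[str, str, str]]:
    """Parse nmcli's aligned TYPE/FILENAME/NAME table into tuples."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[0]
    file_col = header.index("FILENAME")
    # rindex skips the "NAME" inside "FILENAME" and finds the last column.
    name_col = header.rindex("NAME")
    rows = []
    for line in lines[1:]:
        rows.append(
            (
                line[:file_col].strip(),
                line[file_col:name_col].strip(),
                line[name_col:].strip(),
            )
        )
    return rows


def moved_to_etc(before, after) -> list[str]:
    """Names with a /run profile before and an /etc profile after."""
    run_before = {
        name for _, path, name in before
        if path.startswith("/run/NetworkManager/")
    }
    return sorted(
        {
            name for _, path, name in after
            if path.startswith("/etc/NetworkManager/") and name in run_before
        }
    )
```

Fed the two listings above, this should single out ovs-if-br-ex, the connection that `nmcli connection modify` rewrote into /etc.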

praveenkumar commented 3 months ago

https://github.com/crc-org/crc/pull/4222 should fix it.

adrianriobo commented 3 months ago

Fixed by https://github.com/crc-org/crc/pull/4222