Orange-OpenSource / towards5gs-helm

Helm charts for deploying 5G network services on Kubernetes

Pods stuck in Init state, created container wait-nrf #13

Closed ritokispingvin closed 2 years ago

ritokispingvin commented 2 years ago

Hi, after installing the free5GC project with Helm, some of my pods get stuck in the Init state and do not come up even after waiting several minutes. Sometimes only 2 pods hang, but sometimes 4-5 do. I saw that an update was made yesterday, probably about this issue: "Fix initContainer curl command waiting for NRF ready / Add --insecure…", but I'm still experiencing the problem. Can you please help me overcome this issue?


Thank you and best regards, ritokispingvin

raoufkh commented 2 years ago

Hello!

Can you provide the following information, please?

Regards, Abderaouf

ritokispingvin commented 2 years ago

Hi,

please find the requested logs below. 192.168.56.103 is my master node; .102 and .101 are my worker nodes. It looks like the problem is with the pods running on .102.

ubuntu@ubuntu:~$ kubectl -n pvolume logs v3.0.6-free5gc-pcf-pcf-646cc9b75f-gmpsd -c wait-nrf

Best Regards, ritokispingvin

raoufkh commented 2 years ago

All Pods other than UPF, mongo and webui wait for the NRF to be ready during the init phase. It seems that only the Pods scheduled on a different node from the one running the NRF get stuck in the Init state. Are you sure that Pods on different worker nodes in your cluster can communicate with each other?
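
One quick way to test this is to ping a Pod from another Pod that runs on a different node. The sketch below is only an illustration: `check_pod_to_pod` is a hypothetical helper name, and it assumes the source container image ships `ping`, which may not be the case for every image.

```shell
# Hypothetical helper: ping one Pod from inside another Pod.
# $1 = namespace, $2 = source Pod name, $3 = destination Pod name
check_pod_to_pod() {
  # Look up the destination Pod's cluster IP address.
  dst_ip="$(kubectl -n "$1" get pod "$3" -o jsonpath='{.status.podIP}')" || return 1
  # Run ping inside the source Pod (assumes the image contains ping).
  kubectl -n "$1" exec "$2" -- ping -c 3 "$dst_ip"
}
```

`kubectl -n pvolume get pods -o wide` shows in its NODE column where each Pod runs (`pvolume` being the namespace used earlier in this thread). Pick one Pod per worker node and call, for example, `check_pod_to_pod pvolume <pod-on-.101> <pod-on-.102>`. If the ping works between Pods on the same node but fails across nodes, the Pod network is the likely culprit.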

If it is a test cluster, can you try draining the .102 node and then removing it from the cluster to check?
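
As a rough sketch, draining and removing a node could look like the following. `remove_node` is a hypothetical helper name, and the flags assume a reasonably recent kubectl (older releases spell `--delete-emptydir-data` as `--delete-local-data`).

```shell
# Hypothetical helper: evict all Pods from a node, then remove it from the cluster.
# $1 = node name as shown by `kubectl get nodes`
remove_node() {
  # Evict Pods; DaemonSet-managed Pods are skipped, emptyDir data is discarded.
  kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data &&
    # Only delete the node object if the drain succeeded.
    kubectl delete node "$1"
}
```

Call it with the name of the .102 node, for example `remove_node <node-name>`. The evicted Pods should then be rescheduled on the remaining worker.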

Another option is to set the nodeSelector field on all deployments so that it matches the labels of the .101 node.

NOTE: all our Helm charts provide the possibility to customize this field (e.g. free5gc-amf.amf.nodeSelector if you want to do it for AMF from the main Helm chart), but it will take a little bit more time than the first approach.
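
For illustration, an override for AMF could be written as a small values file. This is a sketch under assumptions: the file name, the node name `worker1`, and the use of the standard `kubernetes.io/hostname` label are placeholders, and only the `free5gc-amf.amf.nodeSelector` key comes from the note above. The other network functions presumably follow a similar key pattern, so check each chart's values.yaml before relying on it.

```shell
# Write a values override that pins AMF to one node.
# "worker1" is a hypothetical node name; use the value shown by
# `kubectl get nodes --show-labels` for the .101 node.
cat > nodeselector-values.yaml <<'EOF'
free5gc-amf:
  amf:
    nodeSelector:
      kubernetes.io/hostname: worker1
EOF

# Then apply it to the existing release (release and chart names are placeholders):
# helm -n <namespace> upgrade <release> <chart> --reuse-values -f nodeselector-values.yaml
```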

Regards, Abderaouf

ritokispingvin commented 2 years ago

I'm able to SSH from every node into all the other nodes without a password, so connectivity should be OK. After customizing the nodeSelector field to use .101, all pods are running now. The ticket can be closed, many thanks for the tips!

Best Regards, ritokispingvin

raoufkh commented 2 years ago

Great! However, scheduling all Pods on the same node is only a troubleshooting workaround. The ideal solution is to fix the connectivity problem between Pods running on different nodes. Note that SSH access between the nodes does not prove that Pod-to-Pod traffic works; the problem may be related to DNS or to the CNI plugin in use.
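
To narrow down the DNS side, one possible check is to resolve a built-in Service name from a throwaway Pod pinned to the suspect node. This is a sketch, not a definitive procedure: `check_dns_from_node` and the Pod name `dns-test` are hypothetical, it assumes the cluster DNS Pods carry the usual `k8s-app=kube-dns` label, and it assumes the `busybox:1.28` image can be pulled.

```shell
# Hypothetical helper: test cluster DNS from a Pod running on a given node.
# $1 = node name as shown by `kubectl get nodes`
check_dns_from_node() {
  # Show where the cluster DNS Pods run and whether they are Ready.
  kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
  # Start a temporary Pod on the given node and resolve the API server Service.
  kubectl run dns-test --rm -i --restart=Never --image=busybox:1.28 \
    --overrides="{\"apiVersion\":\"v1\",\"spec\":{\"nodeName\":\"$1\"}}" \
    -- nslookup kubernetes.default
}
```

Running it once per worker node, for example `check_dns_from_node <node-name>`, shows whether name resolution fails only on the node whose Pods stay in Init. If the lookup fails there but succeeds elsewhere, the CNI configuration on that node is worth inspecting next.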