Orange-OpenSource / towards5gs-helm

Helm charts for deploying 5G network services on Kubernetes

NRF pod stuck in "Init" state waiting for MongoDB, while the MongoDB pod, containers, and svc are in "Running" state. #4

Closed lpureenaece closed 2 years ago

lpureenaece commented 3 years ago

Description:

  1. Created a k8s cluster (cluster is up and running).
  2. The Kubernetes worker and master nodes run kernel 5.4.0-42-generic.
  3. Added an additional "eth1" interface on the worker node.
  4. Installed Multus and Helm.
  5. Created a persistent volume.
  6. Executed the command "helm -n free5gc-core install --generate-name ./free5gc/".
  7. After that, all pods except upf and mongodb are stuck in the "Init" state.
  8. All the pods are in the same namespace, "kube-system".

Logs:

cmd: "kubectl -n kube-system get pods --all-namespaces" [screenshot]

cmd: "kubectl describe pod free5gc-1629270501-nrf-694fd8cdd6-cxqvv -n kube-system" [attachment: nrf_describe_log]

cmd: "kubectl get pvc,pv,svc --all-namespaces -o wide" [screenshot]

cmd: "kubectl get network-attachment-definitions --all-namespaces" [screenshot]

Please assist me.

ebucchianeri commented 3 years ago

Hi, I had a similar issue. I don't know whether this will help you, but in my case it was a DNS resolution problem, and restarting coredns solved it.

raoufkh commented 3 years ago

Hello!

Is this issue different from this one?

raoufkh commented 2 years ago

No more activity!

lpureenaece commented 2 years ago

> Hi, I had a similar issue. I don't know whether this will help you, but in my case it was a DNS resolution problem, and restarting coredns solved it.

I restarted coredns with the command "kubectl -n kube-system rollout restart deployment coredns", but the issue persists.

lpureenaece commented 2 years ago

> Hello!
>
> Is this issue different from this one?

This is the same issue. In the previous issue I had created the free5gc and coredns pods in two different namespaces, which is why I was facing that problem; when I created both in the same namespace, the issue went away.

At present, however, both are in the same namespace and the pods are still stuck in the Init state.

[six screenshots attached]

ebucchianeri commented 2 years ago

NRF has not started because its init container seems to be failing. The NRF init container simply tries to connect to MongoDB using nc. Is MongoDB reachable through its service name?
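For readers following along, a readiness wait of the kind described above might look roughly like the sketch below. It is not the chart's actual init-container command: the function name wait_for_tcp, the retry limit, the WAIT_INTERVAL variable, and the mongodb:27017 target in the usage line are all assumptions made for illustration, and it presumes an nc build that supports the -z flag.

```shell
#!/bin/sh
# Hypothetical sketch of an init-container style wait loop.
# It retries a TCP connect with nc until the target answers or retries run out.
wait_for_tcp() {
  host="$1"              # service name or IP to probe (assumed, e.g. "mongodb")
  port="$2"              # TCP port to probe (assumed, e.g. 27017)
  retries="${3:-30}"     # maximum number of attempts (assumed default)
  i=0
  until nc -z "$host" "$port"; do
    i=$((i + 1))
    if [ "$i" -ge "$retries" ]; then
      echo "gave up waiting for $host:$port" >&2
      return 1
    fi
    echo "waiting for $host:$port"
    sleep "${WAIT_INTERVAL:-2}"
  done
  echo "$host:$port is reachable"
}
```

Usage would be something like `wait_for_tcp mongodb 27017`. If the service name does not resolve, nc fails on every attempt and the loop never exits, which is how a DNS problem can leave a pod in "Init" even though MongoDB itself is running.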

raoufkh commented 2 years ago

Hi @lpureenaece

The way the NRF's init container checks for MongoDB readiness has not changed. That means you are probably encountering DNS problems on your cluster. As @ebucchianeri mentioned, you should try to reach MongoDB using its service name from another Pod on the cluster. You can use the busybox image to do it.

Bests

lpureenaece commented 2 years ago

> NRF has not started because its init container seems to be failing. The NRF init container simply tries to connect to MongoDB using nc. Is MongoDB reachable through its service name?

Thanks for the reply, @ebucchianeri. Could you please tell me which command to use to check MongoDB reachability through the service name?

raoufkh commented 2 years ago

First, can you share the logs of the init container, please?

kubectl -n <your-namespace> logs <nrf-pod-name> -c wait-mongo

Then, for debugging, you can run a busybox Pod in the same namespace as MongoDB and run the command nslookup <mongo-service-name> inside the Pod. More reading [here]
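One possible way to run that lookup in a single step is sketched below. The helper name dns_check and the Pod name dns-check are hypothetical, and the busybox:1.28 tag is an assumption (an older tag often used for DNS checks); the namespace and service name are whatever your deployment uses.

```shell
#!/bin/sh
# Hypothetical helper: start a throwaway busybox Pod in the given namespace,
# resolve a Service name from inside it, and remove the Pod afterwards.
dns_check() {
  ns="$1"    # namespace where MongoDB runs
  svc="$2"   # MongoDB Service name to resolve
  kubectl -n "$ns" run dns-check --rm -i --restart=Never \
    --image=busybox:1.28 -- nslookup "$svc"
}
```

For example, `dns_check <your-namespace> <mongo-service-name>`. A successful lookup prints the Service's cluster IP; a failure here points at cluster DNS rather than at MongoDB or the NRF.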

lpureenaece commented 2 years ago

kubectl -n <your-namespace> logs <nrf-pod-name> -c wait-mongo

[screenshot]

ebucchianeri commented 2 years ago

This looks like a service name resolution problem. I had the same issue and solved it by restarting coredns, but since that does not work in your case, you could check the status or logs of the coredns pods.
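A minimal sketch of those checks follows, assuming CoreDNS runs in kube-system with the k8s-app=kube-dns label and is exposed through a Service named kube-dns, as on many kubeadm-style clusters; your cluster may use different names. The function name coredns_report and the default of 50 log lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show CoreDNS pod status, recent logs, and Service endpoints.
# Assumes the common k8s-app=kube-dns label and kube-dns Service name.
coredns_report() {
  tail_lines="${1:-50}"   # number of log lines to show (assumed default)
  # Are the CoreDNS pods Running, and on which nodes?
  kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
  # Any errors or timeouts in the recent logs?
  kubectl -n kube-system logs -l k8s-app=kube-dns --tail="$tail_lines"
  # Does the DNS Service actually have endpoints behind it?
  kubectl -n kube-system get endpoints kube-dns
}
```

Running `coredns_report 100` prints all three in sequence. Pods that are not Running, errors in the logs, or an empty endpoints list would each narrow down where resolution breaks.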

lpureenaece commented 2 years ago

I have deployed two different Kubernetes clusters, one with the Flannel network plugin and another with Calico. On the Flannel cluster the free5gc pods are running fine, but on the Calico cluster I am getting this Init issue. Do you have any idea why? Please confirm.

ebucchianeri commented 2 years ago

I don't think this is due to different CNI plugins. I am using Calico too, without any problems.

raoufkh commented 2 years ago

The idea is to deploy a simple application on the malfunctioning cluster and debug DNS name resolution: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
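As a rough starting point for that kind of debugging, the sketch below prints the resolver configuration seen by a test Pod and then looks up the built-in kubernetes.default name. The helper name dns_baseline, the Pod name dns-baseline, and the busybox:1.28 tag are assumptions for illustration, not something prescribed by the chart.

```shell
#!/bin/sh
# Hypothetical helper: baseline cluster DNS check from a throwaway busybox Pod.
# Shows the Pod's /etc/resolv.conf, then resolves the built-in kubernetes.default name.
dns_baseline() {
  ns="${1:-default}"   # namespace to run the test Pod in (assumed default)
  kubectl -n "$ns" run dns-baseline --rm -i --restart=Never \
    --image=busybox:1.28 -- sh -c 'cat /etc/resolv.conf; nslookup kubernetes.default'
}
```

If kubernetes.default does not resolve either, the problem lies in cluster DNS in general and not in the free5gc deployment. The nameserver shown in resolv.conf would normally be the cluster IP of the DNS Service.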

Don't hesitate to re-open if you have more information.

Raouf