Orange-OpenSource / towards5gs-helm

Helm charts for deploying 5G network services on Kubernetes

Some pods do not start after delete and install of UERANSIM #55

Closed. pinoOgni closed this issue 1 year ago.

pinoOgni commented 1 year ago

Hi folks, I hope you can help me solve this problem.

I have a cluster created with kubeadm, with two physical nodes called cube2 and cube4. More info:

Everything works perfectly: both the free5gc and ueransim pods are correctly deployed, the uesimtun0 interface is created, and I can reach the internet from the UE. Here is a screenshot:

[screenshot: work]

Before describing the problem(s), I would like to point out that I am currently using a local version of the repo that refers to this commit, because everything worked fine with that version a few weeks ago; I thought sticking to it would avoid the problem, but I was wrong.

The first problem occurs when I run the following commands:

helm delete -n 5g ueransim
# I wait for the pods to be terminated and then
helm -n 5g install ueransim ./towards5gs-helm/charts/ueransim/ --set global.n2network.masterIf=e0,global.n3network.masterIf=e0
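
Note: "I wait for the pods to be terminated" is done by hand here. It could also be scripted with an explicit wait, as in this sketch; the label selector is only an assumption about what the chart sets on its pods:

# wait until all pods of the ueransim release are gone before reinstalling
kubectl -n 5g wait --for=delete pod -l app.kubernetes.io/instance=ueransim --timeout=120s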

After some time, this is the situation:

[screenshot: not_work_first]

Note that the pods in Unknown state are amf, smf and upf, and the gnb pod is Pending. The only thing these pods have in common is that they use MACVLANs and have multiple interfaces configured with Multus.
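
For reference, a quick way to check which stuck pods actually request Multus networks is something like the following; the annotation name is the standard Multus one, and the pod name is just a placeholder:

kubectl -n 5g get pods -o wide
# check whether a stuck pod requests additional Multus networks
kubectl -n 5g get pod <stuck-pod-name> -o yaml | grep 'k8s.v1.cni.cncf.io/networks'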

Since the previous situation did not change, I ran these commands:

helm delete -n 5g ueransim
helm delete -n 5g free5gc
# I wait for the pods to be terminated and then
helm -n 5g install free5gc ./towards5gs-helm/charts/free5gc --set global.n2network.masterIf=e0,global.n3network.masterIf=e0,global.n4network.masterIf=e0,global.n6network.masterIf=e0,global.n9network.masterIf=e0,global.n6network.subnetIP=<subnetIP>,global.n6network.cidr=<cidr>,global.n6network.gatewayIP=<gatewayIP>,free5gc-upf.upf.n6if.ipAddress=<fakeAddressIP>
helm -n 5g install ueransim ./towards5gs-helm/charts/ueransim/ --set global.n2network.masterIf=e0,global.n3network.masterIf=e0

After some time, this is the situation:

[screenshot: not_work_end]

Note that the status of the amf, smf and upf pods was the same even before the UERANSIM installation.

I am also attaching the output of "kubectl describe pod" for the upf. The same situation can be seen in the logs of amf, smf and gnb. Again, as before, the only thing these pods have in common is that they use MACVLANs and have multiple interfaces configured with Multus.

[screenshot: upf]
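
For reference, these are the kinds of commands used to collect that output; the pod names are placeholders:

kubectl -n 5g describe pod <upf-pod-name>
kubectl -n 5g logs <amf-pod-name>
# the namespace events, sorted by time, usually show the CNI/Multus failure directly
kubectl -n 5g get events --sort-by=.metadata.creationTimestamp | tail -n 20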

Do you have any suggestions? Thanks in advance

EDIT: I'm trying to understand the error better; it may be related to Multus and the MACVLAN plugin.
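
A quick per-node sanity check in that direction could look like this; the paths are the usual kubeadm/containerd defaults, so treat them as assumptions:

# is the macvlan plugin binary present next to the other CNI plugins?
ls -l /opt/cni/bin/macvlan
# which CNI configurations does the node actually pick up?
ls /etc/cni/net.d/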

raoufkh commented 1 year ago

Hi

Thank you first for the clear explanation.

Can you share the result of the command kubectl -n <your-namespace> get network-attachment-definitions at each step, i.e. 1) after installing Free5GC and UERANSIM, 2) after deleting UERANSIM, 3) after reinstalling UERANSIM, etc.?
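
For example, after each step (the -o yaml form is only there to also capture the full definitions):

kubectl -n 5g get network-attachment-definitions
kubectl -n 5g get network-attachment-definitions -o yaml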

Raouf

pinoOgni commented 1 year ago

Hi Raouf, thanks for the response, here it is:

  1. after installing Free5GC and UERANSIM: [screenshot: free5gc_and_ueransim]

  2. after deleting UERANSIM: [screenshot: ueransim_deleted]

  3. after reinstalling UERANSIM: [screenshot: ueransim_installed_wait_time]

I have also noticed that the following pods go into Unknown state but then restart and continue to function normally.

[screenshot: other_pods]

  4. after deleting UERANSIM and Free5GC: [screenshot: free5gc_ueransim_reinstalled]

Anyway, immediately after writing this issue I found a trick to work around the problem. Since this sequence of steps does not work:

helm delete -n 5g ueransim
helm delete -n 5g free5gc
# I wait for the pods to be terminated and then
helm -n 5g install free5gc ./towards5gs-helm/charts/free5gc --set global.n2network.masterIf=e0,global.n3network.masterIf=e0,global.n4network.masterIf=e0,global.n6network.masterIf=e0,global.n9network.masterIf=e0,global.n6network.subnetIP=<subnetIP>,global.n6network.cidr=<cidr>,global.n6network.gatewayIP=<gatewayIP>,free5gc-upf.upf.n6if.ipAddress=<fakeAddressIP>
helm -n 5g install ueransim ./towards5gs-helm/charts/ueransim/ --set global.n2network.masterIf=e0,global.n3network.masterIf=e0

It is enough to reverse the deletion order, like this:

helm delete -n 5g free5gc
helm delete -n 5g ueransim
# I wait for the pods to be terminated and then
helm -n 5g install free5gc ./towards5gs-helm/charts/free5gc --set global.n2network.masterIf=e0,global.n3network.masterIf=e0,global.n4network.masterIf=e0,global.n6network.masterIf=e0,global.n9network.masterIf=e0,global.n6network.subnetIP=<subnetIP>,global.n6network.cidr=<cidr>,global.n6network.gatewayIP=<gatewayIP>,free5gc-upf.upf.n6if.ipAddress=<fakeAddressIP>
helm -n 5g install ueransim ./towards5gs-helm/charts/ueransim/ --set global.n2network.masterIf=e0,global.n3network.masterIf=e0
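
One extra check that fits here: before reinstalling, make sure that both the pods and the network-attachment-definitions left by the two releases are really gone. A minimal sketch:

# nothing should be listed by either command before the reinstall
kubectl -n 5g get pods
kubectl -n 5g get network-attachment-definitions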

Giuseppe

raoufkh commented 1 year ago

I think there is a problem with Calico and Multus working together in your cluster; the additional interfaces are not being managed correctly. It would be useful to open an issue on the Calico and Multus repositories to find a proper solution, because with Flannel + Multus I can uninstall and reinstall without problems.
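
A quick way to confirm which CNI components are actually deployed in a cluster (the pod name patterns below are the usual ones, so adjust them if yours differ):

kubectl -n kube-system get pods -o wide | grep -Ei 'calico|flannel|multus'
kubectl -n kube-system get daemonsets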

Raouf

pinoOgni commented 1 year ago

Yes, I think you are right. I'll do that; thanks anyway for the help.

raoufkh commented 1 year ago

Hello @pinoOgni

Did you manage to make it work?

Best, Abderaouf

pinoOgni commented 1 year ago

Hi @raoufkh, not yet; the only trick I could find is the one I described above. I still think it's related to the macvlan CNI plugin, which is installed together with the container runtime (containerd in my case). I don't know whether it depends on the installed version or on some configuration interaction between the main CNI and Multus.

I don't know if this is related, but a similar error has happened to me in some cases. When I install the core, the amf and smf pods stay in the Init phase, while the upf is Running; however, if I try to contact the upf from another pod it doesn't work, and it has no internet connection either (trivially, I can't run apt update from inside the upf). The only thing these pods have in common is the MACVLAN. As a simple test, I deleted the CNI plugin in /opt/cni/bin on that node and put in place the one from a node where I didn't have this problem, and magically, after a new installation, all pods started working. I'll try to investigate a little more if I can.
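
Roughly what that test looked like, assuming cube2 is the node that works and cube4 is the one with the problem (node names and paths are placeholders from my setup):

# compare the macvlan plugin binaries on the two nodes first
ssh cube2 sha256sum /opt/cni/bin/macvlan
ssh cube4 sha256sum /opt/cni/bin/macvlan
# copy the plugin from the working node to the broken one
scp cube2:/opt/cni/bin/macvlan /tmp/macvlan
scp /tmp/macvlan cube4:/opt/cni/bin/macvlan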

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.