Gradiant / 5g-charts

Helm charts for 5G Technologies
Apache License 2.0

UPF Pod has multiple interfaces, but Multus is not being used? - Info Discussion #150

Closed sam-sre closed 6 months ago

sam-sre commented 6 months ago

Hi,

Thanks for your efforts and the nice, clean 5GC charts.

We are using your charts for testing 5GC CI/CD pipelines. It caught my attention that your charts include neither the Multus CRDs nor any network-attachment-definitions, yet you still have Pods with multiple interfaces, as shown below.

My questions:

  1. If you are not using Multus/Macvlan for creating/configuring extra interfaces, how did you create the extra interfaces within the UPF Pod and the UERANSIM-UE Pod?
  2. The ogstun secondary interface in the UPF Pod: is this a TUN device? How did you create/configure it?
  3. Are you separating CP traffic from data traffic?
sam@ubuntu:~/vagrant/ansible$ kg po
NAME                                               READY   STATUS    RESTARTS        AGE
nfs-subdir-external-provisioner-5c47788dbf-kwllb   1/1     Running   0               16m
open5gs-amf-74867f4d79-rhgw2                       1/1     Running   0               11m
open5gs-ausf-6f7b99444d-6h55q                      1/1     Running   0               11m
open5gs-bsf-78cbddcf6d-mk85n                       1/1     Running   0               11m
open5gs-mongodb-76d8dfbbdb-gjxrw                   1/1     Running   0               11m
open5gs-nrf-5f58b84585-kzkc2                       1/1     Running   0               11m
open5gs-nssf-8676d9bc7b-7bpcv                      1/1     Running   0               11m
open5gs-pcf-655ff4967f-fsfnq                       1/1     Running   5 (9m14s ago)   11m
open5gs-populate-578dd7d4d8-r4fcl                  1/1     Running   0               11m
open5gs-smf-7f864d7d99-4bc8w                       1/1     Running   0               11m
open5gs-udm-7cf88b7c58-f4ljc                       1/1     Running   0               11m
open5gs-udr-6c96f5d447-4tkgv                       1/1     Running   3 (8m46s ago)   11m
open5gs-upf-fcc6fcd5c-snh4l                        1/1     Running   0               11m
open5gs-webui-567c65bc7b-5z9h7                     1/1     Running   0               11m
ueransim-gnb-7d8867667f-hhvjg                      1/1     Running   0               5m21s
ueransim-gnb-ues-6487c85db9-g6f8g                  1/1     Running   0               5m21s

Interfaces of the open5gs-upf-fcc6fcd5c-snh4l Pod

sam@ubuntu:~/vagrant/ansible$ k exec open5gs-upf-fcc6fcd5c-snh4l  -it -- /bin/bash
Defaulted container "open5gs-upf" out of: open5gs-upf, tun-create (init)
root@open5gs-upf-fcc6fcd5c-snh4l:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ogstun: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 500
    link/none 
    inet 10.45.0.1/16 scope global ogstun
       valid_lft forever preferred_lft forever
    inet6 fe80::9363:57e3:7202:6355/64 scope link stable-privacy 
       valid_lft forever preferred_lft forever
23: eth0@if24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:6d:83:77:5c:9d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.1.41/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::686d:83ff:fe77:5c9d/64 scope link 
       valid_lft forever preferred_lft forever

Interfaces of the ueransim-gnb-ues-6487c85db9-g6f8g Pod

sam@ubuntu:~/vagrant/ansible$ k exec ueransim-gnb-ues-6487c85db9-g6f8g -it -- /bin/bash
bash-5.1# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: uesimtun0: <POINTOPOINT,PROMISC,NOTRAILERS,UP,LOWER_UP> mtu 1400 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none 
    inet 10.45.0.2/32 scope global uesimtun0
       valid_lft forever preferred_lft forever
    inet6 fe80::14bd:6547:b7d2:65b5/64 scope link stable-privacy 
       valid_lft forever preferred_lft forever
3: uesimtun1: <POINTOPOINT,PROMISC,NOTRAILERS,UP,LOWER_UP> mtu 1400 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none 
    inet 10.45.0.3/32 scope global uesimtun1
       valid_lft forever preferred_lft forever
    inet6 fe80::f98c:f2cd:6ad7:fc6e/64 scope link stable-privacy 
       valid_lft forever preferred_lft forever
19: eth0@if20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 26:34:f9:56:dc:19 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.0.69/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2434:f9ff:fe56:dc19/64 scope link 
       valid_lft forever preferred_lft forever

Thanks, Sam

cgiraldo commented 6 months ago

Hi, you'll find your questions answered inline.

If you are not using Multus/Macvlan for creating/configuring extra interfaces, how did you create the extra interfaces within the UPF Pod and the UERANSIM-UE Pod?

For open5gs-upf, we define an init container running a script from a configmap.

On the other hand, the ueransim-ue process creates the tun interface itself, so we only have to give the pod the NET_ADMIN capability -> https://github.com/Gradiant/openverso-charts/blob/a4a74be944cf2100a045fbbabea69efad85c0f13/charts/ueransim-gnb/templates/ues-deployment.yaml#L50
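
For reference, granting that capability boils down to a securityContext entry on the UE container; a minimal sketch:

securityContext:
  capabilities:
    add:
      - NET_ADMIN   # lets the UERANSIM process create its uesimtun devices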

The ogstun secondary interface in the UPF Pod: is this a TUN device? How did you create/configure it?

An init container creates the tun interface -> https://github.com/Gradiant/openverso-charts/blob/a4a74be944cf2100a045fbbabea69efad85c0f13/charts/open5gs-upf/templates/deployment.yaml#L65

The script that creates the tun interface is a configmap mounted as a volume. https://github.com/Gradiant/openverso-charts/blob/main/charts/open5gs-upf/resources/k8s-entrypoint.sh
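
Put together, a minimal sketch of such an init container (the image and inlined commands here are illustrative; the chart's real template and script are at the links above):

initContainers:
  - name: tun-create                  # the init container visible in the kubectl exec output above
    image: alpine:3.18                # illustrative; any image shipping the iproute2 tools works
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]            # required to create network devices
    command: ["/bin/sh", "-c"]
    args:
      - >
        ip tuntap add name ogstun mode tun &&
        ip addr add 10.45.0.1/16 dev ogstun &&
        ip link set ogstun up

Because all containers in a Pod share one network namespace and ip tuntap creates a persistent device, ogstun survives the init container's exit and is visible to the main UPF container.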

Are you separating CP traffic from data traffic?

No, they aren't.

sam-sre commented 6 months ago

@cgiraldo Thanks for the clarification.

I'm still not sure how the multiple interfaces are created without Multus. Let's take the UPF Pod case: even when running 2 containers in there (init container + UPF container), the Pod itself still has 2 interfaces, and each interface is in a different IP subnet. When running the ip a command within the UPF Pod, we see:

Am I missing something? Does the initContainer script create ogstun as a secondary Pod interface?

sam@ubuntu:~/vagrant/ansible$ k exec open5gs-upf-fcc6fcd5c-snh4l  -it -- /bin/bash
Defaulted container "open5gs-upf" out of: open5gs-upf, tun-create (init)
root@open5gs-upf-fcc6fcd5c-snh4l:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ogstun: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 500
    link/none 
    inet 10.45.0.1/16 scope global ogstun
       valid_lft forever preferred_lft forever
    inet6 fe80::9363:57e3:7202:6355/64 scope link stable-privacy 
       valid_lft forever preferred_lft forever
23: eth0@if24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:6d:83:77:5c:9d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.1.41/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::686d:83ff:fe77:5c9d/64 scope link 
       valid_lft forever preferred_lft forever

cgiraldo commented 6 months ago

Does the initContainer script create ogstun as a secondary Pod interface?

Yes.

Have you checked the initContainer script? The exact line where the ogstun interface is created is:

https://github.com/Gradiant/openverso-charts/blob/a4a74be944cf2100a045fbbabea69efad85c0f13/charts/open5gs-upf/resources/k8s-entrypoint.sh#L14

avrodriguezgrad commented 6 months ago

Hi @sam-sre

I think you are asking about this. First, you have to deploy Multus in your cluster and apply a network attachment definition in the namespace you're using.

When you've done this, you have to reference the network attachment in the values.yaml of the chart; the UPF chart exposes a podAnnotations value for this.

If you add the annotation corresponding to your network attachment there, you will have a secondary interface inside the UPF Pod named "net1", so the next step is changing the dev interface of the gtpu in this line (https://github.com/Gradiant/openverso-charts/blob/a4a74be944cf2100a045fbbabea69efad85c0f13/charts/open5gs-upf/values.yaml#L83)

After that, the ogstun will be linked with the net1 interface added by Multus, but you'll have to deal with routing problems. I suggest using an additional script to override routes, or another plugin such as https://github.com/openshift/route-override-cni (see the sketch below).
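
For reference, route-override is a chained CNI meta-plugin, so it gets appended to the NAD's plugin list; a hedged sketch (the keys follow the route-override README, while the master interface and route values are made up for illustration):

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
spec:
  config: '{
      "cniVersion": "0.3.1",
      "name": "macvlan-conf",
      "plugins": [
        {
          "type": "macvlan",
          "master": "eth0",
          "mode": "bridge",
          "ipam": { "type": "static" }
        },
        {
          "type": "route-override",
          "addroutes": [ { "dst": "10.45.0.0/16", "gw": "192.168.121.1" } ]
        }
      ]
    }'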

Feel free to ask if you have additional doubts! Cheers, Álvaro

sam-sre commented 6 months ago

@cgiraldo Thanks for your reply. I do understand what the initContainer script does (creating an interface), but it's the first time I've seen an initContainer used to plumb a secondary Pod interface. This concept was new to me, even though I've seen many commercial cloudified 5GC deployments. Your implementation removes the need for Multus at all 😅 and now I'm wondering why anyone would use Multus if an initContainer can do the job.

sam-sre commented 6 months ago

@avrodriguezgrad Many thanks for your detailed clarification. I was searching for how to bind the ogstun interface to the net1 interface; I'll give it a try in my testing environment.

BR Anas

avrodriguezgrad commented 6 months ago

You're welcome @sam-sre !

Feel free to reopen the issue if you need anything else!

BR, Álvaro

sam-sre commented 6 months ago

Hi @avrodriguezgrad

After figuring out the appropriate Multus configuration and IPAM options to have proper routing between K8s nodes, the Helm charts are complaining, and I'm not sure how to tackle it.

Helm complaint:

Error: INSTALLATION FAILED: template: open5gs/charts/upf/templates/configmap.yaml:11:3: executing "open5gs/charts/upf/templates/configmap.yaml" at <tpl (.Files.Get "resources/config/upf.yaml") .>: error calling tpl: error during tpl function execution for "{{ $open5gsName := .Release.Name }}\n\nlogger:\n level: {{ .Values.config.logLevel }}\nparameter: {}\n\nupf:\n pfcp:\n - dev: \"eth0\"\n port: {{ .Values.containerPorts.pfcp }}\n gtpu:\n - dev: {{ default \"eth0\" .Values.config.upf.gtpu.dev }}\n port: {{ .Values.containerPorts.gtpu }}\n {{- if .Values.config.upf.gtpu.advertise }}\n advertise: \"{{ tpl .Values.config.upf.gtpu.advertise }}\"\n {{- end }}\n subnet:\n {{- range .Values.config.subnetList }}\n - {{- omit . \"createDev\" \"enableNAT\" | toYaml | nindent 6 }}\n {{- end }}\n\nsmf:\n pfcp:\n - name: {{ default (printf \"%s-smf-pfcp\" $open5gsName) .Values.config.smf.pfcp.hostname }}\n port: {{ default 8805 .Values.config.smf.pfcp.port }}\n": template: gotpl:15:21: executing "gotpl" at <tpl>: wrong number of args for tpl: want 2 got 1

Here are some info about my testing env:

1- I'm using 2 Vagrant boxes (each with 2 extra interfaces) as K8s nodes (1 control-plane, 1 worker). Interfaces of the control-plane node:

vagrant@kmaster:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:c1:b5:6f brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    altname ens5
    inet 192.168.121.90/24 metric 100 brd 192.168.121.255 scope global dynamic eth0
       valid_lft 2602sec preferred_lft 2602sec
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:b9:3d:df brd ff:ff:ff:ff:ff:ff
    altname enp0s6
    altname ens6
    inet 172.16.16.100/24 brd 172.16.16.255 scope global eth1
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:84:f1:ab brd ff:ff:ff:ff:ff:ff
    altname enp0s7
    altname ens7
    inet 10.45.0.10/24 brd 10.45.0.255 scope global eth2
       valid_lft forever preferred_lft forever

Interfaces of the worker node:

vagrant@kworker1:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:66:15:42 brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    altname ens5
    inet 192.168.121.112/24 metric 100 brd 192.168.121.255 scope global dynamic eth0
       valid_lft 2650sec preferred_lft 2650sec
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:bf:2c:62 brd ff:ff:ff:ff:ff:ff
    altname enp0s6
    altname ens6
    inet 172.16.16.101/24 brd 172.16.16.255 scope global eth1
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:9e:d1:7d brd ff:ff:ff:ff:ff:ff
    altname enp0s7
    altname ens7
    inet 10.45.0.11/24 brd 10.45.0.255 scope global eth2
       valid_lft forever preferred_lft forever

2- nad.yaml used:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth2",
      "mode": "bridge",
      "ipam": {
        "type": "static"
      }
    }'

3- Within values.yml, these are the UPF values (eth2 is the name of the secondary interface on both Vagrant boxes):

upf:
  podAnnotations: { k8s.v1.cni.cncf.io/networks: '[ { "name": "macvlan-conf","ips": [ "10.45.0.1/24" ] } ]' }
  config:
    upf:
      gtpu:
        dev: "eth2"
        advertise: "10.45.0.1"

avrodriguezgrad commented 5 months ago

Hi @sam-sre

I don't know exactly how to help you with this, but I have one question: why do you have interfaces in the 10.45.0.0/24 subnet on the VMs?

I guess you've misunderstood the behaviour of the chart, so I'm going to try to explain it now.

The VM interface IPs could be anything except the 10.45.0.0/24 subnet (the open5gs user-plane subnet); in your case we would use eth0, for example. So the NAD has to point to eth0, and I've worked with the same subnet for the NAD interface (net1 inside the Pod) as for eth0. So net1 is going to have an IP in the 192.168.121.0/24 subnet.

Then, when the chart is initialized, it creates inside the Pod as many ogstun interfaces as you declare in values.yaml. In our case there is only one, ogstun. The initialization gives 10.45.0.1 to ogstun, as this interface is the gateway of the 5G network (the UPF).

And the last point: you have to indicate in upf.yaml the dev you want to link with the ogstun interface, in the config.upf.gtpu.dev field, and there you have to put "net1", as it's the secondary interface you have attached to the Pod using Multus. Here, what I believe is that open5gs is responsible for linking both interfaces to route the traffic between ogstun and net1, and Multus is responsible for routing the traffic between net1 and eth0.
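
Concretely, under those assumptions the UPF values would look roughly like this (a sketch following the eth0 example above, not a tested configuration):

upf:
  podAnnotations: { k8s.v1.cni.cncf.io/networks: 'macvlan-conf' }   # NAD whose master is the node's eth0
  config:
    upf:
      gtpu:
        dev: "net1"   # the secondary interface Multus attaches inside the Pod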

I don't know if this is going to help you, but feel free to comment again and I'll see if I can help.

BR, Álvaro

sam-sre commented 5 months ago

Hi @avrodriguezgrad

Thanks for your replies, they always help me understand more ^^

I don't know exactly how to help you with this, but I have one question: why do you have interfaces in the 10.45.0.0/24 subnet on the VMs?

I created those extra interfaces on each VM to connect to external testing components that I wanted to isolate from other traffic, but you are correct, I should've used another subnet for that.

I guess you've misunderstood the behaviour of the chart, so I'm going to try to explain it now.

You are correct; as I'm not a Helm guru :sweat_smile:, I tried to follow the chain of reactions in the UPF charts. To understand the error Helm throws, I spent a couple of hours analyzing how the UPF templates call each other and how the values are rendered, and the best I could find out is that the tpl below should return more values than it did in my case: https://github.com/Gradiant/openverso-charts/blob/a4a74be944cf2100a045fbbabea69efad85c0f13/charts/open5gs-upf/templates/configmap.yaml#L11

And this is because the gtpu below does not get the correct values it expects (or this is what I think): https://github.com/Gradiant/openverso-charts/blob/a4a74be944cf2100a045fbbabea69efad85c0f13/charts/open5gs-upf/resources/config/upf.yaml#L11

I implemented the edits that you suggested in my nad.yml and values.yml, but I still get the same Helm error:

helm install open5gs openverso/open5gs --version 2.0.8 --values 5g_values.yml
Error: INSTALLATION FAILED: template: open5gs/charts/upf/templates/configmap.yaml:11:3: executing "open5gs/charts/upf/templates/configmap.yaml" at <tpl (.Files.Get "resources/config/upf.yaml") .>: error calling tpl: error during tpl function execution for "{{ $open5gsName := .Release.Name }}\n\nlogger:\n level: {{ .Values.config.logLevel }}\nparameter: {}\n\nupf:\n pfcp:\n - dev: \"eth0\"\n port: {{ .Values.containerPorts.pfcp }}\n gtpu:\n - dev: {{ default \"eth0\" .Values.config.upf.gtpu.dev }}\n port: {{ .Values.containerPorts.gtpu }}\n {{- if .Values.config.upf.gtpu.advertise }}\n advertise: \"{{ tpl .Values.config.upf.gtpu.advertise }}\"\n {{- end }}\n subnet:\n {{- range .Values.config.subnetList }}\n - {{- omit . \"createDev\" \"enableNAT\" | toYaml | nindent 6 }}\n {{- end }}\n\nsmf:\n pfcp:\n - name: {{ default (printf \"%s-smf-pfcp\" $open5gsName) .Values.config.smf.pfcp.hostname }}\n port: {{ default 8805 .Values.config.smf.pfcp.port }}\n": template: gotpl:15:21: executing "gotpl" at <tpl>: wrong number of args for tpl: want 2 got 1

The Helm error is gone only when I deleted the advertise: "10.10.0.13" line from my UPF values.yml file. But now the UPF pod is always stuck in the init phase with this error:

Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               4m7s  default-scheduler  Successfully assigned default/open5gs-upf-78b9484868-l8bc8 to kmaster
  Warning  FailedCreatePodSandBox  7s    kubelet            Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded

My nad.yml

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "mode": "bridge",
      "ipam": {
        "type": "host-local",
        "subnet": "10.10.0.11/24",
        "rangeStart": "10.10.0.12",
        "rangeEnd": "10.10.0.50"
      }
    }'

My UPF values.yml

upf:
  podAnnotations: { k8s.v1.cni.cncf.io/networks: '[ { "name": "macvlan-conf","ips": [ "10.10.0.13/24" ] } ]' }
  config:
    upf:
      gtpu:
        dev: "net1"
        advertise: "10.10.0.13" #Commented out to satisfy Helm

Helm didn't like passing 2 values via the UPF values.yml (dev & advertise), but it was satisfied with passing only the dev value.
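
For what it's worth, the error message itself points at the root cause rather than at your values: the chart template renders advertise through tpl with a single argument, while Helm's tpl function takes two (the template string and the rendering context). A sketch of the offending line from resources/config/upf.yaml and the likely fix:

advertise: "{{ tpl .Values.config.upf.gtpu.advertise }}"    # as shipped: one argument, so any non-empty advertise value fails
advertise: "{{ tpl .Values.config.upf.gtpu.advertise . }}"  # with the context as second argument, tpl renders fine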

I'll keep troubleshooting .. Thanks for the help ^^

BR Sami

sam-sre commented 5 months ago

Update about the UPF Pod error:

The Helm error is gone only when I deleted the advertise: "10.10.0.13" line from my UPF values.yml file. But now the UPF pod is always stuck in the init phase with this error

This happens because, when using Calico, the Multus config file in /etc/cni/net.d ends up with the wrong number in its name. Once the numbering is fixed, the Pods run.
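
For context, the container runtime loads the lexically first config file in /etc/cni/net.d, so the Multus config has to sort ahead of Calico's; the file names end up along these lines (names illustrative):

/etc/cni/net.d/00-multus.conf       # sorts first, so the runtime invokes Multus
/etc/cni/net.d/10-calico.conflist   # Calico config, delegated to by Multus as the primary network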