Gradiant / 5g-charts

Helm charts for 5G Technologies
Apache License 2.0

[Bug]: ogstun interface not forwarding UE traffic of UPF pod #178

Closed Thanasislt closed 3 months ago

Thanasislt commented 4 months ago

Steps to reproduce

We have set up Open5GS in our Kubernetes cluster, but the UPF pod cannot ping the internet from the ogstun interface. Pinging through eth0 works as expected. We have followed the documentation and setup guides, but the connectivity issue persists. Note that we upgraded Open5GS from 2.7.0 to 2.7.1, which did not cause any pod crashes. We have validated the pod interfaces with the ip command.

Environment:

Open5gs Version: v2.7.1
Kubernetes cluster: Vanilla kubernetes (v1.28) with 3 nodes
CNI: Cilium
Storage: Longhorn
OS: Ubuntu 22.04 LTS (kernel 6.5)

ip a output:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: ogstun: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1350 qdisc fq_codel state UP group default qlen 500
    link/none 
    inet 10.45.0.1/16 scope global ogstun
       valid_lft forever preferred_lft forever
    inet6 fe80::5170:8c1a:e659:250e/64 scope link stable-privacy 
       valid_lft forever preferred_lft forever
177790: eth0@if177791: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 56:c5:09:ec:4b:93 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.2.7/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::54c5:9ff:feec:4b93/64 scope link 
       valid_lft forever preferred_lft forever

ip route output:

default via 10.0.2.156 dev eth0 mtu 1450 
10.0.2.156 dev eth0 scope link 
10.45.0.0/16 dev ogstun proto kernel scope link src 10.45.0.1

iptables -S output:

iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT

Logs

No response

Expected behaviour

ping -I eth0 x.y.z.w and ping -I ogstun x.y.z.w should both ping the IP address successfully.

Observed Behaviour

eth0 forwards packets correctly, though ogstun does not.

avrodriguezgrad commented 4 months ago

Hi @Thanasislt

The entrypoint of the UPF container has these two instructions: [image]

Can you check if both are properly configured?

BR, Álvaro

Thanasislt commented 4 months ago

This produces the UPF entry point configmap, right? I think it is properly configured:

    echo "Executing k8s customized entrypoint.sh"
    echo "Creating net device ogstun"
    if grep "ogstun" /proc/net/dev > /dev/null; then
        echo "Warnin: Net device ogstun already exists! may you need to set createDev: false";
        exit 1
    fi

    ip tuntap add name ogstun mode tun
    ip link set ogstun up
    echo "Setting IP 10.45.0.1/16 to device ogstun"
    ip addr add 10.45.0.1/16 dev ogstun;
    sysctl -w net.ipv4.ip_forward=1;
    echo "Enable NAT for 10.45.0.0/16 and device ogstun"
    iptables -t nat -A POSTROUTING -s 10.45.0.0/16 ! -o ogstun -j MASQUERADE;

avrodriguezgrad commented 4 months ago

Mmm, I think so.

Have you tried capturing traffic on the ogstun and eth0 interfaces? This can help with debugging what is happening. If you want, you can attach the pcaps, and I'll try to look at them.

BR, Álvaro

Thanasislt commented 4 months ago

Hello, I have attached the UPF pcap files here: upf-eth0_ogstun.zip. It seems that SNAT is not working between ogstun and eth0. I expected ogstun traffic to be NAT-ed and sent via eth0 to the internet. For the captures, I pinged 192.168.50.1 (the lab gateway) and ran curl against Google from both eth0 and ogstun.
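
One detail that may be relevant here: `iptables -S` without `-t` lists only the filter table, so the output in the issue description cannot show whether the MASQUERADE rule from the entrypoint was installed. That rule lives in the nat table. The following is a minimal sketch for checking it, not a confirmed diagnosis. The helper name `has_masquerade_rule` is hypothetical, and the subnet and device are the ones from the entrypoint above.

```shell
#!/bin/sh
# Hypothetical helper: reads `iptables -S`-style rules on stdin and succeeds
# if a MASQUERADE rule for the given source subnet, excluding the given
# output device, is present in POSTROUTING.
has_masquerade_rule() {
    subnet="$1"   # e.g. 10.45.0.0/16
    dev="$2"      # e.g. ogstun
    # -F: fixed string, -q: quiet, --: the pattern itself starts with a dash
    grep -Fq -- "-A POSTROUTING -s $subnet ! -o $dev -j MASQUERADE"
}

# Intended use inside the UPF pod (requires NET_ADMIN), left commented out
# so that sourcing this file has no side effects:
#   iptables -t nat -S POSTROUTING | has_masquerade_rule 10.45.0.0/16 ogstun \
#       && echo "NAT rule present" || echo "NAT rule missing"
#   iptables -t nat -L POSTROUTING -v -n   # packet counters show whether the rule is hit
```

If the rule is present but its packet counters stay at zero while traffic is sent, the packets are probably not reaching the POSTROUTING chain at all, which would point away from the rule itself.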

avrodriguezgrad commented 4 months ago

Well, I believe the problem is not directly related to the Helm charts. Could it be a problem with Cilium and NAT? In any case, I would suggest overriding the UPF entrypoint and testing every single line to see whether everything works properly.

BR, Álvaro
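
The suggestion to test every line of the entrypoint can be sketched as a small wrapper that runs one command at a time and reports whether it succeeded. This is only an illustration: `run_step` is a hypothetical helper, not part of the chart, and the commands in the usage comments are copied from the entrypoint quoted earlier in the thread.

```shell
#!/bin/sh
# run_step DESCRIPTION COMMAND [ARGS...]
# Runs the command, prints "OK: ..." or "FAIL(<status>): ...", and returns
# the command's exit status so the caller can stop at the first failure.
run_step() {
    desc="$1"; shift
    if "$@"; then
        echo "OK: $desc"
    else
        status=$?   # exit status of the command tested by `if`
        echo "FAIL($status): $desc"
        return "$status"
    fi
}

# Intended use inside the UPF pod (requires NET_ADMIN), left commented out
# so that sourcing this file has no side effects:
#   run_step "create ogstun"     ip tuntap add name ogstun mode tun
#   run_step "bring ogstun up"   ip link set ogstun up
#   run_step "assign address"    ip addr add 10.45.0.1/16 dev ogstun
#   run_step "enable forwarding" sysctl -w net.ipv4.ip_forward=1
#   run_step "add MASQUERADE"    iptables -t nat -A POSTROUTING -s 10.45.0.0/16 ! -o ogstun -j MASQUERADE
```

Because each step reports on its own, a command that fails silently in the original entrypoint (for example the iptables call) would show up as a FAIL line instead of going unnoticed.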