Gradiant / 5g-charts

Helm charts for 5G Technologies
Apache License 2.0

Connect the latest oai-gnb (w36) with open5gs #102

Closed tywofxd closed 1 year ago

tywofxd commented 1 year ago

Hello! Thanks for such a great repository! I followed your tutorial, Open5gs and OAI-GNB (https://gradiant.github.io/openverso-charts/open5gs-oaignb.html), and successfully connected my Redmi K30s to the open5gs core. However, after several minutes the phone lost its connection to the oai-gnb. I captured the traffic of the amf pod and found that it received a UEContextReleaseRequest from the gNB pod. I thought there might be a bug in the oai-gNB with tag 2022.w20, which is the one tested in the tutorial, so I changed the tag field in the values.yaml of oai-gNB to 2022.w36, the latest one I found in your Docker Hub. However, the deployment of this gNB pod failed with the following logs:

root@5g-master:/home/osboxes# kubectl logs oai-gnb-0
GNB_NGA_IP_ADDRESS=10.244.37.124/32
GNB_NGU_IP_ADDRESS=10.244.37.124/32
check if open5gs-amf-ngap hostname is resolvable
open5gs-amf-ngap.default.svc.cluster.local has address 10.103.173.31
AMF_IP_ADDRESS=10.103.173.31
/opt/oai/bin/nr-softmodem.Rel15 -O /oai.conf --sa -E --continuous-tx

And the output of the kubectl describe command was:

root@5g-master:/home/osboxes# kubectl describe pod oai-gnb-0
Name:           oai-gnb-0
Namespace:      default
Priority:       0
Node:           5g-node/192.168.5.3
Start Time:     Fri, 28 Oct 2022 05:30:03 -0400
Labels:         app.kubernetes.io/instance=oai-gnb
                app.kubernetes.io/managed-by=Helm
                app.kubernetes.io/name=oai-gnb
                controller-revision-hash=oai-gnb-65686dd886
                helm.sh/chart=oai-gnb-0.3.1
                statefulset.kubernetes.io/pod-name=oai-gnb-0
Annotations:    cni.projectcalico.org/containerID: 736122d866353be53832d52edda0bf4d23bb8441849286063a57a391211ac07b
                cni.projectcalico.org/podIP: 10.244.37.124/32
                cni.projectcalico.org/podIPs: 10.244.37.124/32
Status:         Running
IP:             10.244.37.124
IPs:
  IP:           10.244.37.124
Controlled By:  StatefulSet/oai-gnb
Containers:
  oai-gnb:
    Container ID:   docker://26f27f7db08d3a0e13748b8eadffaea659c9e93d8b87c42719d21b5a370840bf
    Image:          docker.io/openverso/oai:2022.w36
    Image ID:       docker-pullable://openverso/oai@sha256:3e5cd69f80b2f1aa7882ee19a3ba4909c56b3832e8a0ebd65833c35e5fd8e5be
    Port:           2152/UDP
    Host Port:      0/UDP
    Args:
      /opt/oai/bin/nr-softmodem.Rel15 -O /oai.conf --sa -E --continuous-tx
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    132
      Started:      Fri, 28 Oct 2022 05:52:58 -0400
      Finished:     Fri, 28 Oct 2022 05:52:58 -0400
    Ready:          False
    Restart Count:  9
    Limits:
      ettus.com/usrp:  1
    Requests:
      ettus.com/usrp:  1
    Environment:
      CONFIG_TEMPLATE_PATH:  /opt/oai/etc/gnb.sa.tdd.conf
      GNB_NGA_IF_NAME:       eth0
      GNB_NGU_IF_NAME:       eth0
      AMF_HOSTNAME:          open5gs-amf-ngap
    Mounts:
      /opt/oai/etc from config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d52j7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      oai-gnb
    Optional:  false
  kube-api-access-d52j7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     ettus.com/usrp:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  Normal   Scheduled  27m                    default-scheduler  Successfully assigned default/oai-gnb-0 to 5g-node
  Normal   Pulling    27m                    kubelet            Pulling image "docker.io/openverso/oai:2022.w36"
  Normal   Pulled     25m                    kubelet            Successfully pulled image "docker.io/openverso/oai:2022.w36" in 1m49.857430058s
  Normal   Created    24m (x5 over 25m)      kubelet            Created container oai-gnb
  Normal   Started    24m (x5 over 25m)      kubelet            Started container oai-gnb
  Normal   Pulled     24m (x4 over 25m)      kubelet            Container image "docker.io/openverso/oai:2022.w36" already present on machine
  Warning  BackOff    2m20s (x110 over 25m)  kubelet            Back-off restarting failed container

Could you please help me deploy the latest version of the oai-gnb pod, or give me some clues as to why the gnb pod sends a UEContextReleaseRequest to the amf? I also checked the gNB logs, which said:

[NR_RRC] Removing UE 72af instance after UE_CONTEXT_RELEASE_Complete (ue_release_timer_rrc timeout)
[NR_MAC] [gNB 0] Remove NR UE_id 0: rnti 0x72af
[NR_MAC] to remove in mac rnti_to_remove[0] = 0x72af
[GTPU] [0] Deleted all tunnels for RNTI 72af (1 tunnels deleted)
[RRC] [FRAME 00857][eNB][MOD 00][RNTI 72af] Removed UE context
[NR_RRC] remove UE 72af
[NR_PHY] to remove rnti 0x72af
[NR_PHY] to remove rnti_to_remove_count=1, up_removed=1 down_removed=0 pucch_removed=0

Then my UE was lost. I would appreciate it very much if you could help me! Thanks in advance!

cgiraldo commented 1 year ago

It seems the gnb exits with code 132 (illegal instruction). That sometimes happens to us when the docker image is built on certain servers (we still do not know why).
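
For context, an exit code above 128 conventionally means the process was terminated by a signal, and the signal number is the exit code minus 128. A minimal sketch of that arithmetic in shell (the variable names are illustrative only):

```shell
# Exit code reported by kubectl describe for the oai-gnb container.
exit_code=132

# By shell convention, codes above 128 encode "terminated by signal (code - 128)".
signal_number=$((exit_code - 128))

# kill -l <number> prints the name of the signal with that number.
signal_name=$(kill -l "$signal_number")

echo "exit code $exit_code -> signal $signal_number (SIG$signal_name)"
```

On Linux, signal 4 is SIGILL, which is raised when the CPU is asked to execute an instruction it does not support.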

Try building the image locally in your server from here: https://github.com/Gradiant/openverso-images/tree/main/images/oai

docker build --build-arg version=2022.w36 -t custom-oai-gnb:2022.w36 .

One option to use your custom image in kubernetes is to upload it to a dockerhub account of your own and modify the image ref in the helm chart accordingly.
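
A rough sketch of that workflow follows. Every name in it is a placeholder or an assumption: the Docker Hub account, the chart reference `openverso/oai-gnb`, and the `image.repository` / `image.tag` value keys should be checked against your own setup and the chart's values.yaml. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Placeholders: replace with your own values before running for real.
dockerhub_user="your-dockerhub-user"   # hypothetical account name
image_name="custom-oai-gnb"            # name used in the docker build command
image_tag="2022.w36"
remote_repo="docker.io/${dockerhub_user}/${image_name}"
chart_ref="openverso/oai-gnb"          # assumed chart reference

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  [ "${DRY_RUN:-1}" = "1" ] || "$@"
}

# Re-tag the locally built image and push it to your own account.
run docker tag "${image_name}:${image_tag}" "${remote_repo}:${image_tag}"
run docker push "${remote_repo}:${image_tag}"

# Point the chart at the pushed image (value keys are assumptions).
run helm upgrade --install oai-gnb "$chart_ref" \
  --set image.repository="$remote_repo" \
  --set image.tag="$image_tag"
```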

Please keep us updated on whether this solves your issue, so that we can modify the openverso/oai:2022.w36 image accordingly.

tywofxd commented 1 year ago

Hello! Thank you for your kind reply! I followed your suggestion and tried to build the image locally. But after I ran the given command:

docker build --build-arg version=2022.w36 -t custom-oai-gnb:2022.w36 .

I got the following error:

Unpacking libuhd4.2.0:amd64 (4.2.0.1-0ubuntu1~focal1) ...
dpkg: error processing archive /tmp/apt-dpkg-install-Tmox5V/72-libuhd4.2.0_4.2.0.1-0ubuntu1~focal1_amd64.deb (--unpack):
 trying to overwrite '/usr/share/uhd/cal/cal_metadata.fbs', which is also in package libuhd4.3.0:amd64 4.3.0.0-0ubuntu1~focal1
dmesg: read kernel buffer failed: Operation not permitted
Selecting previously unselected package ttf-bitstream-vera.

Errors were encountered while processing:
 /tmp/apt-dpkg-install-Tmox5V/72-libuhd4.2.0_4.2.0.1-0ubuntu1~focal1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
build have failed
The command '/bin/sh -c /bin/sh oaienv && cd cmake_targets && mkdir -p log && ./build_oai -I -w USRP' returned a non-zero code: 100

(I omitted some info logs and only showed the errors)

Could you please give me some suggestions on how to solve this issue? I ran it as root, but there was an Operation not permitted error. Thank you very much!

avrodriguezgrad commented 1 year ago

Yes, that error is expected, and it does not come from our images. OAI has a file that installs the build requirements, and it installs libuhd4.2.0, but this version of libuhd conflicts with the newer version (4.3.0). We are waiting to see whether OAI fixes this in its file or Ettus resolves the conflict in its PPA repository. If not, we will apply a patch to build the images, but we don't know whether OAI supports the latest Ettus version.

In any case, OAI in K8s is quite unstable, so the behaviour you describe is what we normally see as well.

tywofxd commented 1 year ago

Yes, I hit the same error when I tried to build the OAI gnb locally on my Ubuntu 18.04 machine: running "./build_oai -I -w USRP" produced it. I then followed https://genuinecoder.com/how-to-fix-trying-to-overwrite-which-is-also-in-package-issue-in-linux/ and the error was gone on the second run of "./build_oai -I -w USRP". The build succeeded on my local machine. Do you mean that, by applying a patch, I would be able to build the openverso/oai:2022.w36 image? Could you please explain this in more detail so that I can give it a try? Thanks!

avrodriguezgrad commented 1 year ago

I don't know if that is the correct solution to the problem, because libuhd-dev and uhd-host install libuhd4.3.0, so it can cause problems with libuhd4.2.0. The patch I was referring to is the following: in the Dockerfile, after the instruction "RUN git checkout $VERSION", add "RUN sed -i 's/libuhd4.2.0/libuhd4.3.0/g' cmake_targets/tools/build_helper", and then build the image.

With that, you are building the image with the latest Ettus version, 4.3.0, but I can't guarantee that OAI is compatible with it, which is the main reason we haven't updated the Dockerfile in our repo.
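
The effect of that sed substitution can be checked outside Docker first. The sketch below runs it on a throwaway file; the file content is a made-up stand-in for the package list in cmake_targets/tools/build_helper, not a copy of it. It assumes GNU sed (as in the Ubuntu-based image) and escapes the dots so they match literally:

```shell
# Temporary stand-in file (hypothetical content, not the real build_helper).
sample_file=$(mktemp)
printf '%s\n' 'apt-get install -y libuhd-dev libuhd4.2.0 uhd-host' > "$sample_file"

# Same substitution as the Dockerfile patch, with the dots escaped.
sed -i 's/libuhd4\.2\.0/libuhd4.3.0/g' "$sample_file"

# Keep the patched content, then clean up the temporary file.
patched=$(cat "$sample_file")
rm -f "$sample_file"

echo "$patched"
case "$patched" in
  *libuhd4.2.0*) echo "old version still referenced" ;;
  *)             echo "all references now point to libuhd4.3.0" ;;
esac
```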

Anyway, if you test the new version with this patch, can you tell us the behaviour of the image and the stability of the mobile connectivity?

tywofxd commented 1 year ago

Hello! I modified the Dockerfile as you suggested, but another error occurred.

protobuf/protobuf-c installation successful
installing dependencies successful
dpkg: error: cannot access archive '/tmp/apt-dpkg-install-Tmox5V/72-libuhd4.2.0_4.2.0.1-0ubuntu1~focal1_amd64.deb': No such file or directory
The command '/bin/sh -c /bin/sh oaienv && cd cmake_targets && mkdir -p log && ./build_oai -I -w USRP && dpkg -i --force-overwrite /tmp/apt-dpkg-install-Tmox5V/72-libuhd4.2.0_4.2.0.1-0ubuntu1~focal1_amd64.deb && apt-get --fix-broken install && ./build_oai -I -w USRP' returned a non-zero code: 2

Could you please give me some further suggestions on how to solve this? I attached my Dockerfile. Thanks!

Dockerfile.zip

avrodriguezgrad commented 1 year ago

I don't think the dpkg commands are necessary once the build is changed to use libuhd4.3.0. Try the attached Dockerfile; I can build the image with it. Dockerfile.zip

cgiraldo commented 1 year ago

Hi,

Please, try with any of the following images:

openverso/oai:2022.w36-uhd4.3

openverso/oai:2022.w43-uhd4.3

Please report how these images perform, since we haven't tested them.

tywofxd commented 1 year ago

@avrodriguezgrad Thank you for kindly attaching the Dockerfile. I tried it but, unfortunately, the build failed again with the following errors:

Adding group `usrp' (GID 106) ...
Done.
sysctl: cannot stat /proc/sys/net/core/rmem_max: No such file or directory
sysctl: cannot stat /proc/sys/net/core/wmem_max: No such file or directory
Warning: Could not update sysctl settings for network devices.
Setting up python-is-python2 (2.7.17-4) ...
Processing triggers for libc-bin (2.31-0ubuntu9.9) ...
Processing triggers for mime-support (3.64ubuntu1) ...
Removing intermediate container f3efcabf7eff
 ---> aefeaac87f03
Step 27/39 : COPY --from=build /oai-ran/targets/bin/*.Rel15 /opt/oai/bin/
COPY failed: stat oai-ran/targets/bin/lte-softmodem.Rel15: file does not exist

@cgiraldo Thank you for the updated images. I tried both new tags (2022.w36-uhd4.3 and 2022.w43-uhd4.3) when deploying the oai-gnb-0 pod, but I got the same error as with 2022.w36:

State:          Waiting
  Reason:       CrashLoopBackOff
Last State:     Terminated
  Reason:       Error
  Exit Code:    132
  Started:      Wed, 02 Nov 2022 08:06:13 -0400
  Finished:     Wed, 02 Nov 2022 08:06:13 -0400
Ready:          False

It seems that I must successfully build the oai image locally if I want to use the newest version of oai-gnb as a k8s pod. I'm really sorry to bother you but there are always problems with my image building process. Hope you can further give me some suggestions, thanks!

cgiraldo commented 1 year ago

Hi @tywofxd.

Regarding your error 132, it may be similar to this mongodb thread (https://github.com/docker-library/mongo/issues/485), where the problem is that the CPU must meet some microarchitecture requirement (e.g., support for AVX instructions).

Can you provide the output of cat /proc/cpuinfo in your server?
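
A quick way to look for the AVX flags without reading the whole file is to filter the flags line. This is only a sketch, assuming a Linux x86 host where /proc/cpuinfo contains a "flags" line; the helper name cpu_has_flag is made up for illustration:

```shell
# cpu_has_flag FLAG [FILE]: succeed if FLAG appears as a whole word in the
# first "flags" line of FILE (defaults to /proc/cpuinfo). Hypothetical helper.
cpu_has_flag() {
  grep -m1 '^flags' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

# Report a few vector-instruction flags; the list is only an example.
for flag in avx avx2 avx512f; do
  if cpu_has_flag "$flag"; then
    echo "$flag: supported"
  else
    echo "$flag: not reported"
  fi
done
```

Which instruction set the prebuilt image actually requires is not established in this thread, so a missing flag is a hint rather than a diagnosis.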


On the other hand, to be able to build the docker image, use the attached dockerfile.

Dockerfile.zip

It should work, at least for version 2022.w43.

tywofxd commented 1 year ago

Hello! @cgiraldo You are right. The attached Dockerfile worked well for version 2022.w43. I successfully built the oai-gnb 2022.w43 image locally. Then I uploaded it to my dockerhub account and changed the image repository value of the helm chart to my own. The oai-gnb-0 pod was running, but when I checked its logs, I was overwhelmed by error output. The following errors filled the screen:

[PHY] rx_rf: Asked for 23040 samples, got 0 from USRP
[PHY] problem receiving samples
[HW] [recv] received 11484 samples out of 23040
[HW] Time: 110.172 s ERROR_CODE_OVERFLOW (Overflow)

Does this mean that OAI is not compatible with the latest Ettus version, 4.3.0? So do I really not have a chance to use the oai-gnb in k8s? Can I run open5gs as k8s pods and use an oai nr-softmodem built locally on my machine? I'm also curious why the same Dockerfile works for version 2022.w43 but fails with 2022.w36.

Besides, I attached the cpuinfo of my server. It seems to support the AVX instructions, so error 132 in my case is probably not caused by a missing CPU feature.

cpyinfo.zip

avrodriguezgrad commented 1 year ago

Hi, did you follow the OAI instructions about CPU setup? Link: https://gitlab.eurecom.fr/oai/openairinterface5g/-/wikis/OpenAirKernelMainSetup I think that error is more likely caused by not following those instructions than by the UHD version.

Yes, you can run open5gs in k8s pods, but the AMF and UPF need IPs that are reachable from the gNB.

tywofxd commented 1 year ago

Hello, thank you so much!! In fact, before trying this k8s environment, I had already followed the OAI CPU instructions at the link you gave exactly. So I think the error is not about the CPU settings but may be related to the USRP. I unplugged the USRP, plugged it back in over USB, and re-deployed the oai-gnb pod with tag 2022.w43, and it started to work!

I just don't know how to explain it. It's strange.

Although the oai-gnb pod (2022.w43) is working, my phone (Mi 10) cannot connect to it. The RRC connection procedure just failed. But my phone worked well with oai-gnb with tag 2022.w20.

Then I tried to use oai nr-softmodem locally built on my machine and configured the AMF IP the same as the AMF pod. Then it worked! The phone could connect to this 5G and reach the internet. My nr-softmodem was built on the oai develop branch.

Although the oai-gnb pod with 2022.w43 still has a connection problem with my phone, at least I can experiment with this environment using the locally built nr-softmodem.

I really appreciate all your replies!!! Thanks!

cgiraldo commented 1 year ago

Good to hear it worked.

We managed to connect a Samsung Galaxy Tab S7+ 5G and run a speedtest (+100Mbps). However, the oai-gnb crashes after a minute of data transmission.

I will close the issue, but please feel free to share your progress, especially your successes.

Good luck!