free5gc / free5gc

Open source 5G core network based on 3GPP R15
https://free5gc.org
Apache License 2.0

[Bugs] N3IWUE fails to run twice or reconnect #584

Closed oliveiraleo closed 1 week ago

oliveiraleo commented 1 month ago

Describe the bug

After the N3IWUE has connected once, killing it and trying to reconnect fails. AFAIK this happens because an XFRM interface isn't released when the N3IWUE exits, so trying to recreate it results in a "file exists" error in free5GC.

To Reproduce

Steps to reproduce the behavior:

  1. Install free5GC v3.4.2

  2. Configure free5GC and N3IWF

  3. Install n3iwue v1.0.1

  4. Configure N3IWUE

  5. Start free5GC with -n3iwf parameter

  6. Start the N3IWUE

  7. Wait until N3IWUE finishes the initial setup and ping test

  8. Kill N3IWUE

  9. Start the N3IWUE once again

  10. N3IWF logs these errors:

    [ERRO][N3IWF][IKE] Set XFRM rules failed: file exists
    [ERRO][N3IWF][IKE] Applying XFRM rules failed: Set XFRM state rule failed

  11. N3IWUE crashes with error [FATA][N3UE][IKE] panic: runtime error: invalid memory address or nil pointer dereference
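To check whether stale XFRM state is what blocks the second attach, one option is to count the entries that remain on the N3IWF host after step 8. The sketch below is only an illustration: the function names `count_xfrm_states` and `report_leftover` are hypothetical helpers, not part of free5GC, and it assumes a single UE is attached, so that any remaining state belongs to the killed N3IWUE.

```shell
#!/bin/sh
# Count XFRM states in `ip xfrm state` output read from stdin.
# Each state entry starts with an unindented "src <addr> dst <addr>" line
# and its detail lines are indented, so matching "^src " counts entries.
count_xfrm_states() {
    # grep -c prints 0 but exits non-zero when nothing matches.
    grep -c '^src ' || true
}

# Print a short verdict for a given count.
# The wording is illustrative only; it is not free5GC output.
report_leftover() {
    count=${1:-0}
    if [ "$count" -gt 0 ]; then
        echo "leftover: $count XFRM state(s) still installed"
    else
        echo "clean: no XFRM states installed"
    fi
}
```

On the free5GC VM this could be run as `report_leftover "$(sudo ip xfrm state | count_xfrm_states)"` right after killing the N3IWUE. A non-zero count at that point would be consistent with the "file exists" error in step 10.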

Expected behavior

The N3IWUE should be able to reconnect

Screenshots

N/A

Environment (please complete the following information):

Trace File

Configuration File

free5gc-config.zip

n3ue.yaml.txt

PCAP File

n3ue.pcap.txt

If really required, I can capture it on free5GC later on

Log File

free5gc-logs-n3iwf-error.txt

n3iwue-crash-logs.txt

System architecture (Option)

free5GC from scratch on VM (IP: 10.0.0.110)

N3IWUE on a separate VM (IP: 10.0.0.112)

Use case: Simulating a connection loss or temporary interference on N3IWUE

Walkthrough (Option)

N/A

Additional context

This issue happened in free5GC v3.4.1 and N3IWUE v1.0.0 too but I didn't have time to test and report it yet

Perhaps the N3IWUE could send a signaling message on exit so that the N3IWF releases any interfaces associated with it. Alternatively, another approach could be taken, as in UERANSIM, which doesn't have this issue.

If free5GC/N3IWF and N3IWUE are both restarted, then N3IWUE can connect successfully again

oliveiraleo commented 1 month ago

So, I did some tests and found a temporary workaround for the error on N3IWF

Considering the following state of the environment:

Execute this:

  1. Run N3IWUE run.sh
  2. Wait until the message [INFO][N3UE][APP] Keep connection with N3IWF until receive SIGINT or SIGTERM appears
  3. Kill the N3IWUE script
  4. On free5GC's VM issue the command sudo ip xfrm state deleteall
  5. Run N3IWUE run.sh again
  6. Now it will run successfully

On the second run I get the message [WARN][N3UE][IKE] Unimplemented infromational message, but I haven't had time to investigate it yet

Perhaps the signaling I suggested in my previous message could trigger the command from step 4 (or some variation of it that doesn't affect other N3UEs that are running, because this one drops everything). However, I'm not sure whether doing this would violate some 3GPP specification.
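A narrower variation of step 4 might filter by the UE's address instead of dropping every state. The sketch below only prints the commands so they can be reviewed first. `xfrm_cleanup_cmds` is a hypothetical helper name, and it assumes the UE's IPsec SAs use the UE's IP (10.0.0.112 in this setup) as the outer source or destination address, which may not hold behind NAT. I have not verified it against this environment, and it leaves XFRM policies alone, just like the original workaround.

```shell
#!/bin/sh
# Print the commands that would remove only the XFRM states whose outer
# address matches one UE, leaving the states of other UEs untouched.
# ASSUMPTION: the UE's SAs carry its IP as the outer src or dst address.
xfrm_cleanup_cmds() {
    ue_ip=${1:-}
    if [ -z "$ue_ip" ]; then
        echo "usage: xfrm_cleanup_cmds <ue-ip>" >&2
        return 1
    fi
    # `ip xfrm state deleteall` accepts an ID filter (src/dst/proto/spi).
    echo "ip xfrm state deleteall src $ue_ip"
    echo "ip xfrm state deleteall dst $ue_ip"
}
```

Running `xfrm_cleanup_cmds 10.0.0.112` shows what would be executed, and piping the output to `sudo sh` on the free5GC VM applies it. Printing first keeps the destructive step explicit.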

a3828162 commented 1 month ago

Hi, @oliveiraleo,

We will trace this issue, thanks!

Best regards, James

a3828162 commented 1 week ago

Hi, @oliveiraleo

We fixed this issue. Please update udr, n3iwf, and n3ue to the latest commit

BRs, James

oliveiraleo commented 1 week ago

Hello @a3828162

I can confirm the bug was fixed, thanks a lot for that.

I'm gonna leave below the instructions I've followed to test the fix just in case someone else needs it:

  1. Install free5GC nightly (I tested with commit a39de62)
  2. Update UDR and N3IWF to nightly:
    cd free5gc/NFs/udr/
    git checkout ee6e0c8
    cd ../n3iwf/
    git checkout 9fe155e
    cd ../../
    make n3iwf udr # rebuild those NFs with the new code pulled
  3. Clone N3IWUE nightly on another machine:
    git clone https://github.com/free5gc/n3iwue.git
    cd n3iwue/
    make

    Note: Commit with the fixes referenced on this issue is c2662c7

  4. Add N3IWUE to free5GC if not already done
  5. Start free5GC + N3IWF and wait for it to load (e.g. Received PFCP Association Setup Accepted Response from UPF[127.0.0.8])
  6. Start N3IWUE and wait for it to load (i.e. [INFO][N3UE][APP] Keep connection with N3IWF until receive SIGINT or SIGTERM)
  7. Kill N3IWUE with SIGTERM

    Example output on free5GC's side:

    2024-08-29T11:06:17.393956126Z [INFO][AMF][Ngap][ran_addr:X.X.X.10/172.16.0.1:59197] Handle UEContextReleaseComplete
    2024-08-29T11:06:17.394013856Z [INFO][AMF][Ngap][amf_ue_ngap_id:RU:0,AU:1(Non3GPP)][ran_addr:X.X.X.10/172.16.0.1:59197] Handle UEContextReleaseComplete (RAN UE NGAP ID: 0)
    2024-08-29T11:06:17.394085465Z [INFO][AMF][Ngap][ran_addr:X.X.X.10/172.16.0.1:59197] Release UE[imsi-208930000001234] Context : Release Ue Context
    2024-08-29T11:06:17.394112355Z [INFO][AMF][CTX] AmfUe[imsi-208930000001234] is removed

    Example output on N3IWUE's side:

    2024-08-29T11:18:36Z [INFO][N3UE][Init] Terminating N3UE...
    2024-08-29T11:18:36Z [INFO][N3UE][Init] Deleting interfaces created by N3UE
    2024-08-29T11:18:36Z [INFO][N3UE][APP] Delete interface: ipsec-1
    2024-08-29T11:18:36Z [INFO][N3UE][APP] Delete interface: ipsec-2
    2024-08-29T11:18:36.499274588Z [INFO][N3IWF][IKE] Decoding IKE message
    2024-08-29T11:18:36.499316587Z [INFO][N3IWF][IKE] Decoding IKE payloads
    2024-08-29T11:18:36.499322747Z [INFO][N3IWF][IKE] [Encrypted] unmarshal(): Start unmarshalling received bytes
    2024-08-29T11:18:36Z [INFO][N3UE][IKE] Handle Informational
    2024-08-29T11:18:36Z [WARN][N3UE][IKE] Unimplemented infromational message
    2024-08-29T11:18:36Z [INFO][N3UE][APP] Delete interface: gretun-id-2-1
    2024-08-29T11:18:36Z [INFO][N3UE][APP] Delete interface: gretun-id-2-3
    2024-08-29T11:18:36Z [INFO][N3UE][Init] N3UE terminated
    [Info] Remove all GRE interfaces
    del gre0
    [Info] Remove all XFRM interfaces
  8. Run N3IWUE again
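Between steps 7 and 8, it can help to confirm on the N3IWUE machine that the interfaces named in the shutdown log are really gone. This is a small sketch rather than anything shipped with N3IWUE: `list_n3ue_ifaces` is a hypothetical helper, and the `ipsec-` and `gretun-` prefixes are taken from the example output above, so they may differ in other versions.

```shell
#!/bin/sh
# Read `ip -o link show` output on stdin and print the names of interfaces
# that look like the ones N3IWUE reports deleting on shutdown.
# ASSUMPTION: the prefixes ipsec- and gretun- come from the example log.
list_n3ue_ifaces() {
    # Field 2 is the interface name, optionally followed by "@parent".
    awk -F': ' '{ name = $2; sub(/@.*/, "", name); print name }' |
        grep -E '^(ipsec-|gretun-)' || true
}
```

For example, `ip -o link show | list_n3ue_ifaces` prints one name per line. Empty output after the SIGTERM suggests the cleanup completed before the next run.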

Best regards, Leo.