edgecomllc / eupf

5G User Plane Function (UPF) based on eBPF
Apache License 2.0
107 stars 20 forks source link

eupf stops working after running a while (data not moving anymore) #480

Closed Sabreu closed 1 year ago

Sabreu commented 1 year ago

Hi,

I have this problem where data (iperf3) stops working after eupf has been running a while. In this log data is not working anymore. After this all data is halt until eupf is rebooted. Is it because of "Can't apply GTP PDR: update: key too big for map: argument list too long"?

marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF Got Session Establishment Request from: 10.10.1.3.
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF
marras 11 00:40:33 14900k eupf[93391]: Session Establishment Request:
marras 11 00:40:33 14900k eupf[93391]:   CreatePDR ID: 1
marras 11 00:40:33 14900k eupf[93391]:     FAR ID: 1
marras 11 00:40:33 14900k eupf[93391]:     QER ID: 1
marras 11 00:40:33 14900k eupf[93391]:     URR ID: 1
marras 11 00:40:33 14900k eupf[93391]:     Source Interface: 1
marras 11 00:40:33 14900k eupf[93391]:     UE IPv4 Address: 10.45.0.237
marras 11 00:40:33 14900k eupf[93391]:   CreatePDR ID: 2
marras 11 00:40:33 14900k eupf[93391]:     Outer Header Removal: 0
marras 11 00:40:33 14900k eupf[93391]:     FAR ID: 2
marras 11 00:40:33 14900k eupf[93391]:     QER ID: 1
marras 11 00:40:33 14900k eupf[93391]:     Source Interface: 0
marras 11 00:40:33 14900k eupf[93391]:     TEID: 164029
marras 11 00:40:33 14900k eupf[93391]:     Ipv4: 10.10.4.1
marras 11 00:40:33 14900k eupf[93391]:     Ipv6: <nil>
marras 11 00:40:33 14900k eupf[93391]:     UE IPv4 Address: 10.45.0.237
marras 11 00:40:33 14900k eupf[93391]:   CreatePDR ID: 3
marras 11 00:40:33 14900k eupf[93391]:     Outer Header Removal: 0
marras 11 00:40:33 14900k eupf[93391]:     FAR ID: 1
marras 11 00:40:33 14900k eupf[93391]:     QER ID: 1
marras 11 00:40:33 14900k eupf[93391]:     Source Interface: 3
marras 11 00:40:33 14900k eupf[93391]:     TEID: 260015
marras 11 00:40:33 14900k eupf[93391]:     Ipv4: 10.10.4.1
marras 11 00:40:33 14900k eupf[93391]:     Ipv6: <nil>
marras 11 00:40:33 14900k eupf[93391]:   CreatePDR ID: 4
marras 11 00:40:33 14900k eupf[93391]:     Outer Header Removal: 0
marras 11 00:40:33 14900k eupf[93391]:     FAR ID: 3
marras 11 00:40:33 14900k eupf[93391]:     Source Interface: 0
marras 11 00:40:33 14900k eupf[93391]:     TEID: 164029
marras 11 00:40:33 14900k eupf[93391]:     Ipv4: 10.10.4.1
marras 11 00:40:33 14900k eupf[93391]:     Ipv6: <nil>
marras 11 00:40:33 14900k eupf[93391]:     SDF Filter: permit out 58 from ff02::2/128 to assigned
marras 11 00:40:33 14900k eupf[93391]:   CreateFAR ID: 1
marras 11 00:40:33 14900k eupf[93391]:     Apply Action: [12 0]
marras 11 00:40:33 14900k eupf[93391]:     BAR ID: 1
marras 11 00:40:33 14900k eupf[93391]:   CreateFAR ID: 2
marras 11 00:40:33 14900k eupf[93391]:     Apply Action: [2 0]
marras 11 00:40:33 14900k eupf[93391]:     Forwarding Parameters:
marras 11 00:40:33 14900k eupf[93391]: [33B blob data]
marras 11 00:40:33 14900k eupf[93391]:   CreateFAR ID: 3
marras 11 00:40:33 14900k eupf[93391]:     Apply Action: [2 0]
marras 11 00:40:33 14900k eupf[93391]:     Forwarding Parameters:
marras 11 00:40:33 14900k eupf[93391]: [33B blob data]
marras 11 00:40:33 14900k eupf[93391]:       Outer Header Creation: &{OuterHeaderCreationDescription:256 TEID:1235 IPv4Address:10.10.1.3 IPv6Address:<nil> PortNumber:0 CTag:0 STag:0}
marras 11 00:40:33 14900k eupf[93391]:   CreateQER ID: 1
marras 11 00:40:33 14900k eupf[93391]:     Gate Status DL: 0
marras 11 00:40:33 14900k eupf[93391]:     Gate Status UL: 0
marras 11 00:40:33 14900k eupf[93391]:     Max Bitrate DL: 10000000
marras 11 00:40:33 14900k eupf[93391]:     Max Bitrate UL: 10000000
marras 11 00:40:33 14900k eupf[93391]:     QFI: 1
marras 11 00:40:33 14900k eupf[93391]:   CreateURR ID: 1
marras 11 00:40:33 14900k eupf[93391]:     Measurement Method: 2
marras 11 00:40:33 14900k eupf[93391]:     Volume Threshold: &{Flags:1 TotalVolume:104857600 UplinkVolume:0 DownlinkVolume:0}
marras 11 00:40:33 14900k eupf[93391]:   CreateBAR ID: 1
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF Saving FAR info to session: 1, {Action:12 OuterHeaderCreation:0 Teid:0 RemoteIP:0 LocalIP:17041930 TransportLevelMarking:0}
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF EBPF: Put FAR: internalId=1, qerInfo={Action:12 OuterHeaderCreation:0 Teid:0 RemoteIP:0 LocalIP:17041930 TransportLevelMarking:0}
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF WARN: No OuterHeaderCreation
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF Saving FAR info to session: 2, {Action:2 OuterHeaderCreation:0 Teid:0 RemoteIP:0 LocalIP:17041930 TransportLevelMarking:0}
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF EBPF: Put FAR: internalId=2, qerInfo={Action:2 OuterHeaderCreation:0 Teid:0 RemoteIP:0 LocalIP:17041930 TransportLevelMarking:0}
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF Saving FAR info to session: 3, {Action:2 OuterHeaderCreation:1 Teid:1235 RemoteIP:50399754 LocalIP:17041930 TransportLevelMarking:0}
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF EBPF: Put FAR: internalId=3, qerInfo={Action:2 OuterHeaderCreation:1 Teid:1235 RemoteIP:50399754 LocalIP:17041930 TransportLevelMarking:0}
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF Saving QER info to session: 1, {GateStatusUL:0 GateStatusDL:0 Qfi:1 MaxBitrateUL:1410065408 MaxBitrateDL:1410065408 StartUL:0 StartDL:0}
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF EBPF: Put QER: internalId=1, qerInfo={GateStatusUL:0 GateStatusDL:0 Qfi:1 MaxBitrateUL:1410065408 MaxBitrateDL:1410065408 StartUL:0 StartDL:0}
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF EBPF: Put PDR Downlink: ipv4=10.45.0.237, pdrInfo={OuterHeaderRemoval:0 FarId:1 QerId:1 SdfFilter:<nil>}
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF EBPF: Put PDR Uplink: teid=164029, pdrInfo={OuterHeaderRemoval:0 FarId:2 QerId:1 SdfFilter:<nil>}
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF Can't apply GTP PDR: update: key too big for map: argument list too long
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF EBPF: Put PDR Uplink: teid=260015, pdrInfo={OuterHeaderRemoval:0 FarId:1 QerId:1 SdfFilter:<nil>}
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF Can't apply GTP PDR: update: key too big for map: argument list too long
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 DBG Error extracting PDR info: SDF Filter: bad formatting. Should be compatible with regexp: ^permit out (icmp|ip|tcp|udp) from (any|[\d.]+|[\da-fA-F:]+)(?:/(\d+))?(?: (\d+|\d+-\d+))? to ([\d.]+|[\da-fA-F:]+)(?:/(\d+))?(?: (\d+|\d+-\d+))?$
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF Session Establishment Request from 10.10.1.3 accepted.
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF Received 70 bytes from 10.10.1.3:8805
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF Handling PFCP message from 10.10.1.3:8805
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF Got Session Modification Request from: 10.10.1.3.
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF Finding association for 10.10.1.3
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF Finding session 514
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF
marras 11 00:40:33 14900k eupf[93391]: Session Modification Request:
marras 11 00:40:33 14900k eupf[93391]:   UpdateFAR ID: 1
marras 11 00:40:33 14900k eupf[93391]:     Apply Action: [2 0]
marras 11 00:40:33 14900k eupf[93391]:     Update forwarding Parameters:
marras 11 00:40:33 14900k eupf[93391]: [33B blob data]
marras 11 00:40:33 14900k eupf[93391]:       Outer Header Creation: &{OuterHeaderCreationDescription:256 TEID:3296788770 IPv4Address:10.10.10.41 IPv6Address:<nil> PortNumber:0 CTag:0 STag:0}
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF Updating FAR info: 1, {FarInfo:{Action:2 OuterHeaderCreation:1 Teid:3296788770 RemoteIP:688523786 LocalIP:17041930 TransportLevelMarking:0} GlobalId:1}
marras 11 00:40:33 14900k eupf[93391]: 2023/11/11 00:40:33 INF EBPF: Update FAR: internalId=1, farInfo={Action:2 OuterHeaderCreation:1 Teid:3296788770 RemoteIP:688523786 LocalIP:17041930 TransportLevelMarking:0}

Other core components: open5gs eupf version: 0.5.2

pirog-spb commented 1 year ago

Hi @Sabreu,

Could you estimate total count of active PDU sessions in the moment of failure?

Or just use eUPF statistic metrics (curl http://<eupf_ip:9090/metrics>). See upf_pfcp_sessions

Sabreu commented 1 year ago

@pirog-spb

I only have one UE in use atm. Should UE detach (Session Deletion Request) delete the sessions? I already rebooted the upf but right now it seems to be upf_pfcp_sessions 5. I can report later when the issue occurs again what pfcp session amount is.

Thanks

Sabreu commented 1 year ago

Hi @pirog-spb

Upf stopped working again. upf_pfcp_sessions is now 18 but it seems that the problem started when session amount was 1 or 2 (17 sessions were created after the issue started to happen)

it also seems the previous session delete was fine:

2023/11/14 11:11:16 INF Got Session Deletion Request from: 10.10.1.3.
2023/11/14 11:11:16 INF EBPF: Delete PDR Downlink: ipv4=10.45.0.7
2023/11/14 11:11:16 INF EBPF: Delete PDR Uplink: teid=235730
2023/11/14 11:11:16 INF EBPF: Delete PDR Uplink: teid=41348
2023/11/14 11:11:16 INF EBPF: Delete FAR: intenalId=1
2023/11/14 11:11:16 INF EBPF: Delete FAR: intenalId=2
2023/11/14 11:11:16 INF EBPF: Delete FAR: intenalId=3
2023/11/14 11:11:16 INF EBPF: Delete QER: internalId=1
2023/11/14 11:11:16 INF Deleting session: 513

BTW yesterday and today session 514 is the first one to fail, coincidence?

thanks

Sabreu commented 1 year ago

Hi @pirog-spb

I made a test with this kind of code change (use Delete instead of Update):

func (bpfObjects *BpfObjects) DeletePdrUpLink(teid uint32) error {
    log.Info().Msgf("EBPF: Delete PDR Uplink: teid=%d", teid)
    //return bpfObjects.PdrMapUplinkIp4.Update(teid, unsafe.Pointer(&PdrInfo{}), ebpf.UpdateExist) 
    return bpfObjects.PdrMapUplinkIp4.Delete(teid)
}

Did the same thing for DeletePdrDownlink.

Got past this 514th session, atm 581th session ongoing. So it seems like the keys are not deleted correctly with the Update? I know close to zero about the inner workings so please check what should be correct way to handle this. I don't know if this kind of change could cause other issues or not.

Thanks

pirog-spb commented 1 year ago

Well, this is an obvious bug. Will fix it ASAP

@Sabreu thanx for investigation!

pirog-spb commented 1 year ago

@Sabreu could you please try fix from associated pull-request?

Sabreu commented 1 year ago

@pirog-spb 650th session ongoing atm with the pull-request. Btw why the logging was changed? I don't see any of those logs anymore even with debug prints on. Not sure if those are needed even, maybe if debug is on?

pirog-spb commented 1 year ago

@Sabreu Yes, logging was changed little bit. info -> trace in tha case. Please use '--loglvl' option via command line or log_level option in config file.

Sabreu commented 1 year ago

@pirog-spb ah yes it's even lower log_level than debug now I got it.

In addition could these be debug maybe (if using INFO level it would reduce a lot of logging)? Just my opinion...

eupf[1013465]: 2023/11/16 18:38:22 INF Received 16 bytes from 10.10.1.3:8805
eupf[1013465]: 2023/11/16 18:38:22 INF Handling PFCP message from 10.10.1.3:8805

Thanks

pirog-spb commented 1 year ago

@Sabreu Good point.

Logging levels were implemented quite recently. So there are a lot of debug outputs that have to be refactored :)

Am I understand correctly that eUPF now works without hanging up?

Thank you for the feedback!

Sabreu commented 1 year ago

@pirog-spb Yes working fine without hangs.