Using dpdk soft pmd with OFP

Neetika02 commented 4 years ago

Hello,

We are looking for options to accelerate our 4G User plane process, either by using OFP or by using DPDK directly. We have gone through the OFP guides and also the example application udp_fwd_stack. It does seem that porting our existing process to OFP will be simpler than porting it to DPDK. I have a few queries, if someone can help with these:

Q1>> Since the packets received on the S1U or S5/S8 interface of the 4G LTE User Plane are GTP-U tunneled, there is an inner IP address present in the packet after the GTP header. The packet looks like IP->UDP->GTP->IP. As per the ODP hash function that OFP uses, only the outer IP/UDP header parameters can be hashed. Do I understand correctly that there is no way to do the inner IP matching in hardware directly?

Q2>> Can I use the soft pmd feature of DPDK (for hashing the inner IP), or any other features DPDK directly provides, along with OFP?

Q3>> Which will be faster for processing the received packets: creating an ofp_socket, processing the packet and then calling ofp_udp_pkt_sendto(), or using OFP_HOOK to get the IP packets, processing them and finally sending with ofp_ip_output_send()?

Q4>> In the example applications I see the worker threads doing their own malloc, whereas OFP is using the shared memory concept of ODP. Should the memory allocation functions of my process also be mapped to the shared memory provided by ODP?

bogdanPricope commented 4 years ago

Hello and welcome to OFP.

I'll try to answer some of your questions, starting with the easier ones.

A4>> OFP uses shared memory to share internal data structures between threads or processes. Otherwise it should not matter: arguments to the socket API are either used immediately or copied.

A3>> Long story short: OFP_HOOK is faster (and send the packet with ofp_ip_send(), if I remember correctly). Long story:

OFP_HOOK is fastest: it has direct access to the received packet. It is called from inside the stack, on stack cores.
OFP_SIGEV_HOOK is slower: it has direct access to the received packet, but it has to locate the socket (= some locks + socket search). It is called from inside the stack, on stack cores.
ofp_recv() + ofp_send()/ofp_sendto() is slowest: these are called from user cores and need extra locks and memcpys.

bogdanPricope commented 4 years ago

A1/2>> I am not sure what you mean by "IP matching in hardware". You may want to:

Extract the inner IP packet and process it with OFP
Use some classification API to direct packets matching some criteria to a special queue, and maybe discard the rest
Expect some HW offload for some operations (like csum) on the inner packet
or something else?

One thing: for this to work, the NIC has to support that operation (HW offload), DPDK has to support it as well, and ODP (the layer under OFP) has to provide an API for it.

I have never used the soft pmd, but my guess is that it provides software operations, not HW offloads.
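
To make the first option in the list above concrete (extracting the inner IP packet in software), here is a minimal sketch in plain C that finds where the inner IPv4 header starts in an IP->UDP->GTP->IP packet. It does not call OFP, ODP or DPDK; the function name gtpu_inner_ipv4_offset is made up for illustration, and it assumes an unfragmented outer IPv4 packet, GTPv1-U on UDP port 2152 and an inner IPv4 payload.

```c
#include <stddef.h>
#include <stdint.h>

#define GTPU_UDP_PORT 2152u /* registered GTP-U port */
#define GTPU_MSG_GPDU 0xFFu /* G-PDU: message carrying a user packet */

/* Hypothetical helper, not part of OFP/ODP/DPDK.
 * pkt points at the outer IPv4 header. Returns the byte offset of the
 * inner IPv4 header, or -1 if this is not a G-PDU the sketch understands.
 * Assumption: the outer packet is not fragmented. */
static int gtpu_inner_ipv4_offset(const uint8_t *pkt, size_t len)
{
    if (len < 20 || (pkt[0] >> 4) != 4)
        return -1;                          /* not IPv4 */

    size_t ihl = (size_t)(pkt[0] & 0x0F) * 4;
    if (ihl < 20 || pkt[9] != 17 || len < ihl + 8 + 8)
        return -1;                          /* not UDP, or too short */

    const uint8_t *udp = pkt + ihl;
    unsigned dport = ((unsigned)udp[2] << 8) | udp[3];
    if (dport != GTPU_UDP_PORT)
        return -1;

    const uint8_t *gtp = udp + 8;
    /* Require version 1, protocol type GTP, message type G-PDU. */
    if ((gtp[0] >> 5) != 1 || !(gtp[0] & 0x10) || gtp[1] != GTPU_MSG_GPDU)
        return -1;

    size_t off = ihl + 8 + 8;               /* after mandatory GTP header */
    if (gtp[0] & 0x07) {                    /* E, S or PN flag set */
        if (len < off + 4)
            return -1;
        uint8_t next = pkt[off + 3];        /* next extension header type */
        off += 4;                           /* seq number, N-PDU, next type */
        if (gtp[0] & 0x04) {                /* E flag: walk extensions */
            while (next != 0) {
                if (len < off + 1)
                    return -1;
                size_t ext_len = (size_t)pkt[off] * 4;
                if (ext_len == 0 || len < off + ext_len)
                    return -1;
                next = pkt[off + ext_len - 1];
                off += ext_len;
            }
        }
    }

    if (len < off + 20 || (pkt[off] >> 4) != 4)
        return -1;                          /* inner packet is not IPv4 */
    return (int)off;
}
```

With the offset in hand, the inner source and destination addresses sit at bytes 12..19 of the inner header, so any software hash or lookup can be applied to them regardless of what the NIC hashes on.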

Neetika02 commented 4 years ago

Hello Bogdan,

Thanks a lot for the detailed reply. It has helped me understand better. I now have another query:

I want to create 3 different types of worker threads:

A - threads which process only IP traffic identified by a set of destination IP addresses
B - threads which process UDP packets intended for a particular port, let's say 7777
C - threads which process UDP packets intended for some other port/ports

How do I create this architecture? Is there any way I can utilise odp_pktio_hash() to send these packets to different threads, or do I need to write a master distributor which will do the job? In case I need to write the master distributor, how will I be able to make use of the multiple rx queues? Thanks in advance for bearing with me!

bogdanPricope commented 4 years ago

Nice questions. I'll think about this over the weekend. One idea is to use the classification API to classify different kinds of traffic onto different queues (https://github.com/OpenFastPath/ofp/tree/master/example/classifier) and then schedule those queues on the required cores, etc. But maybe there is a newer API in ODP for that (they were working at some point on a hashing and spreading API). I have to check.
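
As a rough illustration of the rules such a classifier would encode for the A/B/C split, here is a software stand-in in plain C. It is not the ODP classification API: the struct, enum and function names are hypothetical, the packet fields are assumed to be already parsed into host byte order, and the rule order (destination address list first, then UDP port) is an assumption, since the thread does not say which rule should win when both match.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical class ids, one per worker type from the question. */
enum worker_class {
    CLASS_A_IP,        /* destination IP is in the configured set */
    CLASS_B_UDP_7777,  /* UDP to port 7777 */
    CLASS_C_UDP_OTHER, /* UDP to any other port */
    CLASS_NONE         /* nothing matched: default queue or drop */
};

/* Hypothetical pre-parsed key, fields in host byte order. */
struct pkt_key {
    uint32_t dst_ip;
    uint8_t  proto;    /* IP protocol number, 17 = UDP */
    uint16_t dst_port; /* only meaningful when proto is UDP */
};

/* Assumed rule order: address list first, then UDP port. */
static enum worker_class classify(const struct pkt_key *k,
                                  const uint32_t *a_ips, size_t n_a)
{
    for (size_t i = 0; i < n_a; i++)
        if (k->dst_ip == a_ips[i])
            return CLASS_A_IP;

    if (k->proto == 17)
        return k->dst_port == 7777 ? CLASS_B_UDP_7777 : CLASS_C_UDP_OTHER;

    return CLASS_NONE;
}
```

Each class would then map to its own queue, and each queue to the group of cores meant to serve it. A linear scan over the address list is fine for a handful of entries; a larger set would want a hash table or a prefix lookup.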

Another idea is multi-layer processing: a set of cores does some processing and forwards part of the traffic to other cores (maybe after decapsulation or other processing), and you play with the number of cores for each type of processing. I think I did a GTPU/PDU architecture like this for a POC.
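
For the hand-off between layers, one possible way for a first-stage core to pick the second-stage core is to hash the inner addresses after decapsulation. The sketch below is plain C and only an assumption about how such a stage could spread flows; it is not how the POC mentioned above worked, and pick_worker and mix32 are illustrative names. XOR-ing source and destination before mixing makes the result independent of direction, so uplink and downlink packets of the same inner flow reach the same worker.

```c
#include <stdint.h>

/* Generic 32-bit integer mixer; any decent mixing function would do. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* Hypothetical helper: maps an inner flow to a worker index in
 * [0, n_workers). n_workers must be greater than zero.
 * src ^ dst is symmetric, so both directions pick the same worker. */
static unsigned pick_worker(uint32_t inner_src, uint32_t inner_dst,
                            unsigned n_workers)
{
    return mix32(inner_src ^ inner_dst) % n_workers;
}
```

The price of the symmetric XOR is that distinct address pairs with the same XOR value collide; if per-direction affinity is not needed, hashing the two addresses separately spreads the load more evenly.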

OpenFastPath / ofp

Using dpdk soft pmd with OFP #254