Open gabeblack opened 4 years ago
Hi,
But the main use case for OFP is packet processing, not packet generation: receive a packet, process it, send it.
If a pure packet generator is your use case, then odp_generator from odp_dpdk may be more useful.
Worker threads can receive packets (they have access to the interface RX queues) and send packets (access to the TX queues). Control threads can receive data through the socket API and send packets (access to the TX queues).
Now, when you run ofp_send/ofp_sendto in a loop (in a control thread), the traffic does not go through the workers: it is just one thread (yours) using one TX queue. If your thread happens to run on the same core as a worker thread (which tries to use 100% of the core), you will get poor performance. So a little thread-per-core planning is needed.
Usually, when planning to use OFP, you have to start by filling in a list:
So, tell us more ...
Merci, Bogdan
Hi Bogdan,
Thank you for the detailed response. I was able to improve performance 5X by changing the NIC (a Mellanox card) to not use the igb_uio driver, since DPDK seems to support the NIC directly via ibverbs and the mlx poll mode driver. However, performance is still well under what we were hoping to achieve.
Anyway, I definitely made sure the send thread was not using the same cores as the worker threads; they were bound to different cores. Initially the control thread was on CPU 0 (I think the default core for control threads), but I moved it to CPU 1, since I think a lot of Linux interrupt handling and other work lands on CPU 0. That turned out to be negligible, though, because it didn't matter whether I ran on CPU 1 or 0. I put the worker threads on cores 2 and 3, but it doesn't seem to matter which cores they run on (the system I was testing on has 16 cores). Since I was only transmitting, I wasn't sure how useful it would be to have more than one worker thread.
I think I understand the purpose of the hook, but I was hoping to avoid it, since it is effectively a global hook for the port. That is, I could have several UDP sockets open, sending to different destinations, but all flows would go through the same hook, so I wouldn't really know which packet belonged to which flow without doing some packet parsing... There is added complexity there.
Anyway, if you have any benchmarks, especially with the socket API, it would be very useful to know what is possible or what one might expect to achieve.
Hi,
You can try this:
Hooks are points where you can access a packet as it is processed by OFP: you can inspect the packet, or take ownership of it and do whatever you want with it. Alternatively, you can use the "zero-copy" API, which is basically a hook per socket: see example/udp_fwd_socket/udp_fwd_socket.c line 63.
There are many further optimizations possible (e.g. using multiple TX queues, one per core, without multithread-safe mode), but the points above should already improve performance. Maybe I'll try this scenario myself on my setup.
Btw, if the DUT and the packet destination are on the same network, you can add a direct route, e.g. in the CLI:
route add 192.168.200.20/32 gw 192.168.200.20 dev fp1
Merci, Bogdan
So I made a setup as described above. I changed SHM_PKT_POOL_NB_PKTS to 102400, and I added the route and a static ARP entry.
With OFP_PKT_TX_BURST_SIZE == 1 I am getting:
1 sendloop: 1.763 Mpps
2 sendloops: 2.391 Mpps
3 sendloops: 2.583 Mpps
With OFP_PKT_TX_BURST_SIZE == 16 I am getting:
1 sendloop: 1.827 Mpps
2 sendloops: 2.850 Mpps
3 sendloops: 4.354 Mpps
And this is with the regular socket API (ofp_sendto())... and without multiple TX queues, etc. I am using a couple of 82599ES NICs connected via DAC, on a setup with two i5 machines (Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz).
Hi,
My first idea would be for you to test the udpecho and udp_fwd_socket examples, to check whether the numbers are still low. We have tested these before and saw better performance than what you reported.
You can also set OFP_PKT_TX_BURST_SIZE to a higher value, such as 16, in the case of line-rate traffic, and see if the numbers improve.
BR, /Iulia
With OFP sitting on top of odp_dpdk, the performance of ofp_send/ofp_sendto is pretty poor (UDP). In a while loop running nothing but ofp_send, the performance caps out at about 110 Kpps.
This while loop runs in its own thread, which was not spawned with the ODP thread API, but did run the ODP/OFP local thread init in order to be able to use the OFP fast path APIs. ODP/OFP has two dispatch threads running on their own cores.
Is the ofp_send/ofp_sendto family of APIs not supposed to be part of the fast path? i.e. is it on the slow path? Are the pktio interfaces the only ones supposed to be fast? Just curious what I might be doing wrong.
Using vanilla dpdk on the same NIC, 4-5Mpps is achievable with little effort or tuning.
Why not use plain DPDK? I was hoping to make use of the OFP networking stack's capabilities... I'd rather not have to populate layer 2/3/4 headers, perform ARP resolution, etc.