Xilinx-CNS / onload

OpenOnload high performance user-level network stack

Onload AF_XDP degraded latency on AWS #139

Open sarosharif opened 1 year ago

sarosharif commented 1 year ago

I tested Onload AF_XDP latency on an AWS machine (m5zn.12xlarge). Initially I had some issues setting it up. After building and installing, I registered the device using this command:

    echo ens5 | sudo tee /sys/module/sfc_resource/afxdp/register

But whenever I ran any application under Onload, I encountered this error:

oo:nc.openbsd[1172]: netif_tcp_helper_alloc_u: ENODEV. This error can occur if:

  • no Solarflare network interfaces are active/UP, or they are running packed stream firmware or are disabled, and
  • there are no AF_XDP interfaces registered with sfc_resource Please check your configuration.

Later on I was able to debug and fix this by running these two commands:

    sudo ifconfig ens5 mtu 3000
    sudo ethtool -L ens5 combined 1
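
For convenience, the whole bring-up can be collected into one script. This is a minimal sketch of the steps above, assuming the interface is ens5; the verification step with ethtool -l is my addition, not from the original report:

    #!/bin/sh -e
    # The two fixes found above: a larger MTU and a single combined channel.
    sudo ifconfig ens5 mtu 3000
    sudo ethtool -L ens5 combined 1

    # Confirm the channel count took effect.
    ethtool -l ens5

    # Register the interface with sfc_resource for AF_XDP use.
    echo ens5 | sudo tee /sys/module/sfc_resource/afxdp/register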

Afterwards I measured the latency with and without Onload AF_XDP using a simple UDP ping-pong application that also timestamps each packet when it is returned. The measurements were taken between two machines in the same placement group, in two runs of 100,000 samples each. Here are the results:

kernel (run 1 / run 2)
latency_mean:    43.626 us / 43.512 us
latency_min:     38.579 us / 38.687 us
latency_max:     136.695 us / 139.111 us
latency_median:  42.735 us / 42.865 us
latency_std_dev: 5.122 us / 4.433 us

onload af_xdp (run 1 / run 2)
latency_mean:    50.358 us / 50.581 us
latency_min:     42.985 us / 44.284 us
latency_max:     240.949 us / 192.622 us
latency_median:  47.883 us / 47.931 us
latency_std_dev: 7.539 us / 7.025 us

Degraded latency is observed with onload af_xdp.
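
For reference, the ping-pong application was run under the usual onload wrapper. The sketch below shows a typical invocation; the binary name, its arguments, and the choice of the stock latency profile are assumptions, not details from this report:

    # Server side (placeholder binary and arguments):
    onload --profile=latency ./udp_pingpong --listen 0.0.0.0:5001

    # Client side, on the second machine in the placement group:
    onload --profile=latency ./udp_pingpong --connect <server-ip>:5001 --samples 100000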

Additionally, I measured UDP throughput using the Linux iperf tool. Throughput improved only marginally:

setup   throughput (Gbits/sec)
kernel 6.96
onload 7.24
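
For reference, a UDP iperf run of this kind would look roughly as follows; the exact flags, duration, and offered load are assumptions, not taken from the report:

    # Server side, under Onload:
    onload iperf -s -u

    # Client side; -b sets the offered UDP load, deliberately above the
    # achieved rate so the test is stack-limited rather than source-limited:
    onload iperf -c <server-ip> -u -b 10000M -t 30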

These results are contrary to my expectations, and I have a few questions:

  1. Does virtualization negatively affect the performance of Onload AF_XDP? Better results in terms of throughput are recorded here.
  2. Does Onload AF_XDP work better on some cards than on others?
  3. Is there any further setting required to get better results?
ol-alexandra commented 1 year ago

In a sense, the iperf tool is not compatible with Onload: iperf uses two separate threads for read and write, which for Onload means lock contention. The "better results" you mention are for memcached.
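
One way to observe that contention directly (my addition, not part of this comment) is to inspect the Onload stack statistics while the test runs; onload_stackdump ships with Onload:

    # Dump the full per-stack counter set and pick out lock-related counters
    # while iperf is running under Onload.
    onload_stackdump lots | grep -i lock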

Latency is not expected to benefit from AF_XDP Onload at all; i.e., your results match my expectations.

sarosharif commented 1 year ago

Thank you for your response. Can you recommend a tool for throughput measurement other than iperf, with which I would be able to see the best results for Onload AF_XDP?

ol-alexandra commented 1 year ago

I can recommend something like memcached. Or maybe nginx (caution: the Onload nginx profile needs "clustering"; I saw some recent commits here in the master branch of Onload which add "clustering" support to AF_XDP, but I have not tried it).

AF_XDP Onload provides some benefit for an application which uses epoll to manage a lot of sockets.
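
As a concrete starting point for such a test, a memcached run under Onload might look like the following; the thread count, port, and the choice of memtier_benchmark as the load generator are illustrative assumptions, not part of the recommendation above:

    # Server: memcached under Onload (placeholder values).
    onload memcached -u nobody -p 11211 -t 4

    # Client, from the other machine: many-connection load via
    # memtier_benchmark speaking the memcached text protocol.
    memtier_benchmark -s <server-ip> -p 11211 --protocol=memcache_text --threads=4 --clients=50 --test-time=30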

lparkersc commented 4 months ago

Does anyone have any advice for tuning Onload on ENA for custom applications?