Mellanox / libvma

Linux user space library for network socket acceleration based on RDMA compatible network adaptors
https://www.mellanox.com/products/software/accelerator-software/vma?mtag=vma

libvma performance issues with haproxy #877

Open scarlet-storm opened 4 years ago

scarlet-storm commented 4 years ago

I have observed bandwidth and latency improvements with libvma for small TCP message sizes in benchmark applications such as sockperf and iperf. I am now trying to evaluate how much haproxy benefits from libvma, to assess its use for layer 7 load balancing. My setup consists of two machines running nginx servers, with haproxy configured in http mode doing round-robin load balancing between them. I use wrk from another machine on the network as a load generator to benchmark the haproxy setup. Without libvma, wrk reports the following for the given test:

wrk --latency http://10.48.114.100:8025 -t 10 -c 12 -d 30
Running 30s test @ http://10.48.114.100:8025
  10 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   339.48us  284.01us  12.93ms   97.42%
    Req/Sec     3.09k   217.10     4.18k    71.79%
  Latency Distribution
     50%  303.00us
     75%  357.00us
     90%  436.00us
     99%  789.00us
  925118 requests in 30.10s, 740.22MB read
Requests/sec:  30735.44
Transfer/sec:     24.59MB

Running with libvma

LD_PRELOAD=libvma.so haproxy -- /etc/haproxy/haproxy.cfg
wrk --latency http://10.48.114.100:8025 -t 10 -c 12 -d 30
Running 30s test @ http://10.48.114.100:8025
  10 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    20.81ms  116.18ms   1.03s    96.55%
    Req/Sec     2.71k   742.46     7.98k    85.83%
  Latency Distribution
     50%  215.00us
     75%  593.00us
     90%    1.02ms
     99%  767.08ms
  771434 requests in 30.10s, 617.25MB read
Requests/sec:  25629.47
Transfer/sec:     20.51MB

Also running with VMA_SPEC=latency

VMA_SPEC=latency LD_PRELOAD=libvma.so haproxy -- /etc/haproxy/haproxy.cfg
wrk --latency http://10.48.114.100:8025 -t 10 -c 12 -d 30
Running 30s test @ http://10.48.114.100:8025
  10 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.26ms    3.64ms  75.16ms   84.54%
    Req/Sec     1.07k   380.69     3.46k    77.74%
  Latency Distribution
     50%  329.00us
     75%    3.30ms
     90%    7.67ms
     99%   12.13ms
  291558 requests in 30.03s, 233.29MB read
  Socket errors: connect 0, read 0, write 0, timeout 1
Requests/sec:   9708.40
Transfer/sec:      7.77MB

Both the average latency and the total throughput are worse with libvma. I have tried following the tuning guide to bind the process to the same NUMA node as the NIC and to specific cores, but the results are still worse. This behaviour is strange because vma_stats shows all packets as offloaded.
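For reference, this is roughly how I bound the process and checked offload; the interface name and NUMA node below are placeholders for my setup:

# NUMA node of the ConnectX-5 port (interface name is setup-specific)
cat /sys/class/net/ens1f0/device/numa_node

# start haproxy pinned to that node's CPUs and memory
LD_PRELOAD=libvma.so numactl --cpunodebind=0 --membind=0 \
    haproxy -- /etc/haproxy/haproxy.cfg

# inspect VMA offload counters for the running haproxy process
vma_stats -p $(pidof haproxy)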

Are there any tips for tuning libvma parameters to improve performance for this particular workload?

The haproxy config file, for reference:

global
    user root 
    group root
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    #  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    # An alternative list with additional directives can be obtained from
    #  https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3
    nosplice
    # TUNING
    #tune.h2.initial-window-size 1048576

defaults
    timeout connect 50000
    timeout client  500000
    timeout server  500000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

    http-reuse safe

# My Configuration
frontend fe
    mode http
    bind *:8025
    default_backend be

backend be
    mode http
    balance roundrobin
    #option http-keep-alive
    server s0 10.48.34.122:80
    server s2 10.48.34.125:80

Config:
VMA_VERSION: 8.9.5-0
OFED Version: MLNX_OFED_LINUX-4.7-3.2.9.0
System: 4.9.0-9-amd64
Architecture: x86_64
NIC: ConnectX-5 EN network interface card

igor-ivanov commented 4 years ago

Hello @LeaflessMelospiza, networking benchmarks cannot always reproduce real-world application behaviour. haproxy has its own specifics, and it has not been studied well enough for us to recommend an optimal VMA configuration. You can try compiling VMA with the --enable-tso configure option.
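Something along these lines should work, assuming you rebuild VMA from a source checkout (install prefix and parallel job count are up to you):

git clone https://github.com/Mellanox/libvma.git
cd libvma
./autogen.sh
./configure --enable-tso
make -j
sudo make install

Then run haproxy again with LD_PRELOAD pointing at the newly installed libvma.so.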