Mellanox / libvma

Linux user space library for network socket acceleration based on RDMA compatible network adaptors
https://www.mellanox.com/products/software/accelerator-software/vma?mtag=vma

issue: errno=111 Connection refused #1049

Open weizhoublue opened 9 months ago

weizhoublue commented 9 months ago

I ran the sockperf test following the documentation at https://docs.nvidia.com/networking/display/vmav952/running+vma

I have two hosts, each with a dual-port Mellanox ConnectX-5 (CX5) adapter.

~# ethtool -i ens6f0np0
driver: mlx5_core
version: 23.07-0.5.1
firmware-version: 16.27.6008 (LNV0000000033)
expansion-rom-version:
bus-info: 0000:af:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

# show_gids
DEV PORT    INDEX   GID                 IPv4        VER DEV
--- ----    -----   ---                 ------------    --- ---
mlx5_0  1   0   fe80:0000:0000:0000:063f:72ff:fed0:cee6         v1  ens6f0np0
mlx5_0  1   1   fe80:0000:0000:0000:063f:72ff:fed0:cee6         v2  ens6f0np0
mlx5_0  1   2   0000:0000:0000:0000:0000:ffff:ac51:000a 172.81.0.10     v1  ens6f0np0
mlx5_0  1   3   0000:0000:0000:0000:0000:ffff:ac51:000a 172.81.0.10     v2  ens6f0np0
mlx5_0  1   4   fd00:0081:0000:0000:0172:0081:0000:0010         v1  ens6f0np0
mlx5_0  1   5   fd00:0081:0000:0000:0172:0081:0000:0010         v2  ens6f0np0
mlx5_1  1   0   fe80:0000:0000:0000:063f:72ff:fed0:cee7         v1  ens6f1np1
mlx5_1  1   1   fe80:0000:0000:0000:063f:72ff:fed0:cee7         v2  ens6f1np1
mlx5_1  1   10  fd00:0090:0000:0000:0000:0000:0000:0010         v1  ens6f1np1.90
mlx5_1  1   11  fd00:0090:0000:0000:0000:0000:0000:0010         v2  ens6f1np1.90
mlx5_1  1   2   0000:0000:0000:0000:0000:ffff:ac52:000a 172.82.0.10     v1  ens6f1np1
mlx5_1  1   3   0000:0000:0000:0000:0000:ffff:ac52:000a 172.82.0.10     v2  ens6f1np1
mlx5_1  1   4   fd00:0082:0000:0000:0172:0082:0000:0010         v1  ens6f1np1
mlx5_1  1   5   fd00:0082:0000:0000:0172:0082:0000:0010         v2  ens6f1np1
mlx5_1  1   6   fe80:0000:0000:0000:063f:72ff:fed0:cee7         v1  ens6f1np1.90
mlx5_1  1   7   fe80:0000:0000:0000:063f:72ff:fed0:cee7         v2  ens6f1np1.90
mlx5_1  1   8   0000:0000:0000:0000:0000:ffff:ac5a:000a 172.90.0.10     v1  ens6f1np1.90
mlx5_1  1   9   0000:0000:0000:0000:0000:ffff:ac5a:000a 172.90.0.10     v2  ens6f1np1.90
mlx5_2  1   0   fe80:0000:0000:0000:9888:49ff:fed9:428f         v1  ens6f0v0
mlx5_2  1   1   fe80:0000:0000:0000:9888:49ff:fed9:428f         v2  ens6f0v0
mlx5_3  1   0   fe80:0000:0000:0000:408d:07ff:feb3:0a9b         v1  ens6f0v1
mlx5_3  1   1   fe80:0000:0000:0000:408d:07ff:feb3:0a9b         v2  ens6f0v1
mlx5_4  1   0   fe80:0000:0000:0000:14ab:adff:fef9:16d7         v1  ens6f0v2
mlx5_4  1   1   fe80:0000:0000:0000:14ab:adff:fef9:16d7         v2  ens6f0v2
mlx5_5  1   0   fe80:0000:0000:0000:0891:f4ff:febc:46e2         v1  ens6f0v3
mlx5_5  1   1   fe80:0000:0000:0000:0891:f4ff:febc:46e2         v2  ens6f0v3
n_gids_found=26
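For reading the table above: the GIDs of the form 0000:...:ffff:XXXX:XXXX are IPv4-mapped, so the last two 16-bit groups hold the address shown in the IPv4 column. A small sketch of the decoding follows; the helper name gid_to_ipv4 is made up for illustration and is not part of show_gids or libvma.

```shell
# gid_to_ipv4 GID: print the IPv4 address embedded in an IPv4-mapped GID.
# Hypothetical helper for illustration only; assumes the GID is written as
# eight fully expanded, colon-separated hex groups, as show_gids prints it.
gid_to_ipv4() {
    gid=$1
    # The last two groups carry the address, e.g. "ac51" and "000a".
    hi=${gid%:*}      # drop the last group
    hi=${hi##*:}      # keep only the second-to-last group
    lo=${gid##*:}     # keep only the last group
    # Split each 16-bit group into its two bytes and print them in decimal.
    printf '%d.%d.%d.%d\n' \
        "$((0x$hi >> 8))" "$((0x$hi & 0xff))" \
        "$((0x$lo >> 8))" "$((0x$lo & 0xff))"
}

gid_to_ipv4 0000:0000:0000:0000:0000:ffff:ac51:000a   # prints 172.81.0.10
```

This matches the table: index 2 on mlx5_0 carries 172.81.0.10, the address of ens6f0np0 on the client host.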

# uname -a
Linux 10-20-1-10 5.15.0-86-generic #96-Ubuntu SMP Wed Sep 20 08:23:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Without libvma, sockperf runs successfully between the two hosts every time.

On the server host 172.81.0.20:

# sockperf sr --tcp -i 172.81.0.20 -p 15000
sockperf: == version #3.7-no.git ==
sockperf: [SERVER] listen on:
[ 0] IP = 172.81.0.20     PORT = 15000 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 2797496] using recvfrom() to block on socket(s)

On the client host 172.81.0.10:

# sockperf pp --tcp -i 172.81.0.20 -p 15000 -t 1
sockperf: == version #3.10-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 172.81.0.20     PORT = 15000 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=40500; ReceivedMessages=40499
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=23199; ReceivedMessages=23199
sockperf: ====> avg-latency=11.813 (std-dev=1.441, mean-ad=0.623, median-ad=0.487, siqr=0.337, cv=0.122, std-error=0.009, 99.0% ci=[11.789, 11.837])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 11.813 usec
sockperf: Total 23199 observations; each percentile contains 231.99 observations
sockperf: ---> <MAX> observation =  107.870
sockperf: ---> percentile 99.999 =  107.870
sockperf: ---> percentile 99.990 =   60.648
sockperf: ---> percentile 99.900 =   21.000
sockperf: ---> percentile 99.000 =   17.315
sockperf: ---> percentile 90.000 =   12.599
sockperf: ---> percentile 75.000 =   11.947
sockperf: ---> percentile 50.000 =   11.564
sockperf: ---> percentile 25.000 =   11.272
sockperf: ---> <MIN> observation =   10.458

With libvma, however, the same test sometimes fails.

On the server host 172.81.0.20:

# LD_PRELOAD=libvma.so sockperf sr --tcp -i 172.81.0.20 -p 15000
 VMA INFO: ---------------------------------------------------------------------------
 VMA INFO: VMA_VERSION: 9.8.31-0 Development Snapshot built on Oct 10 2023 11:31:55 -*- DEBUG -*-
 VMA INFO: Cmd Line: sockperf sr --tcp -i 172.81.0.20 -p 15000
 VMA INFO: Current Time: Tue Oct 10 12:05:15 2023
 VMA INFO: Pid: 2813781
 VMA INFO: OFED Version: MLNX_OFED_LINUX-23.07-0.5.1.2:
 VMA INFO: Architecture: x86_64
 VMA INFO: Node: 10-20-1-20
 VMA INFO: ---------------------------------------------------------------------------
 VMA INFO: Log Level                      INFO                       [VMA_TRACELEVEL]
 VMA INFO: ---------------------------------------------------------------------------
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.81.0.110    netmask: 255.255.255.255 dev: veth9878877221a                      table :500        scope 253 type  1 index 43 scope 253 type  1 index 43
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.81.0.119    netmask: 255.255.255.255 dev: vethd861c2e0cf5                      table :500        scope 253 type  1 index 34 scope 253 type  1 index 34
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.82.0.104    netmask: 255.255.255.255 dev: vethf917a4f52ae                      table :500        scope 253 type  1 index 42 scope 253 type  1 index 42
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.82.0.114    netmask: 255.255.255.255 dev: cali21b37a164ee                      table :500        scope 253 type  1 index 40 scope 253 type  1 index 40
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.90.0.108    netmask: 255.255.255.255 dev: veth9810b9fa995                      table :500        scope 253 type  1 index 44 scope 253 type  1 index 44
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.240.0    netmask: 255.255.255.192 dev:                            table :main       scope   0 type  6 index  0 scope   0 type  6 index  0
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.240.4    netmask: 255.255.255.255 dev: calic5e25250998                      table :main       scope 253 type  1 index 41 scope 253 type  1 index 41
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.240.37   netmask: 255.255.255.255 dev: cali21b37a164ee                      table :main       scope 253 type  1 index 40 scope 253 type  1 index 40
sockperf: == version #3.7-no.git ==
sockperf: [SERVER] listen on:
[ 0] IP = 172.81.0.20     PORT = 15000 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 2813781] using recvfrom() to block on socket(s)

On the client host 172.81.0.10:

# LD_PRELOAD=libvma.so sockperf pp --tcp -i 172.81.0.20 -p 15000 -t 1
 VMA INFO: ---------------------------------------------------------------------------
 VMA INFO: VMA_VERSION: 9.8.31-1 Release built on Jul 10 2023 11:42:20
 VMA INFO: Cmd Line: sockperf pp --tcp -i 172.81.0.20 -p 15000 -t 1
 VMA INFO: OFED Version: MLNX_OFED_LINUX-23.07-0.5.1.2:
 VMA INFO: ---------------------------------------------------------------------------
 VMA INFO: Log Level                      INFO                       [VMA_TRACELEVEL]
 VMA INFO: ---------------------------------------------------------------------------
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.81.0.100    netmask: 255.255.255.255 dev: veth29f76130861                      table :500        scope 253 type  1 index 20 scope 253 type  1 index 20
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.81.0.120    netmask: 255.255.255.255 dev: veth35695ee5c1e                      table :500        scope 253 type  1 index 17 scope 253 type  1 index 17
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.82.0.111    netmask: 255.255.255.255 dev: veth47f77b93392                      table :500        scope 253 type  1 index 22 scope 253 type  1 index 22
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.82.0.119    netmask: 255.255.255.255 dev: calic0d6e116972                      table :500        scope 253 type  1 index 18 scope 253 type  1 index 18
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.90.0.100    netmask: 255.255.255.255 dev: veth4dd7b95a373                      table :500        scope 253 type  1 index 21 scope 253 type  1 index 21
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.128   netmask: 255.255.255.192 dev:                            table :main       scope   0 type  6 index  0 scope   0 type  6 index  0
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.131   netmask: 255.255.255.255 dev: cali00a8163a6e5                      table :main       scope 253 type  1 index 32 scope 253 type  1 index 32
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.146   netmask: 255.255.255.255 dev: cali7cef15e86e8                      table :main       scope 253 type  1 index 33 scope 253 type  1 index 33
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.164   netmask: 255.255.255.255 dev: calic0d6e116972                      table :main       scope 253 type  1 index 18 scope 253 type  1 index 18
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.182   netmask: 255.255.255.255 dev: cali2f01bce650e                      table :main       scope 253 type  1 index 31 scope 253 type  1 index 31
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.183   netmask: 255.255.255.255 dev: calid7cf868faf9                      table :main       scope 253 type  1 index 19 scope 253 type  1 index 19
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.184   netmask: 255.255.255.255 dev: cali92710158f24                      table :main       scope 253 type  1 index 35 scope 253 type  1 index 35
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.185   netmask: 255.255.255.255 dev: calie84e04abcc4                      table :main       scope 253 type  1 index 34 scope 253 type  1 index 34
sockperf: == version #3.10-no.git ==
sockperf: ERROR: Can`t connect socket (errno=111 Connection refused)

At other times, the same client command succeeds with libvma:

# LD_PRELOAD=libvma.so sockperf pp --tcp -i 172.81.0.20 -p 15000 -t 1
 VMA INFO: ---------------------------------------------------------------------------
 VMA INFO: VMA_VERSION: 9.8.31-1 Release built on Jul 10 2023 11:42:20
 VMA INFO: Cmd Line: sockperf pp --tcp -i 172.81.0.20 -p 15000 -t 1
 VMA INFO: OFED Version: MLNX_OFED_LINUX-23.07-0.5.1.2:
 VMA INFO: ---------------------------------------------------------------------------
 VMA INFO: Log Level                      INFO                       [VMA_TRACELEVEL]
 VMA INFO: ---------------------------------------------------------------------------
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.81.0.100    netmask: 255.255.255.255 dev: veth29f76130861                      table :500        scope 253 type  1 index 20 scope 253 type  1 index 20
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.81.0.120    netmask: 255.255.255.255 dev: veth35695ee5c1e                      table :500        scope 253 type  1 index 17 scope 253 type  1 index 17
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.82.0.111    netmask: 255.255.255.255 dev: veth47f77b93392                      table :500        scope 253 type  1 index 22 scope 253 type  1 index 22
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.82.0.119    netmask: 255.255.255.255 dev: calic0d6e116972                      table :500        scope 253 type  1 index 18 scope 253 type  1 index 18
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.90.0.100    netmask: 255.255.255.255 dev: veth4dd7b95a373                      table :500        scope 253 type  1 index 21 scope 253 type  1 index 21
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.128   netmask: 255.255.255.192 dev:                            table :main       scope   0 type  6 index  0 scope   0 type  6 index  0
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.131   netmask: 255.255.255.255 dev: cali00a8163a6e5                      table :main       scope 253 type  1 index 32 scope 253 type  1 index 32
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.146   netmask: 255.255.255.255 dev: cali7cef15e86e8                      table :main       scope 253 type  1 index 33 scope 253 type  1 index 33
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.164   netmask: 255.255.255.255 dev: calic0d6e116972                      table :main       scope 253 type  1 index 18 scope 253 type  1 index 18
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.182   netmask: 255.255.255.255 dev: cali2f01bce650e                      table :main       scope 253 type  1 index 31 scope 253 type  1 index 31
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.183   netmask: 255.255.255.255 dev: calid7cf868faf9                      table :main       scope 253 type  1 index 19 scope 253 type  1 index 19
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.184   netmask: 255.255.255.255 dev: cali92710158f24                      table :main       scope 253 type  1 index 35 scope 253 type  1 index 35
 VMA WARNING: rtm:170:rt_mgr_update_source_ip() could not figure out source ip for rtv = dst: 172.21.32.185   netmask: 255.255.255.255 dev: calie84e04abcc4                      table :main       scope 253 type  1 index 34 scope 253 type  1 index 34
sockperf: == version #3.10-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 172.81.0.20     PORT = 15000 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=103730; ReceivedMessages=103729
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=57262; ReceivedMessages=57262
sockperf: ====> avg-latency=4.778 (std-dev=1.034, mean-ad=0.218, median-ad=0.076, siqr=0.051, cv=0.216, std-error=0.004, 99.0% ci=[4.767, 4.789])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 4.778 usec
sockperf: Total 57262 observations; each percentile contains 572.62 observations
sockperf: ---> <MAX> observation =  132.789
sockperf: ---> percentile 99.999 =  102.016
sockperf: ---> percentile 99.990 =   13.302
sockperf: ---> percentile 99.900 =   12.760
sockperf: ---> percentile 99.000 =    9.170
sockperf: ---> percentile 90.000 =    4.808
sockperf: ---> percentile 75.000 =    4.720
sockperf: ---> percentile 50.000 =    4.673
sockperf: ---> percentile 25.000 =    4.616
sockperf: ---> <MIN> observation =    4.278

A detailed log from the failing client host is attached: fail-client-log.txt

In brief: with libvma, the same command sometimes succeeds and sometimes fails with errno=111 (Connection refused); without libvma, it succeeds every time.
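To put a number on "sometimes", the client command can be repeated in a loop and the outcomes counted. The sketch below is only an illustration: count_runs is a made-up helper, and it assumes sockperf exits with a non-zero status when the connect fails, which was not verified here.

```shell
# count_runs N CMD...: run CMD N times and print "ok=<n> fail=<n>".
# Hypothetical helper for illustration; it relies only on the exit status
# of CMD, so it treats any non-zero exit as a failure.
count_runs() {
    n=$1; shift
    ok=0; fail=0; i=0
    while [ "$i" -lt "$n" ]; do
        if "$@" >/dev/null 2>&1; then
            ok=$((ok + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    echo "ok=$ok fail=$fail"
}

# Example invocations (left commented out; they need the server to be running).
# LD_PRELOAD is passed through env so that it applies only to sockperf:
#   count_runs 20 env LD_PRELOAD=libvma.so sockperf pp --tcp -i 172.81.0.20 -p 15000 -t 1
#   count_runs 20 sockperf pp --tcp -i 172.81.0.20 -p 15000 -t 1
```

Running both variants against the same server gives a failure count with and without libvma under otherwise identical conditions.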