Mellanox / libvma

Linux user space library for network socket acceleration based on RDMA compatible network adaptors
https://www.mellanox.com/products/software/accelerator-software/vma?mtag=vma

VMA_INTERNAL_THREAD_AFFINITY doesn't run the vma internal thread on the specified core #1080

Open boranby opened 4 months ago

boranby commented 4 months ago

Subject

Running with: LD_PRELOAD=libvma.so VMA_SPEC=latency VMA_INTERNAL_THREAD_AFFINITY=2 ./app. However, the VMA internal thread runs on the same core as the application. I also tried the bit-mask approach to set the affinity, but that didn't work either.
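For completeness, the two forms I tried looked roughly like this (a sketch; ./app stands in for the actual binary, and 0x4 is what I understand to be the README's bit-mask form for core 2):

LD_PRELOAD=libvma.so VMA_SPEC=latency VMA_INTERNAL_THREAD_AFFINITY=2 ./app
LD_PRELOAD=libvma.so VMA_SPEC=latency VMA_INTERNAL_THREAD_AFFINITY=0x4 ./app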

Issue type

Configuration:

Actual behavior:

VMA_INTERNAL_THREAD_AFFINITY=2 has no effect on the core affinity of the VMA internal thread. It runs on the same core as the application thread, causing context switches and stalls that degrade latency.

Expected behavior:

The recommended configuration is to run the VMA internal thread on a different core than the application, but on the same NUMA node. To achieve this, VMA_INTERNAL_THREAD_AFFINITY should work as documented and pin the VMA internal thread to the core we specify.
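In other words, the target layout is roughly the following (a sketch; core numbers and ./app are placeholders, and cores 2 and 3 are assumed to be on the same NUMA node):

# application pinned to core 3; the VMA internal thread is expected to move itself to core 2
LD_PRELOAD=libvma.so VMA_SPEC=latency VMA_INTERNAL_THREAD_AFFINITY=2 taskset -c 3 ./app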

Steps to reproduce:

igor-ivanov commented 3 months ago

Could you please provide the top of the output using VMA_TRACELEVEL=4 VMA_SPEC=latency and VMA_TRACELEVEL=4 VMA_SPEC=latency VMA_INTERNAL_THREAD_AFFINITY=2? It should include the list of VMA parameters used during the launch; see the example at https://github.com/Mellanox/libvma/blob/master/README#L86. The line related to VMA_INTERNAL_THREAD_AFFINITY should be enough. In addition, please include top or htop output for both cases.
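Something along these lines should capture both pieces (a sketch; ./app is a placeholder for the actual binary):

# run with and without the affinity setting, keeping the VMA startup banner
LD_PRELOAD=libvma.so VMA_TRACELEVEL=4 VMA_SPEC=latency ./app 2>&1 | tee vma_default.log
LD_PRELOAD=libvma.so VMA_TRACELEVEL=4 VMA_SPEC=latency VMA_INTERNAL_THREAD_AFFINITY=2 ./app 2>&1 | tee vma_affinity2.log
grep -i INTERNAL_THREAD_AFFINITY vma_default.log vma_affinity2.log
# and, while the application is running, a snapshot of per-core load
top -b -n 1 > top_snapshot.txt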

boranby commented 3 months ago

Hi Igor, thanks for your response. You can find the details below; let me know if you need anything else.

VMA INFO: Internal Thread Affinity 2 [VMA_INTERNAL_THREAD_AFFINITY]

Running top

top - 21:42:47 up 12 min,  3 users,  load average: 2.51, 1.72, 1.03
Tasks: 497 total,   2 running, 494 sleeping,   0 stopped,   1 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 95.7 us,  4.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 192484.2 total,  57171.2 free, 133581.9 used,   2336.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  58902.2 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 22799 sonictr+  20   0 2583740  71152   6528 R  68.1   0.0   1:49.07 sonic
 22917 sonictr+  20   0  226420   4736   3456 R   0.3   0.0   0:00.10 top
      1 root      20   0  174948  18680  11040 S   0.0   0.0   0:01.20 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
      5 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 slub_flushwq
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 netns
      8 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-events_highpri
     10 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq

VMA INFO: Internal Thread Affinity 0 [VMA_INTERNAL_THREAD_AFFINITY]

Running top

top - 21:44:25 up 14 min,  3 users,  load average: 2.61, 2.02, 1.22
Tasks: 497 total,   2 running, 494 sleeping,   0 stopped,   1 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.3 hi,  0.3 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 95.7 us,  4.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 192484.2 total,  57174.4 free, 133577.6 used,   2337.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  58906.5 avail Mem

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
23011 sonictr+  20   0 2583860  71024   6400 R 102.3   0.0   0:27.08 sonic
23041 sonictr+  20   0  226320   4736   3456 R   0.3   0.0   0:00.04 top
      1 root      20   0  174948  18680  11040 S   0.0   0.0   0:01.20 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
      5 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 slub_flushwq
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 netns
      8 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-events_highpri
     10 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq
igor-ivanov commented 3 months ago

@pasis do you have an explanation?

boranby commented 2 months ago

Hi @igor-ivanov, @pasis, is there any update on this issue? Thanks for your help.

boranby commented 1 month ago

Is there any other way to find a solution or get support from the libvma or Mellanox team?