boranby opened this issue 4 months ago
Could you please provide the top of the output for both
VMA_TRACELEVEL=4 VMA_SPEC=latency
and
VMA_TRACELEVEL=4 VMA_SPEC=latency VMA_INTERNAL_THREAD_AFFINITY=2?
It should include the list of VMA parameters used during launch; see the example at https://github.com/Mellanox/libvma/blob/master/README#L86. The line related to VMA_INTERNAL_THREAD_AFFINITY should be enough.
In addition, please include top or htop output in both cases.
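Since plain top aggregates all threads of a process into one row, a per-thread view makes it easier to see where the VMA internal thread actually lands relative to the application thread. A sketch (the PID `$$` below is only a stand-in for the PID of the VMA-loaded process):

```shell
# -L prints one row per thread; the PSR column is the CPU each thread
# last ran on. Replace "$$" with the PID of the process started under
# LD_PRELOAD=libvma.so.
ps -Lo tid,psr,comm -p "$$"
# Interactive equivalent: top -H -p <PID> shows per-thread rows; the
# last-used-CPU ("P") column can be enabled via the fields menu.
```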
Hi Igor, thanks for your response. You can find the details below. If you need anything else, let me know and I can provide more information.
VMA_TRACELEVEL=4 VMA_SPEC=latency VMA_INTERNAL_THREAD_AFFINITY=2
VMA INFO: Internal Thread Affinity 2 [VMA_INTERNAL_THREAD_AFFINITY]
Running top
top - 21:42:47 up 12 min, 3 users, load average: 2.51, 1.72, 1.03
Tasks: 497 total, 2 running, 494 sleeping, 0 stopped, 1 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 95.7 us, 4.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 192484.2 total, 57171.2 free, 133581.9 used, 2336.4 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 58902.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22799 sonictr+ 20 0 2583740 71152 6528 R 68.1 0.0 1:49.07 sonic
22917 sonictr+ 20 0 226420 4736 3456 R 0.3 0.0 0:00.10 top
1 root 20 0 174948 18680 11040 S 0.0 0.0 0:01.20 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
5 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 slub_flushwq
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 netns
8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-events_highpri
10 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
VMA_TRACELEVEL=4 VMA_SPEC=latency
VMA INFO: Internal Thread Affinity 0 [VMA_INTERNAL_THREAD_AFFINITY]
Running top
top - 21:44:25 up 14 min, 3 users, load average: 2.61, 2.02, 1.22
Tasks: 497 total, 2 running, 494 sleeping, 0 stopped, 1 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 95.7 us, 4.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 192484.2 total, 57174.4 free, 133577.6 used, 2337.4 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 58906.5 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23011 sonictr+ 20 0 2583860 71024 6400 R 102.3 0.0 0:27.08 sonic
23041 sonictr+ 20 0 226320 4736 3456 R 0.3 0.0 0:00.04 top
1 root 20 0 174948 18680 11040 S 0.0 0.0 0:01.20 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
5 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 slub_flushwq
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 netns
8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-events_highpri
10 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
@pasis do you have an explanation?
Hi @igor-ivanov, @pasis, is there any update on this issue? Thanks for your help.
Is there any other way to find a solution or get support from the libvma or Mellanox team?
Subject
Running with:
LD_PRELOAD=libvma.so VMA_SPEC=latency VMA_INTERNAL_THREAD_AFFINITY=2 ./app
However, the VMA internal thread runs on the same core as the application. I also tried the bit-mask approach to set the affinity, but that didn't work either.
Issue type
Configuration:
Actual behavior:
VMA_INTERNAL_THREAD_AFFINITY=2
has no impact on the core affinity of the VMA internal thread. It runs on the same core as the application thread, causing context switches and stalls that hurt latency.
Expected behavior:
The recommended configuration is to run the VMA internal thread on a different core than the application, but on the same NUMA node. To achieve this, VMA_INTERNAL_THREAD_AFFINITY should work as expected and pin the VMA internal thread to the chosen core.
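A sketch of how one might choose a core on the NIC's NUMA node, and pin a thread by hand as a stop-gap (the interface name `eth0` and the TID are placeholders, not taken from this issue):

```shell
# NUMA node the NIC is attached to (eth0 is a placeholder interface name;
# -1 means the platform exposes no NUMA locality for the device).
cat /sys/class/net/eth0/device/numa_node 2>/dev/null || echo "interface not found"
# Map CPUs to NUMA nodes so a core on the NIC's node can be chosen.
lscpu --extended=CPU,NODE
# Stop-gap until the affinity setting is honored: pin the thread manually,
# e.g.  taskset -cp 2 <TID-of-the-internal-thread>
```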
Steps to reproduce:
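Based on the launch command quoted above, a reproduction sketch might look like this (`./app` is a placeholder for the latency-sensitive binary; the `/proc` check is an addition for illustration):

```shell
# 1) Launch the application under VMA, requesting internal-thread affinity
#    to core 2 (./app is a placeholder for the latency-sensitive binary).
LD_PRELOAD=libvma.so VMA_TRACELEVEL=4 VMA_SPEC=latency \
    VMA_INTERNAL_THREAD_AFFINITY=2 ./app &
APP_PID=$!
# 2) Confirm VMA parsed the variable: the startup log should contain
#    "VMA INFO: Internal Thread Affinity 2 [VMA_INTERNAL_THREAD_AFFINITY]".
# 3) Watch per-core load (top, then press '1'); both threads landing on the
#    same core reproduces the issue. The allowed-CPU mask is also visible in:
grep Cpus_allowed_list "/proc/$APP_PID/status"
```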