Closed QuarkContainer closed 5 months ago
mpirun detects only 1 CPU slot, so it can't run tasks with n>1 (without --oversubscribe).
Here is a comparison of the native CPU topology and the one seen under quark:
native
Machine#0
  Package#0
    L3#0(6144KB)
      L2#0(256KB)
        L1d#0(32KB)
          Core#0
            PU#0
      L2#1(256KB)
        L1d#1(32KB)
          Core#1
            PU#1
      L2#2(256KB)
        L1d#2(32KB)
          Core#2
            PU#2
      L2#3(256KB)
        L1d#3(32KB)
          Core#3
            PU#3
*** 1 package(s)
*** Logical processor 0 has 3 caches totaling 6432KB
quark (cpus=4)
*** Objects at level 0
Index 0: Machine
*** Objects at level 1
Index 0: Package
*** Objects at level 2
Index 0: Core
*** Objects at level 3
Index 0: PU
Index 1: PU
*** Printing overall tree
Machine#0
  Package#0
    Core#0
      PU#0
      PU#1
*** 1 package(s)
*** Logical processor 0 has 0 caches totaling 0KB
permission issues (with quark):
/project $ ls -lh /sys/devices/system/cpu/
total 0
dr-xr-xr-x 1 root root 0 May 28 19:10 cpu0
dr-xr-xr-x 1 root root 0 May 28 19:10 cpu1
-r-------- 0 root root 0 May 28 19:10 online
-r-------- 0 root root 0 May 28 19:10 possible
-r-------- 0 root root 0 May 28 19:10 present
hwloc reads /sys/devices/system/cpu/online but has no permission to do so (the file is readable only by root).
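To check from inside the container what a non-root reader of that file sees, something like the following can help. It is a minimal sketch, not what hwloc does internally: the helper names `parse_cpu_list` and `read_online_cpus` are made up for illustration, and it assumes the file uses the usual kernel CPU list format (for example `0-3` or `0,2-3`).

```python
def parse_cpu_list(text: str) -> list[int]:
    """Parse a kernel CPU list such as "0-3" or "0,2-3" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file yields no CPUs
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_online_cpus(path: str = "/sys/devices/system/cpu/online"):
    """Return the online CPU ids, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return parse_cpu_list(f.read())
    except OSError as err:  # covers PermissionError and FileNotFoundError
        print(f"cannot read {path}: {err}")
        return None
```

Running `read_online_cpus()` as the unprivileged user in the container shows whether the read fails with a permission error or returns a CPU list.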
core_cpus:
/sys/devices/system/cpu/cpu1/topology $ cat core_cpus
000003
(it should be 1, also the format is wrong)
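For reference, here is a small sketch of how such a mask value can be decoded. It assumes the common sysfs cpumask convention (hexadecimal, optionally split into zero-padded 32-bit words by commas, most significant word first); `parse_cpumask` is a hypothetical helper, not part of hwloc or quark.

```python
def parse_cpumask(text: str) -> list[int]:
    """Decode a hex cpumask such as "000003" or "00000001,00000000" into CPU ids."""
    # Assumption: commas only separate zero-padded 32-bit words, so removing
    # them leaves a single hexadecimal number.
    value = int(text.strip().replace(",", ""), 16)
    # Bit i set means CPU i is part of the mask.
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]
```

Under that assumption, `parse_cpumask("000003")` gives CPUs 0 and 1, i.e. cpu1 is reported as sharing a core with cpu0, which matches the quark tree above where PU#0 and PU#1 both sit under Core#0.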
@shrik3 I missed adding /sys/devices/system/cpu/cpu%d/topology/thread_siblings. After adding it, it works on my side.
So far, I have copied all the file values from my local host. If it works on your side, let's discuss how to implement it properly.
I'll test later.
btw should we make the CPU reserved for the IO thread visible to the userspace? If I understand correctly, no other tasks should be scheduled on that CPU.
Let's have a meeting to discuss the current vcpu thread allocation. Let me share more details with you.
@shrik3 I updated the PR to retrieve the CPU information from the host system, so the CPU content should now be correct. For the CPU count, let's discuss later.
After implementing /sys/devices/system/cpu, mpirun still doesn't work.