RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

[FeatureRequest] #455

Open cpj18234088063 opened 2 years ago

cpj18234088063 commented 2 years ago

Hello, when I use likwid-mpirun, I run into the following situation: I have two machines (A and B), each equipped with two CPUs, but machine A has 24 cores (12*2) and machine B has 20 cores (10*2). For symmetry, I want to run 20 threads on each of machines A and B. I use S0:0-9@S1:0-9 to set the pinning, but likwid-mpirun parses S0:0-9@S1:0-9 as -C 0,1,2,3,4,5,6,7,8,9,12,13,14,15,16,17,18,19,20,21, which makes machine B output: "CPU 20 / 21 not in domain n". In other words, S0:0-9@S1:0-9 is parsed differently by likwid-mpirun and likwid-perfctr. Please consider making likwid-mpirun support S0:0-9@S1:0-9, for example at table.insert(cmd, table.concat(cpuexprs[i], ",")) in the likwid-mpirun file.

Thanks to the authors for providing such good work.

TomTheBear commented 2 years ago

Hi, thanks for your request.

The problem is that likwid-mpirun does not know the topology of remote nodes. It assumes that the local topology also applies to the remote nodes. The under-the-hood likwid-perfctr (or likwid-pin) calls are generated on the local node. Moreover, it resolves pinning settings like S0:0-9 up front and uses the resulting CPU list for further processing. So your request would be to resolve the CPU list internally but keep the original pinning strings for the under-the-hood calls. Did I get that right?
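
To illustrate the mismatch, here is a minimal Python sketch (not LIKWID code) that resolves the same expression against two hypothetical topologies. It assumes hardware threads are numbered consecutively per socket without SMT, and it only handles the simple DOMAIN:first-last form joined by @. The helpers make_topology and resolve are made up for this example.

```python
def make_topology(sockets, cores_per_socket):
    """Hypothetical topology: socket domain name -> list of hardware thread IDs.

    Assumes consecutive numbering per socket and no SMT.
    """
    return {
        f"S{s}": list(range(s * cores_per_socket, (s + 1) * cores_per_socket))
        for s in range(sockets)
    }


def resolve(expr, topology):
    """Resolve 'DOMAIN:first-last@DOMAIN:first-last' into a flat CPU ID list.

    The indices in the expression are positions inside the domain,
    not physical CPU IDs.
    """
    cpus = []
    for part in expr.split("@"):
        domain, index_range = part.split(":")
        first, last = (int(x) for x in index_range.split("-"))
        cpus.extend(topology[domain][first:last + 1])
    return cpus


node_a = make_topology(sockets=2, cores_per_socket=12)  # machine A, 24 cores
node_b = make_topology(sockets=2, cores_per_socket=10)  # machine B, 20 cores
expr = "S0:0-9@S1:0-9"

on_a = resolve(expr, node_a)  # what a resolution on node A produces
on_b = resolve(expr, node_b)  # what node B would produce for the same string

# CPU IDs from A's resolved list that do not exist on B
valid_on_b = {cpu for cpus in node_b.values() for cpu in cpus}
missing_on_b = [cpu for cpu in on_a if cpu not in valid_on_b]

print(on_a)          # [0, ..., 9, 12, ..., 21]
print(on_b)          # [0, ..., 19]
print(missing_on_b)  # [20, 21]
```

Under these assumptions, the list resolved on node A contains CPUs 20 and 21, which node B does not have, while the unresolved string would be valid on both nodes.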

There might be problems later during evaluation. likwid-mpirun reads the output files and relies on the calculations it has done before (which node-hwthread pairs belong to which MPI rank). This result collection would have to be made more flexible, and all inputs required for evaluation would need to be provided by the output files.