RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

[BUG] Problem with likwid-mpirun #502

Closed JanLJL closed 1 year ago

JanLJL commented 1 year ago

Describe the bug: likwid fails when running likwid-mpirun on the M1.

To Reproduce: On an M1 (I tried with the M1 from Apple Studio), run:

$ likwid-mpirun -mpi openmpi -np 16 -pin S1:0-3@S2:0-3@S4:0-3@S5:0-3 -d hostname
DEBUG: Executable given on commandline: /usr/sbin/hostname
WARN: Cannot extract OpenMP vendor from executable or commandline, assuming no OpenMP
sh: line 1: scontrol: command not found
DEBUG: Reading hostfile from batch system
Available hosts for scheduling:
Host                    Slots   MaxSlots        Interface
DEBUG: Evaluated CPU expressions: [[2,3,4,5,6,7,8,9,12,13,14,15,16,17,18,19]]
DEBUG: Assign 16 processes with 1 per node and 16 threads per process to 0 hosts
WARN: Only 0 processes out of 16 can be assigned, running with 0 processes
DEBUG: Scheduling on hosts:
/apps/modules/likwid-m1/bin/likwid-lua: /apps/modules/likwid-m1/bin/likwid-mpirun:1465: attempt to perform 'n%0'
stack traceback:
        /apps/modules/likwid-m1/bin/likwid-mpirun:1465: in local 'writeWrapperScript'
        /apps/modules/likwid-m1/bin/likwid-mpirun:2514: in main chunk
        [C]: in ?

I pin explicitly so that only the Firestorm cores are used, but the result is the same without the -pin parameter.

$ likwid-mpirun --version
likwid-mpirun -- Version 5.2.0 (commit: 233ab943543480cd46058b34616c174198ba0459)
$ module li
Currently Loaded Modulefiles:
 1) likwid/5.2.2

TomTheBear commented 1 year ago

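This issue is not specific to the Apple M1. It occurs whenever the SLURM commands required by likwid-mpirun (scontrol and srun) are not available in the PATH. In the log above, scontrol is not found, so the host list stays empty ("to 0 hosts") and writeWrapperScript then fails on a modulo by zero ('n%0').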

Note: Removed "on Apple M1" from issue title
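
The missing-command condition described above can be checked before starting likwid-mpirun. The following is only a minimal sketch, not part of likwid: the helper name missing_commands is hypothetical, and it assumes a POSIX shell. It reports which of the two SLURM commands named in this thread cannot be resolved via PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of likwid): print each given command
# name that cannot be resolved through $PATH, one per line.
missing_commands() {
    for cmd in "$@"; do
        # "command -v" is the POSIX way to test whether a name resolves
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

# scontrol and srun are the SLURM commands named in this thread.
missing=$(missing_commands scontrol srun)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names join on one line
    echo "Not found in PATH:" $missing >&2
fi
```

If either name is reported, the situation matches the "scontrol: command not found" line in the log above.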

TomTheBear commented 1 year ago

This was fixed on the user's Apple M1 system. The SLURM package had been installed in a location outside of $PATH; it is now in /usr/local.