Closed antonelepfl closed 5 years ago
I already use this on Jureca. I use 24 CPUsPerNode. Maybe 68 is out of range. I think 24 is the maximum you can use, but I am not sure.
The CPUsPerNode value is translated to --tasks-per-node, but there is a valid range of 0-48 (corresponding to the actual CPUs per node).
You can see the valid resource ranges with a GET to the /rest/core/factories/default_target_system_factory endpoint.
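A minimal Python sketch of that query, using only the standard library. The base URL, the credentials, and the use of HTTP basic auth are assumptions for illustration, not the real site settings. The layout of the returned JSON is not shown in this thread, so the sketch simply returns the decoded document for inspection.

```python
import base64
import json
import urllib.request

# Hypothetical base URL: replace with the REST base URL of your UNICORE site.
BASE_URL = "https://unicore.example.org/SITE/rest/core"
# Path of the endpoint mentioned above, relative to .../rest/core
FACTORY_PATH = "/factories/default_target_system_factory"


def build_factory_request(base_url, username, password):
    """Build a GET request for the target system factory resource."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        base_url.rstrip("/") + FACTORY_PATH,
        headers={
            "Accept": "application/json",
            # Basic auth is an assumption; use whatever your site accepts.
            "Authorization": "Basic " + token,
        },
        method="GET",
    )


def fetch_factory_properties(request):
    """Send the request and return the decoded JSON document."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example usage (needs network access and valid credentials):
# req = build_factory_request(BASE_URL, "my-user", "my-password")
# print(json.dumps(fetch_factory_properties(req), indent=2))
```

The request is built separately from the network call so the URL and headers can be checked without contacting the server.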
OK, it works with 48. 1) Thanks for that endpoint, Bernd. 2) CPUsPerNode gets translated to --ntasks-per-node.
3) In the past I was using up to 68 tasks per node on the Jureca booster when launching the srun command:
DEBUG:__main__:cmd: ['srun', '--cpus-per-task=1', '--ntasks-per-node=68', '--ntasks=680', '--nodes', '10', '/p/project/cvsk25/vsk2514/HBP/jureca-booster/21-12-2018/install/install/linux-centos7-x86_64/intel-18.0.2/neurodamus-hippocampus-i23mxn/bin/special', '-NFRAME', '1024', '/p/project/cvsk25/vsk2514/HBP/jureca-booster/21-12-2018/install/install/linux-centos7-x86_64/intel-18.0.2/neurodamus-hippocampus-i23mxn/lib/hoclib/init.hoc', '-mpi']
...
and I get that number of processes for the job (stdout):
numprocs=680
And that works.
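The numbers above fit together as nodes × tasks per node = total tasks (10 × 68 = 680). A small Python sketch of how such an srun argument list could be assembled; the function name is hypothetical and `./special` stands in for the full binary path shown in the debug line.

```python
def build_srun_command(nodes, tasks_per_node, program_args, cpus_per_task=1):
    """Return an srun argument list for the given node and task counts."""
    if nodes < 1 or tasks_per_node < 1:
        raise ValueError("nodes and tasks_per_node must be positive")
    # Total MPI tasks is the per-node count multiplied by the node count.
    total_tasks = nodes * tasks_per_node
    return [
        "srun",
        f"--cpus-per-task={cpus_per_task}",
        f"--ntasks-per-node={tasks_per_node}",
        f"--ntasks={total_tasks}",
        "--nodes", str(nodes),
        *program_args,
    ]


# "./special" is a placeholder for the real executable path.
cmd = build_srun_command(10, 68, ["./special", "-mpi"])
print(cmd[3])  # --ntasks=680
```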
In the bss_submit:
#!/bin/bash
#SBATCH --job-name=Microcircuit
#SBATCH --partition=booster
#SBATCH --account=vsk25
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=1
#SBATCH --time=180
#SBATCH --output=/p/scratch/cvsk25/unicore-jobs//a29165bb-da5a-424d-88ac-a4b951cf6e7c//stdout
#SBATCH --error=/p/scratch/cvsk25/unicore-jobs//a29165bb-da5a-424d-88ac-a4b951cf6e7c//stderr
#SBATCH --workdir=/p/scratch/cvsk25/unicore-jobs//a29165bb-da5a-424d-88ac-a4b951cf6e7c/
umask 77
/p/scratch/cvsk25/unicore-jobs//a29165bb-da5a-424d-88ac-a4b951cf6e7c//UNICORE_Job_1558016359193
So I'm a bit confused. If we are able to use 68 and we are using only 48, we are 'wasting' resources, right?
You can see that old job in /p/scratch/cvsk25/unicore-jobs/a29165bb-da5a-424d-88ac-a4b951cf6e7c
True, the current resource handling mechanism is not well suited to heterogeneous systems like jureca (48 cores per node) / jureca-booster (68 cores per node). This will improve with the next major release of UNICORE. As a workaround for now we can increase the limit to 68 (booster).
If you prefer, I can use 48 for the time being, but then please let us know the status of the new release so we can set it back to 68 for booster.
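To illustrate why a single site-wide range is awkward here, the following Python sketch checks a request against a per-partition limit. It is not how UNICORE implements resource handling; the table and function name are hypothetical, and only the core counts (48 and 68) come from this thread.

```python
# Cores per node as quoted in this thread. The dictionary keys are
# illustrative labels, not necessarily the real partition names.
CORES_PER_NODE = {"jureca": 48, "booster": 68}


def check_cpus_per_node(partition, requested):
    """Return True if the requested CPUsPerNode fits the given partition."""
    limit = CORES_PER_NODE[partition]
    # The valid range mentioned above starts at 0.
    return 0 <= requested <= limit


print(check_cpus_per_node("jureca", 68))   # False
print(check_cpus_per_node("booster", 68))  # True
```

With one global limit of 48, the second request would be rejected even though the booster nodes can run it.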
the limit is now set to 68
OK, thank you. I would suggest keeping this issue open until the new version is deployed. What do you think, guys?
I'd close it, since the problem at hand is solved.
Ok thank you
Hi @BerndSchuller, I would like to pass the number of CPUsPerNode so that it is translated to
#SBATCH --ntasks-per-node=...
in the bss_submit. Piz Daint has already implemented this functionality. Currently I get: