1. Change in MaxMemSize
In the Gadget parameter template file, "MaxMemSize" (the memory limit in MB per MPI task) needs to be set according to the number of particles and CPUs.
It is currently set to 10000 MB, which causes an error when 32 CPUs per node are used on pegasus, because not enough memory is available.
As a rough guide, MaxMemSize must be at least 0.45 KB * Nparticles / Ncpus. For 256^3 particles this is around 250 MB on a single node (32 CPUs), and less when multiple nodes are used. MaxMemSize = 5000 fits well within the pegasus memory and is still far above the minimum that Gadget requires for Nparticles = 256^3 on a single node.
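The rule of thumb above can be turned into a quick check before editing the template. The following is only a sketch: the function name min_maxmemsize_mb is made up for illustration, it assumes 1 MB = 1024 KB, and it rounds up to the next whole MB.

```shell
#!/usr/bin/env bash
# Rough lower bound on MaxMemSize (MB per MPI task), based on the
# 0.45 KB per particle estimate quoted in this issue.
# min_maxmemsize_mb is a hypothetical helper, not part of run_gadget.sh.
min_maxmemsize_mb() {
    local nside=$1   # particles per dimension, e.g. 256 for 256^3
    local ncpu=$2    # total number of MPI tasks
    local nparticles=$(( nside * nside * nside ))
    # 0.45 KB is written as 45/100 KB; the factor 1024 converts KB to MB.
    local denom=$(( 100 * 1024 * ncpu ))
    # Integer division, rounded up.
    echo $(( (45 * nparticles + denom - 1) / denom ))
}

min_maxmemsize_mb 256 32   # one node with 32 CPUs, prints 231
min_maxmemsize_mb 256 64   # two nodes with 32 CPUs each, prints 116
```

The single-node value is consistent with the "around 250 MB" estimate, and both values are far below MaxMemSize = 5000.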
2. Using multiple nodes
The current run_gadget.sh fails to use multiple nodes because the machinefile is not passed to the MPI launcher. The launch line
mpiexec -np $NCPU_TOT $CODE_HOME/code/Gadget-4/mesh$NMESH-NGenIC/Gadget4 $GADGET2_CONFIG_FILE
has to be replaced by
mpirun -np $NCPU_TOT -machinefile $PBS_NODEFILE $CODE_HOME/code/Gadget-4/mesh$NMESH-NGenIC/Gadget4 $GADGET2_CONFIG_FILE
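One way to keep the script usable both inside and outside a PBS job is to add the machinefile only when $PBS_NODEFILE is actually set and readable. This is a sketch, not the current contents of run_gadget.sh: the function build_launch_cmd is hypothetical, and it assumes that NCPU_TOT, CODE_HOME, NMESH and GADGET2_CONFIG_FILE are already defined as in the commands above.

```shell
#!/usr/bin/env bash
# Print the MPI launch command for Gadget-4.
# build_launch_cmd is a hypothetical helper, not part of run_gadget.sh.
build_launch_cmd() {
    local exe="$CODE_HOME/code/Gadget-4/mesh$NMESH-NGenIC/Gadget4"
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # Inside a PBS job: pass the list of allocated hosts to mpirun.
        echo "mpirun -np $NCPU_TOT -machinefile $PBS_NODEFILE $exe $GADGET2_CONFIG_FILE"
    else
        # No node file available: fall back to a launch without a machinefile.
        echo "mpirun -np $NCPU_TOT $exe $GADGET2_CONFIG_FILE"
    fi
}
```

Printing the command first makes it easy to confirm in the job log that the machinefile was picked up before the run starts.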
3. Adding library paths to .bash_profile and .bashrc files
On pegasus, when multiple nodes are used, the library paths have to be added to both .bashrc and .bash_profile. Loading the modules alone is not enough for multi-node jobs.
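A snippet along the following lines could go into both .bashrc and .bash_profile. It is a sketch under assumptions: this issue does not list the library directories, so $HOME/local/lib is a placeholder, and add_lib_path is a hypothetical helper. The likely reason the step is needed is that processes started on remote nodes get a fresh shell that does not inherit the modules loaded in the job script, but that has not been confirmed here.

```shell
# Add a directory to LD_LIBRARY_PATH only if it is not already listed,
# so that sourcing the file more than once does not duplicate entries.
# add_lib_path is a hypothetical helper name.
add_lib_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) export LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
}

# Placeholder path: replace with the actual library locations on pegasus.
add_lib_path "$HOME/local/lib"
```

Because the function skips entries that are already present, it is safe to have the same lines in both startup files.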
was not needed from my home area, and the libraries did not have to be installed manually. We should track down whether this is related to a path declaration issue when using binaries compiled by another user.
For now, jobs are being submitted successfully on multiple nodes, so I am closing this issue.