a-paranjape / sahyadri-sandbox

Sandbox for testing codes and scripts related to Sahyadri simulations at IUCAA/TIFR/IISER-Pune/NCRA

Running gadget on multiple nodes #10

Closed SaeeDhawalikar closed 5 months ago

SaeeDhawalikar commented 5 months ago

1. Change in MaxMemSize

In the gadget parameter template file, `MaxMemSize` needs to be set based on the number of particles and the number of cpus. It is currently set to 10000 MB, which gives an out-of-memory error when 32 cpus per node are used on pegasus (not enough memory is available). As a rule of thumb, `MaxMemSize` must be at least 0.45 KB × Nparticles / Ncpus. For 256^3 particles this is around 250 MB on a single node (32 cpus), and less when multiple nodes are used. `MaxMemSize = 5000` is well within the pegasus memory limit while remaining much greater than the minimum required for gadget with Nparticles = 256^3 on a single node.
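The rule of thumb above can be checked with a short calculation; this is a minimal sketch (the function name and 0.45 KB/particle constant just encode the estimate quoted in this issue, not anything from the Gadget-4 source):

```python
# Hedged sketch: estimate the minimum Gadget-4 MaxMemSize per MPI rank
# from the ~0.45 KB-per-particle rule of thumb quoted in this issue.
def min_maxmemsize_mb(n_particles, n_cpus, kb_per_particle=0.45):
    """Approximate minimum MaxMemSize (in MB) per rank."""
    total_kb = kb_per_particle * n_particles
    return total_kb / n_cpus / 1024.0  # KB -> MB

# Example: 256^3 particles on a single 32-core pegasus node
mb = min_maxmemsize_mb(256**3, 32)
print(round(mb))  # -> 230, consistent with the ~250 MB quoted above
```

Doubling the number of nodes halves this per-rank minimum, which is why `MaxMemSize = 5000` leaves a comfortable margin.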

2. Using multiple nodes

The current run_gadget.sh fails to use multiple nodes: the machinefile has to be specified when submitting the mpi job. The line

```
mpiexec -np $NCPU_TOT $CODE_HOME/code/Gadget-4/mesh$NMESH-NGenIC/Gadget4 $GADGET2_CONFIG_FILE
```

has to be replaced by

```
mpirun -np $NCPU_TOT -machinefile $PBS_NODEFILE $CODE_HOME/code/Gadget-4/mesh$NMESH-NGenIC/Gadget4 $GADGET2_CONFIG_FILE
```
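For context, `$PBS_NODEFILE` is generated by the PBS scheduler from the node request in the submission script. A minimal multi-node job script might look like the sketch below; the job name, the 2-node request, and the derivation of `NCPU_TOT` from the nodefile are illustrative assumptions, not taken from the repo:

```shell
#!/bin/bash
#PBS -N gadget4-multinode
#PBS -l nodes=2:ppn=32     # illustrative request: 2 nodes x 32 cores
#PBS -j oe

cd $PBS_O_WORKDIR          # directory the job was submitted from

# $PBS_NODEFILE lists one hostname per allocated core; passing it as
# the machinefile lets mpirun place ranks on both nodes.
NCPU_TOT=$(wc -l < $PBS_NODEFILE)
mpirun -np $NCPU_TOT -machinefile $PBS_NODEFILE \
  $CODE_HOME/code/Gadget-4/mesh$NMESH-NGenIC/Gadget4 $GADGET2_CONFIG_FILE
```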

3. Adding library paths to .bash_profile and .bashrc

On pegasus, when multiple nodes are used, the library paths have to be added to both .bashrc and .bash_profile; loading the modules alone is not enough for multi-node jobs, since the shells spawned on the remote nodes do not pick up the module environment.
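A minimal sketch of this workaround follows. The library directory (`$HOME/local/lib`) is an illustrative placeholder; substitute the actual paths provided by the modules on pegasus. Writing the same line to both files covers login and non-login shells on the compute nodes:

```shell
# Hedged sketch: make library paths visible to shells spawned on remote
# compute nodes. The single quotes keep $HOME/$LD_LIBRARY_PATH unexpanded,
# so the literal export line lands in the rc files.
LIBLINE='export LD_LIBRARY_PATH=$HOME/local/lib:$LD_LIBRARY_PATH'  # placeholder path

# Append to both ~/.bash_profile (login shells) and ~/.bashrc (non-login
# shells), skipping files that already contain the line (idempotent).
for f in ~/.bash_profile ~/.bashrc; do
  grep -qF "$LIBLINE" "$f" 2>/dev/null || echo "$LIBLINE" >> "$f"
done
```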

a-paranjape commented 5 months ago
  1. and 2. were modified as suggested.
  3. was not needed from my home area, and the libraries did not have to be installed manually. We should track down whether this is related to a path declaration issue when using binaries compiled by another user.

For now, jobs are being successfully submitted using multiple nodes, so I am closing this issue.