a-paranjape / sahyadri-sandbox

Sandbox for testing codes and scripts related to Sahyadri simulations at IUCAA/TIFR/IISER-Pune/NCRA

Running gadget on multiple nodes #10

Closed SaeeDhawalikar closed 5 months ago

SaeeDhawalikar commented 5 months ago

1. Change in MaxMemSize

In the gadget parameter template file, `MaxMemSize` needs to be set based on the number of particles and the number of cpus. It is currently set to 10000 MB, which gives an out-of-memory error when 32 cpus per node are used on pegasus (not enough memory is available). As a rule of thumb, `MaxMemSize` must be at least 0.45 KB × Nparticles / Ncpus. For 256^3 particles this is around 250 MB on a single node (32 cpus), and less when multiple nodes are used. `MaxMemSize = 5000` is well within the pegasus memory limit while remaining much greater than the minimum required for gadget with Nparticles = 256^3 on a single node.
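The rule of thumb above can be checked with a short calculation; this is a minimal sketch (the function name and 0.45 KB/particle constant just encode the estimate quoted in this issue, not anything from the Gadget-4 source):

```python
# Hedged sketch: estimate the minimum Gadget-4 MaxMemSize per MPI rank
# from the ~0.45 KB-per-particle rule of thumb quoted in this issue.
def min_maxmemsize_mb(n_particles, n_cpus, kb_per_particle=0.45):
    """Approximate minimum MaxMemSize (in MB) per rank."""
    total_kb = kb_per_particle * n_particles
    return total_kb / n_cpus / 1024.0  # KB -> MB

# Example: 256^3 particles on a single 32-core pegasus node
mb = min_maxmemsize_mb(256**3, 32)
print(round(mb))  # -> 230, consistent with the ~250 MB quoted above
```

Doubling the number of nodes halves this per-rank minimum, which is why `MaxMemSize = 5000` leaves a comfortable margin.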

2. Using multiple nodes

The current run_gadget.sh fails to use multiple nodes: the machinefile has to be specified when submitting the mpi job. The line

```
mpiexec -np $NCPU_TOT $CODE_HOME/code/Gadget-4/mesh$NMESH-NGenIC/Gadget4 $GADGET2_CONFIG_FILE
```

has to be replaced by

```
mpirun -np $NCPU_TOT -machinefile $PBS_NODEFILE $CODE_HOME/code/Gadget-4/mesh$NMESH-NGenIC/Gadget4 $GADGET2_CONFIG_FILE
```
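For context, `$PBS_NODEFILE` is generated by the PBS scheduler from the node request in the submission script. A minimal multi-node job script might look like the sketch below; the job name, the 2-node request, and the derivation of `NCPU_TOT` from the nodefile are illustrative assumptions, not taken from the repo:

```shell
#!/bin/bash
#PBS -N gadget4-multinode
#PBS -l nodes=2:ppn=32     # illustrative request: 2 nodes x 32 cores
#PBS -j oe

cd $PBS_O_WORKDIR          # directory the job was submitted from

# $PBS_NODEFILE lists one hostname per allocated core; passing it as
# the machinefile lets mpirun place ranks on both nodes.
NCPU_TOT=$(wc -l < $PBS_NODEFILE)
mpirun -np $NCPU_TOT -machinefile $PBS_NODEFILE \
  $CODE_HOME/code/Gadget-4/mesh$NMESH-NGenIC/Gadget4 $GADGET2_CONFIG_FILE
```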

3. Adding library paths to .bash_profile and .bashrc

On pegasus, when multiple nodes are used, the library paths have to be added to both .bashrc and .bash_profile; loading the modules alone is not enough for multi-node jobs, since the shells spawned on the remote nodes do not pick up the module environment.
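A minimal sketch of this workaround follows. The library directory (`$HOME/local/lib`) is an illustrative placeholder; substitute the actual paths provided by the modules on pegasus. Writing the same line to both files covers login and non-login shells on the compute nodes:

```shell
# Hedged sketch: make library paths visible to shells spawned on remote
# compute nodes. The single quotes keep $HOME/$LD_LIBRARY_PATH unexpanded,
# so the literal export line lands in the rc files.
LIBLINE='export LD_LIBRARY_PATH=$HOME/local/lib:$LD_LIBRARY_PATH'  # placeholder path

# Append to both ~/.bash_profile (login shells) and ~/.bashrc (non-login
# shells), skipping files that already contain the line (idempotent).
for f in ~/.bash_profile ~/.bashrc; do
  grep -qF "$LIBLINE" "$f" 2>/dev/null || echo "$LIBLINE" >> "$f"
done
```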

a-paranjape commented 5 months ago
  1. and 2. were modified as suggested.
  3. was not needed from my home area, and the libraries did not have to be installed manually. We should track down whether this is related to a path declaration issue when using binaries compiled by another user.

For now, jobs are being successfully submitted using multiple nodes, so I am closing this issue.