gms-bbg / gamess-issues

GAMESS issue tracking

Rungms is incompatible with SLURM #39

Closed. dishu1 closed this issue 3 years ago

dishu1 commented 3 years ago

Hello,

I'm attempting to run the most recent version of GAMESS. I compiled it with Intel compilers + MKL 19.0.5 and Intel MPI 2019.7.217 for 64-bit Linux on the Pitzer cluster at OSC, and I am not planning to use sockets communication. SLURM was adopted there recently, but I don't know the exact version since it doesn't appear in the module list.

However, neither rungms nor rungms-dev works with SLURM. There is a patch for rungms that is supposed to make it SLURM-compatible, but it appears to be slightly outdated and does not work. I tried modifying rungms-dev instead, but the situation is the same: although it has SLURM-specific environment variables, it does not work either. You can see all the tweaks I had to apply to rungms-dev (out of the box it could not even generate the procfile; the lines originally used for that were wrong), and you can also see that it still does not work. Please help me get the script working as intended. The error I am currently concerned about is the mpiexec error shown in the slurm-2539682.out file in the archive.

Also, please let me know whether it makes sense to run GAMESS in mixed communication mode, and if so, when.

Gamess_test.zip

saromleang commented 3 years ago

Can you give me the output of the following:

  1. Log into a node with SLURM for an interactive session
  2. Use the original rungms-dev to run exam01 in parallel: ./rungms-dev exam01 <version> 4 4 &> exam01.log

Provide the output.

dishu1 commented 3 years ago

Done. I set the appropriate GMSPATH and ran it in an interactive session. exam01.log

saromleang commented 3 years ago

Can you build and use the non-threaded version of GAMESS?

Set GMS_OPENMP to false in install.info and Makefile
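For reference, roughly what those two edits look like (a sketch; the exact surrounding lines in install.info and the Makefile may differ between GAMESS versions):

    # install.info (csh-style settings file sourced by the build scripts)
    setenv GMS_OPENMP false

    # Makefile (make variable assignment)
    GMS_OPENMP = false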

dishu1 commented 3 years ago

It didn't change anything. exam01.log

saromleang commented 3 years ago

Because you are getting a segfault at the MPI launch, I would suggest trying to build with a different MPI.

Options:

The idea is that you need to get a working binary such that

./rungms-dev exam01 <version> 4 4

does not segfault.

dishu1 commented 3 years ago

The problem is not related to this version of iMPI or to the compiler. The problems are:

  1. mpiexec.hydra is not needed when SLURM is used; srun $GMSPATH/gamess.$VERNO.x < /dev/null works fine. There is no need to create proc- or nodefiles.
  2. To run on multiple nodes, one should not pass $SLURM_TASKS_PER_NODE as a rungms argument. If one requests 4 nodes with 48 cores each, it expands to 48(x4) or something of that sort rather than a plain number, and that causes an error. Calling rungms $SLURM_NTASKS > out.log works.

Feel free to use my working rungms for future work on improving SLURM compatibility; a sketch of the relevant change is shown below. rungms-dev.txt
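For illustration, a minimal sketch of the kind of change I mean inside rungms-dev (csh; PROCFILE and NPROCS are placeholders for whatever names the script actually defines, not the exact variables):

    # skip the nodefile/procfile machinery when running under SLURM
    if ($?SLURM_JOB_ID) then
       # srun already knows the allocation, so no hostfile or -n count is needed
       srun $GMSPATH/gamess.$VERNO.x < /dev/null
    else
       # original Intel MPI kickoff for non-SLURM environments
       mpiexec.hydra -f $PROCFILE -n $NPROCS $GMSPATH/gamess.$VERNO.x < /dev/null
    endif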

saromleang commented 3 years ago

In your example, if you wanted to run across 4 nodes with 48 cores per node, your call to rungms* would be:

./rungms <input> <version> 192 48
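Equivalently, in a batch script those two numbers can be derived from the allocation itself (a sketch, assuming the job was submitted with --nodes and --ntasks-per-node, in which case SLURM typically exports SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE as plain integers):

    # csh arithmetic: total processes = nodes x processes per node
    @ NP = $SLURM_JOB_NUM_NODES * $SLURM_NTASKS_PER_NODE
    ./rungms <input> <version> $NP $SLURM_NTASKS_PER_NODE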

When you use:

./rungms-dev test impi $SLURM_NTASKS $SLURM_TASKS_PER_NODE > test.log

What values are being passed to $SLURM_NTASKS and $SLURM_TASKS_PER_NODE?

dishu1 commented 3 years ago

I would prefer to supply these values only through #SBATCH --nodes=8 --ntasks-per-node=48, not through the rungms arguments themselves. It would simply be more convenient.

$SLURM_NTASKS looks fine; the problem appears when running on multiple nodes. For example, with 2 nodes $SLURM_TASKS_PER_NODE = 48(x2), which causes an error inside rungms itself.

Anyway, none of these settings are needed because srun already knows all of them. There is no need to create the nodefile, and no need to pass the number of processes and processes per node to the rungms script. None of the setup originally written into rungms is necessary; srun alone is enough.
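To make it concrete, a sketch of the kind of batch script I have in mind (module names and the GAMESS version string are placeholders for whatever the local cluster provides):

    #!/bin/csh
    #SBATCH --nodes=8
    #SBATCH --ntasks-per-node=48
    #SBATCH --time=01:00:00

    # hypothetical module names; load whatever the binary was built with
    module load intel intelmpi
    cd $SLURM_SUBMIT_DIR

    # the modified rungms-dev launches gamess via srun internally,
    # so only the total task count has to be passed along
    ./rungms-dev exam01 impi $SLURM_NTASKS > exam01.log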

saromleang commented 3 years ago

So what do you suggest then for users of rungms* that utilize other resource managers? (e.g., PBS/TORQUE, SGE, LSF)

dishu1 commented 3 years ago

Running GAMESS with SLURM is much simpler than this script suggests; many of its complications are no longer needed. Besides, the 3-step MPI kickoff procedure seems outdated and useless: recent Intel MPI releases no longer ship mpdboot, so that procedure cannot be used anyway. I would suggest rewriting the script accordingly and removing everything that is not needed.

samcom12 commented 5 months ago

Hi @dishu1

Can you provide your working rungms-dev script?

Cheers,
Samir Shaikh