AgnostiqHQ / covalent-slurm-plugin

Executor plugin interfacing Covalent with Slurm
https://covalent.xyz
Apache License 2.0

`prerun_commands` don't show up in the Slurm jobscript file #93

Andrew-S-Rosen closed 4 months ago

Andrew-S-Rosen commented 4 months ago

Environment

What is happening?

Specifying prerun_commands in the SlurmExecutor does not result in the commands appearing in the Slurm job script.

How can we reproduce the issue?

Run a simple toy example with the prerun_commands keyword argument.

I used the example below:

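import covalent as ct

# Note: n_nodes and vasp_parallel_cmd are not defined in this snippet;
# they must be set before the executor is constructed.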

executor = ct.executor.SlurmExecutor(
    username="rosen",
    address="perlmutter-p1.nersc.gov",
    ssh_key_file="/home/rosen/.ssh/nersc",
    cert_file="/home/rosen/.ssh/nersc-cert.pub",
    conda_env="covalent",
    options={
        "nodes": f"{n_nodes}",
        "qos": "debug",
        "constraint": "cpu",
        "account": "matgen",
        "job-name": "quacc",
        "time": "00:30:00",
    },
    remote_workdir="/pscratch/sd/r/rosen/quacc/",
    create_unique_workdir=True,
    use_srun=False,
    prerun_commands=[
        "module load vasp/6.4.1-cpu",
        f"export QUACC_VASP_PARALLEL_CMD='{vasp_parallel_cmd}'",
    ],
)

What should happen?

The prerun_commands should appear at the bottom of the job script, but they do not. This is the job script that was generated for me:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --qos=debug
#SBATCH --constraint=cpu
#SBATCH --account=matgen
#SBATCH --job-name=quacc
#SBATCH --time=00:30:00
#SBATCH --parsable
#SBATCH --output=/pscratch/sd/r/rosen/quacc/94d1d3a5-8c42-4af3-b6f1-b6ee8126bea0/node_0/stdout-94d1d3a5-8c42-4af3-b6f1-b6ee8126bea0-0.log
#SBATCH --error=/pscratch/sd/r/rosen/quacc/94d1d3a5-8c42-4af3-b6f1-b6ee8126bea0/node_0/stderr-94d1d3a5-8c42-4af3-b6f1-b6ee8126bea0-0.log
#SBATCH --chdir=/pscratch/sd/r/rosen/quacc/94d1d3a5-8c42-4af3-b6f1-b6ee8126bea0/node_0

source $HOME/.bashrc

            conda activate covalent
            retval=$?
            if [ $retval -ne 0 ] ; then
                >&2 echo "Conda environment covalent is not present on the compute node. "                "Please create the environment and try again."
                exit 99
            fi

remote_py_version=$(python -c "print('.'.join(map(str, __import__('sys').version_info[:2])))")
if [[ "3.10" != $remote_py_version ]] ; then
  >&2 echo "Python version mismatch. Please install Python 3.10 in the compute environment."
  exit 199
fi

python /pscratch/sd/r/rosen/quacc/script-94d1d3a5-8c42-4af3-b6f1-b6ee8126bea0-0.py

wait

Note how there are no prerun commands here.
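For anyone who wants to check their own job script without reading it by eye, here is a minimal sketch. It does not use any Covalent API; `missing_prerun_commands` is a hypothetical helper name, and the shortened `job_script` string is a stand-in for the contents of the real generated file.

```python
def missing_prerun_commands(script_text, prerun_commands):
    """Return the prerun commands that do not appear as a line of the job script."""
    # Compare on stripped lines so indentation in the generated script is ignored.
    script_lines = {line.strip() for line in script_text.splitlines()}
    return [cmd for cmd in prerun_commands if cmd.strip() not in script_lines]


# Shortened stand-in for the generated job script (assumption: the real text
# would be read from the script file in the remote workdir).
job_script = """#!/bin/bash
#SBATCH --nodes=1
source $HOME/.bashrc
conda activate covalent
python script.py
wait
"""

prerun = ["module load vasp/6.4.1-cpu"]

# Any command listed here was requested but never written to the script.
print(missing_prerun_commands(job_script, prerun))
```

An empty list would mean every prerun command made it into the script.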

I have not yet tried the postrun_commands.

Any suggestions?

No response

Andrew-S-Rosen commented 4 months ago

I am unable to reproduce this issue on the main branch.
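If the behavior differs between an installed release and main, it may help to confirm which plugin version is actually installed. The sketch below uses only the standard library; it assumes the installed distribution is named `covalent-slurm-plugin` (matching the repository name), and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution name matches the repository name.
print(installed_version("covalent-slurm-plugin"))
```

A result of None means the distribution is not installed in the current environment under that name.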