nikishe opened this issue 1 month ago
AutoMD -i "desmond_setup_2-out.cms" -S OUC -t 100 -H "cpu" -G "gpu"
I think you could try submitting a Slurm job to the compute node and running AutoMD with `localhost` there.
> "to run AutoMD with localhost."

Sorry, I find this a bit vague (probably due to my own understanding). Can you show me an example? I'm reading it as submitting a job that kicks off a process outside the scheduler.
You just need to run a local AutoMD job inside your Slurm script:

```
AutoMD -i "desmond_setup_2-out.cms" -S OUC -t 100 -H "localhost" -G "localhost"
```
Meanwhile, don't forget to modify your hosts file to add the GPUs to the `localhost` entry, for example:

```
name: localhost
gpgpu: 0, Tesla V100
gpgpu: 1, Tesla V100
gpgpu: 2, Tesla V100
gpgpu: 3, Tesla V100
```
You can refer to the previous issue: [Installation](https://github.com/Wang-Lin-boop/AutoMD/issues/1#issuecomment-1983836184)
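Putting the pieces together, a complete `localhost` entry in the Schrödinger hosts file might look like the sketch below. The GPU model and count are placeholders; list whatever your compute node actually has:

```
# localhost entry: run jobs directly on the current machine
name: localhost
schrodinger: ${SCHRODINGER}
tmpdir: /tmp
gpgpu: 0, Tesla V100
gpgpu: 1, Tesla V100
```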
Won't this run the job outside the scheduler? I will give it a go, but I worry this will run outside the scheduler, risking other people's jobs on that node.
No, you need to submit this Slurm job to the scheduler via `sbatch`.
Have you ever submitted a Slurm job script? It seems like you don't use `sbatch` to submit jobs very often; please refer to the Slurm documentation. The Slurm job script looks like:
```
cat <<EOF > AutoMD.slurm
#!/bin/bash
#SBATCH --job-name=AutoMD
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --partition=gpu
#SBATCH --ntasks=6
#SBATCH --time=120:00:00
#SBATCH --output=AutoMD.out
#SBATCH --error=AutoMD.out

AutoMD -i "desmond_setup_2-out.cms" -S OUC -t 100 -H "localhost" -G "localhost"
EOF
sbatch --gpus=1 AutoMD.slurm
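Once submitted, you can watch the job with standard Slurm commands; this is just a sketch of typical usage, and your output file name follows the `--output` line above:

```
sbatch --gpus=1 AutoMD.slurm   # prints "Submitted batch job <jobid>"
squeue -u $USER                # check the queue state of your jobs
tail -f AutoMD.out             # follow AutoMD's combined stdout/stderr
```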
Thank you for your suggestions, it's all working now. The issue was `-H "localhost" -G "localhost"`. I was under the impression that `-H` and `-G` had to point to something that creates a job on Slurm. In my case they were pointing to:
```
# 1 hour wall time, 40 tasks with default 1 cpu/task
name: batch-small
host: localhost
schrodinger: ${SCHRODINGER}
queue: SLURM2.1
qargs: --export=ALL --cpus-per-task=1 --mem-per-cpu=10GB --time=00:20:00 --partition=gpu-h100 --qos=gpu --gres=gpu:h100:1
tmpdir: /tmp

# 1 hour wall time, 40 tasks with default 1 cpu/task
name: batch-a100
host: localhost
schrodinger: ${SCHRODINGER}
queue: SLURM2.1
qargs: --export=ALL --cpus-per-task=1 --mem-per-cpu=10GB --time=00:20:00 --partition=gpu --qos=gpu --gres=gpu:1
tmpdir: /tmp
```
This, and my use of both interactive and batch jobs, led to my issues. I think a few example scenarios in the README might help future newbies. I am happy to open a pull request if you think it would be a good idea.
The `localhost` here means localhost on the compute node, not the login node. This is equivalent to using the queue information in the hosts file to have Desmond submit each stage to the compute nodes individually. Both approaches are allowed by the scheduler.
Glad you solved the problem. It's a good idea to organize some of the issues that newbies might have.
Hey AutoMD team, we have it working now. It runs through stages 1-10, but from stage 3 onward (the GPU stages) it creates a separate Slurm job for every stage. Is there a way to tell it to run all 7 GPU stages in one Slurm job? I am being penalised by the scheduler's fair-use policy, and a lot of time is lost waiting for resources. Any advice?