Closed by lscarton 5 months ago
Hi @lscarton,
I see you are using the pyslurm.job class. Note that this class is deprecated and no longer really maintained, so it may still contain bugs like the one you are hitting.
My suggestion would be to use pyslurm.JobSubmitDescription, a new class created specifically to cover job submission. (The docs say they are for 23.2, but not much changed with 23.11, so they are still pretty accurate.)
You could then do something like this:
import pyslurm

def main():
    script = """source /home/$USER/.bashrc
source activate pytorchenv3.9
nvidia-smi
"""
    job_desc = pyslurm.JobSubmitDescription(
        gpus = 1,
        ntasks_per_node = 32,
        name = "pyslurm_gpu_test",
        standard_error = "slurm/pyslurm-test-%j.err",
        partition = "gpu",
        standard_output = "slurm/pyslurm-test-%j.out",
        memory_per_node = "180G",
        time_limit = "00:05:00",
        script = script
    )
    job_id = job_desc.submit()
    job = pyslurm.Job.load(job_id)
    print(job.gres_per_node)

if __name__ == "__main__":
    main()
You can also verify that your job got a GPU allocated by checking with scontrol show job.
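As a rough sketch of that check done from Python: the helper below shells out to scontrol show job and keeps only the fields whose names mention TRES or GRES. The function names parse_scontrol_fields and resource_fields are made up for illustration, and the exact field names that scontrol prints (for example TresPerNode or TRES) are an assumption that can differ between Slurm versions. The parser also assumes values contain no spaces.

```python
import subprocess


def parse_scontrol_fields(text: str) -> dict:
    """Split `scontrol show job` style output into a dict of KEY=VALUE fields.

    Assumption: values contain no whitespace; fields that do will be truncated.
    """
    fields = {}
    for token in text.split():
        # Split on the first '=' only, so values like "cpu=32,gres/gpu=1" stay whole.
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def resource_fields(job_id: int) -> dict:
    """Return the TRES/GRES related fields that scontrol reports for a job."""
    result = subprocess.run(
        ["scontrol", "show", "job", str(job_id)],
        capture_output=True,
        text=True,
        check=True,
    )
    fields = parse_scontrol_fields(result.stdout)
    # Field names are matched loosely because they vary across Slurm versions.
    return {
        key: value
        for key, value in fields.items()
        if "tres" in key.lower() or "gres" in key.lower()
    }
```

Calling resource_fields(job_id) with the ID returned by job_desc.submit() should then show whether a GPU appears in the job's requested or allocated resources.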
Hi @tazend, I am very grateful for your guidance! It works amazingly!
I only had to add #!/bin/bash at the beginning of the script and pay attention to the indentation of the script, which sometimes produced this error:
sbatch: error: This does not look like a batch script.  The first
sbatch: error: line must start with #! followed by the path to an interpreter.
sbatch: error: For instance: #!/bin/sh
The script passed the _validate_batch_script check because it started with #!/bin/bash, but I believe the pretty indentation was causing problems.
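One way to keep the script nicely indented in the Python source while still handing Slurm a shebang at the very start of the first line is to dedent the text before submitting it. This is only a minimal sketch using the standard library; build_batch_script is a hypothetical helper, not part of pyslurm.

```python
import textwrap


def build_batch_script(body: str, interpreter: str = "/bin/bash") -> str:
    """Return a batch script whose first line is the shebang, with no leading whitespace."""
    # dedent() removes the indentation shared by all lines; strip() drops the
    # blank lines and stray whitespace left by the triple-quoted string.
    cleaned = textwrap.dedent(body).strip()
    return f"#!{interpreter}\n{cleaned}\n"


def main():
    script = build_batch_script("""
        source /home/$USER/.bashrc
        source activate pytorchenv3.9
        nvidia-smi
    """)
    print(script)


if __name__ == "__main__":
    main()
```

The resulting string can be passed as the script argument of pyslurm.JobSubmitDescription in the same way as in the example above.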
Thanks again to @tazend for the fantastic support and thanks to all the contributors to this amazing library.
Details
Issue
Let me first thank you for this amazing library.
When submitting a job that uses a GPU, I am required to add --gres=gpu. Unfortunately, when I use PySlurm I do not get a GPU, whereas when I use a bash script I do. I include both the bash and Python scripts and their output at the bottom. I have tried a few ways, such as 'gres':'gpu', 'gres':'gpu:1', 'gres_per_node':'gpu', ...
I have also checked the naming, Gres=gpu:1(S:0), using scontrol show node nodename.
Could you please guide me, as I am probably missing something trivial?
Thank you so much for your support and guidance.
bash code:
outputs:
Python script:
outputs: