PySlurm / pyslurm

Python Interface to Slurm
https://pyslurm.github.io
GNU General Public License v2.0

Job warn signals not passed to job #341

Closed florian-burger closed 2 days ago

florian-burger commented 1 week ago

Issue

Slurm job warn signals are not passed to jobs submitted through pyslurm; the setting is silently ignored.

I am trying to send signal 12 (SIGUSR2) to a running job to trigger a cleanup 60 seconds before the job ends. This works without problems with the sbatch option --signal=B:12@60, either on the command line or via #SBATCH in a job script, and I see a corresponding log message from slurmctld 60 seconds before the job ends.
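For context, here is a minimal sketch of what the receiving side of such a warn signal could look like when the job's workload is a Python process. The names WarnSignalFlag and run_until_warned are hypothetical and are not part of pyslurm or Slurm. The sketch also assumes the signal actually reaches the Python process: according to the sbatch documentation, B: delivers the signal to the batch shell only, so the shell would have to forward it or exec the Python process.

```python
import signal


class WarnSignalFlag:
    """Records that a warn signal (SIGUSR2 by default) has arrived."""

    def __init__(self, signum=signal.SIGUSR2):
        self.received = False
        # Install the handler; this must happen in the main thread.
        signal.signal(signum, self._handle)

    def _handle(self, signum, frame):
        # Keep the handler tiny: set a flag and let the main loop react.
        self.received = True


def run_until_warned(flag, steps):
    """Run callables in order, stopping before the next one once warned.

    Returns the number of steps that were executed.
    """
    done = 0
    for step in steps:
        if flag.received:
            break
        step()
        done += 1
    return done
```

A job would create the flag once at startup, pass its units of work to run_until_warned, and run its cleanup as soon as the function returns with flag.received set.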

Trying to do the same with pyslurm by passing signal to JobSubmitDescription seems to have no effect, and no error is raised.

Test code:

import pyslurm as ps

job_description = ps.JobSubmitDescription(
    name="signal_test",
    nodes=1,
    ntasks=2,
    time_limit="00-00:03:00",
    signal="SIGUSR2@60",
)
job_id = job_description.submit()
print(job_id)

Using the printed job ID, you can then check the slurmctld log to see whether the signal was sent: grep -i signal /var/log/slurm/slurmctld.log | grep "JobId=<JOB_ID>"
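The same check can be sketched in Python, which may be handy when the test is scripted end to end. The function name signal_lines_for_job is hypothetical, and the sketch assumes only what the grep pipeline assumes: that relevant lines contain the word "signal" (in any case) and the text JobId=<JOB_ID>. Unlike the plain grep, it does not match a longer job ID that merely starts with the same digits.

```python
import re


def signal_lines_for_job(lines, job_id):
    """Return log lines that mention 'signal' and the given job ID."""
    # \b keeps JobId=123 from also matching JobId=1234.
    job_pattern = re.compile(rf"JobId={re.escape(str(job_id))}\b")
    return [
        line.rstrip("\n")
        for line in lines
        if "signal" in line.lower() and job_pattern.search(line)
    ]
```

To use it, open the log file from the command above (/var/log/slurm/slurmctld.log) and pass the file object as lines together with the job ID returned by submit(). An empty result would indicate that slurmctld logged no signal for that job.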

tazend commented 1 week ago

Hi @florian-burger,

I will run some tests and check why it fails.

tazend commented 4 days ago

Hi,

Sorry for the wait. The problem was simply that I forgot to actually call the function that sets up signals for the job. I will push the fix upstream soon. In the meantime, you can use this temporary branch for 22.5.x, which contains the fix: https://github.com/tazend/pyslurm/tree/fix/job-signal-22.5.x

tazend commented 2 days ago

Hi,

The fix has now been merged. I recommend using the 22.5.x branch directly: https://github.com/PySlurm/pyslurm/tree/22.5.x

florian-burger commented 19 hours ago

Hi @tazend,

thanks a lot for fixing this so quickly! Much appreciated.