Currently, an Abaqus job started from one of our Jobbers-generated templates spawns this process on the first node:
jhacxc 158833 0.2 0.0 154052 9320 ? S 10:57 0:00 /bin/python3 /var/spool/slurmd.spool/job00038/slurm_script
When scancel is issued on this job, Slurm sends SIGTERM to that particular process, which kills only the Python interpreter. The Abaqus process does not die, and no cleanup takes place afterwards.
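The behaviour described above is ordinary POSIX signal semantics: a signal sent to one PID is not forwarded to that process's children. The following self-contained sketch reproduces it outside Slurm, with a Python parent standing in for the Jobbers-generated script and `sleep` standing in for Abaqus. It does not model Slurm's own process tracking, and the function name is hypothetical.

```python
import os
import signal
import subprocess
import sys

# Stand-in for the batch script: start a long-running child ("sleep" plays the
# role of Abaqus), report its PID, then idle like a script waiting on Abaqus.
PARENT_CODE = r"""
import subprocess, time
child = subprocess.Popen(["sleep", "30"])
print(child.pid, flush=True)
time.sleep(30)
"""


def child_survives_parent_sigterm() -> bool:
    """Send SIGTERM to the parent only and report whether its child lives on."""
    parent = subprocess.Popen([sys.executable, "-c", PARENT_CODE],
                              stdout=subprocess.PIPE, text=True)
    child_pid = int(parent.stdout.readline())
    parent.send_signal(signal.SIGTERM)  # signal the parent PID only
    parent.wait(timeout=10)             # no handler installed, so Python exits
    parent.stdout.close()
    try:
        os.kill(child_pid, 0)           # signal 0 only checks that the PID exists
        alive = True
    except ProcessLookupError:
        alive = False
    if alive:
        os.kill(child_pid, signal.SIGKILL)  # clean up the orphan we created
    return alive
```

On Linux, `child_survives_parent_sigterm()` returns `True`: the parent dies from SIGTERM, while its child is re-parented and keeps running until the sketch kills it explicitly.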
Suggestion: make sure the Python scripts generated by Jobbers listen for signals and, on receiving SIGTERM, try to run "abaqus terminate" (see the Abaqus documentation):
https://www.sharcnet.ca/Software/Abaqus/6.14.2/v6.14/books/usb/default.htm?startat=pt01ch03s02abx39.html
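A minimal sketch of such a handler, assuming the generated script launches Abaqus itself through `subprocess.Popen`. The job name `my_job`, the helper names and the exact launch command are placeholders, and the `abaqus terminate job=<job-name>` syntax is assumed from the Abaqus documentation linked above; it should be checked against the installed version.

```python
import signal
import subprocess
import sys
from typing import List

# Hypothetical job name; a Jobbers template would substitute the real one.
JOB_NAME = "my_job"


def terminate_command(job_name: str) -> List[str]:
    # Assumed syntax "abaqus terminate job=<job-name>"; verify it against the
    # Abaqus version installed on the cluster.
    return ["abaqus", "terminate", f"job={job_name}"]


def install_sigterm_handler(job_name: str) -> None:
    """On SIGTERM (e.g. from scancel), ask Abaqus to shut down cleanly."""
    def handler(signum, frame):
        # Runs in the script's working directory, assumed to be the directory
        # the Abaqus job was started from.
        try:
            subprocess.run(terminate_command(job_name), check=False)
        except OSError as exc:  # e.g. "abaqus" not found on PATH
            print(f"abaqus terminate failed: {exc}", file=sys.stderr)
        # Returning (instead of exiting) keeps the script alive while Abaqus stops.
    signal.signal(signal.SIGTERM, handler)


def run_abaqus(job_name: str) -> int:
    """Start the analysis and wait for it; the command line is a placeholder."""
    install_sigterm_handler(job_name)
    proc = subprocess.Popen(["abaqus", f"job={job_name}", "interactive"])
    return proc.wait()
```

A generated script would end with `sys.exit(run_abaqus(JOB_NAME))`. Because the handler returns instead of raising, `proc.wait()` resumes after the signal (Python retries interrupted system calls since 3.5, PEP 475), so the script keeps waiting while Abaqus shuts down.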
If that does not succeed within 5 minutes, kill the process with Popen.send_signal() or Popen.terminate(). Processes started by mpid may be left behind, so it is better to traverse the entire process tree; otherwise those processes may be left orphaned.
https://docs.python.org/3/library/subprocess.html
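The escalation step could look roughly like the sketch below, which waits up to five minutes for Abaqus to exit and then signals the whole process tree, first with SIGTERM and then with SIGKILL. It walks the tree by reading `/proc`, so it is Linux-only and sees only processes on the current node; anything mpid started on other nodes is out of reach. The helper names are hypothetical, and the third-party `psutil` package (`Process.children(recursive=True)`) would be a more robust way to enumerate the tree.

```python
import os
import signal
import subprocess
from typing import Dict, List

GRACE_SECONDS = 5 * 60  # five minutes for "abaqus terminate" to take effect


def _parent_map() -> Dict[int, int]:
    """Map each PID on this node to its parent PID via /proc (Linux only)."""
    parents = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name may contain spaces or ')', so split after the last ')'.
        fields = stat.rsplit(")", 1)[1].split()
        parents[int(entry)] = int(fields[1])  # fields[0] is the state, fields[1] the PPID
    return parents


def descendants(pid: int) -> List[int]:
    """Children, grandchildren, ... of pid on this node, breadth-first."""
    parents = _parent_map()
    found, frontier = [], [pid]
    while frontier:
        current = frontier.pop(0)
        children = [p for p, pp in parents.items() if pp == current]
        found.extend(children)
        frontier.extend(children)
    return found


def _signal_all(pids: List[int], sig: int) -> None:
    for target in pids:
        try:
            os.kill(target, sig)
        except ProcessLookupError:
            pass  # already exited


def stop_abaqus(proc: subprocess.Popen, grace: float = GRACE_SECONDS,
                kill_after: float = 30.0) -> int:
    """Give Abaqus `grace` seconds to stop, then signal its whole process tree."""
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass
    # Snapshot the tree before signalling: once a parent dies, its children are
    # re-parented and can no longer be found by walking down from proc.pid.
    tree = list(reversed(descendants(proc.pid)))  # deepest processes first
    _signal_all(tree, signal.SIGTERM)
    proc.terminate()
    try:
        proc.wait(timeout=kill_after)
    except subprocess.TimeoutExpired:
        proc.kill()
    # Force out anything that ignored SIGTERM. Caveat: a PID that exited in the
    # meantime could in principle have been reused by an unrelated process.
    _signal_all(tree, signal.SIGKILL)
    return proc.wait()
```

One option is to call `stop_abaqus(proc)` from the script's SIGTERM handler once "abaqus terminate" has been issued. Snapshotting the tree before sending any signal is the key design choice, since orphaned mpid children would otherwise be invisible to a later walk.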
Output from pstree -u on the master node: