samtrahan closed this 5 years ago.
I tested with a job that kills itself with kill -9. The squeue exit_status does handle signals: it is the raw status as returned by waitpid. Interpreting it correctly requires the WIFEXITED, WIFSIGNALED, etc. macros, which Rocoto does not have. However, the exit_status is 0 if the job exited normally with status 0, and non-zero otherwise. If there were a way to construct a Process::Status object, it could interpret the status for us, but I see no way to create one. In other words, it works, but it is not elegant.
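Without a Process::Status object, one option is to decode the raw waitpid status by hand, mimicking what Process::Status#exited?, #exitstatus, #signaled?, #termsig, #stopped? and #stopsig report. The sketch below is only an illustration: the helper name decode_wait_status is hypothetical (it is not part of Rocoto), and it assumes the Linux/glibc bit layout of the status word, which POSIX does not guarantee.

```ruby
# Hypothetical helper, not part of Rocoto.
# ASSUMPTION: Linux/glibc layout of a waitpid() status word:
#   low 7 bits  -> terminating signal (0 means the process exited normally,
#                  0x7f means it is stopped rather than terminated)
#   bit 0x80    -> core dump flag (only meaningful when signaled)
#   bits 8..15  -> exit code (normal exit) or stop signal (stopped)
def decode_wait_status(status)
  sig = status & 0x7f
  if sig.zero?
    # Equivalent of WIFEXITED / WEXITSTATUS
    { exited: true, exit_code: (status >> 8) & 0xff }
  elsif sig == 0x7f
    # Equivalent of WIFSTOPPED / WSTOPSIG
    { stopped: true, stop_signal: (status >> 8) & 0xff }
  else
    # Equivalent of WIFSIGNALED / WTERMSIG / WCOREDUMP
    { signaled: true, term_signal: sig, core_dumped: (status & 0x80) != 0 }
  end
end

# The coarse check described above: 0 only for a normal exit with status 0.
def job_succeeded?(status)
  status.zero?
end

p decode_wait_status(0)       # normal exit, status 0
p decode_wait_status(1 << 8)  # normal exit, status 1
p decode_wait_status(9)       # terminated by signal 9 (kill -9)
```

The job_succeeded? check is all that is needed to tell success from failure; the fuller decoding only matters if Rocoto wants to report which signal killed a job.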
This was addressed in PR #55
When there have been many jobs over the past 24 hours, scontrol takes an unacceptably long time to run. The fix is to switch to squeue; it is in the feature/squeue branch. That branch is built on top of feature/more-slurm-states, because that change was needed for successful testing.
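As a rough illustration of what querying squeue instead of scontrol can look like, here is a minimal sketch. It is not the code in the feature/squeue branch: the helper names squeue_command, parse_squeue and query_squeue are hypothetical, and the chosen options (-h for no header, -t all for every job state, -u for the user, -o "%i %T" for job id and extended state) are just one reasonable way to ask squeue for one line per job.

```ruby
require "open3"

# Hypothetical helper: argv for an squeue call listing one user's jobs in all
# states, without a header, one "<jobid> <state>" line per job.
def squeue_command(user)
  ["squeue", "-h", "-t", "all", "-u", user, "-o", "%i %T"]
end

# Hypothetical helper: parse "<jobid> <state>" lines into { jobid => state }.
def parse_squeue(output)
  output.each_line.with_object({}) do |line, jobs|
    id, state = line.split
    next if id.nil? || state.nil? # skip blank or malformed lines
    jobs[id] = state
  end
end

# Hypothetical helper: run squeue for real (requires Slurm on the PATH).
def query_squeue(user)
  out, status = Open3.capture2(*squeue_command(user))
  raise "squeue failed: #{status}" unless status.success?
  parse_squeue(out)
end

# Offline demonstration with canned output instead of a live cluster.
sample = "1234 RUNNING\n1235 COMPLETED\n1236 FAILED\n"
jobs = parse_squeue(sample)
puts jobs["1236"]
```

Passing the command as an argument array to Open3.capture2 avoids shell quoting issues with the "%i %T" format string.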