UQ-RCC / nimrodg

Nimrod/G
https://rcc.uq.edu.au/nimrod
Apache License 2.0

Attempts aren't failed if the final command fails #16

Closed vs49688 closed 4 years ago

vs49688 commented 5 years ago

If onerror == fail and the last command in a job fails, the job scheduler doesn't count it as a failure.

This is a logic error in DefaultJobScheduler.java:

if(au.getAction() == AgentUpdate.Action.Stop) {
    if(cr.index < maxIdx || cr.status != CommandResult.CommandResultStatus.SUCCESS) {
        /* A command has failed and caused the job to stop. */
        ops.updateJobFinished(att, true);
    } else {
        /* We've finished successfully. */
        ops.updateJobFinished(att, false);
    }
}
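
To make the branch above easier to reason about, here is a minimal standalone sketch of the same condition as a pure function. StopOutcomeSketch, its Status enum, and isFailure are hypothetical stand-ins rather than Nimrod/G types, and the sketch assumes maxIdx is the index of the final command in the job. Only the boolean expression is taken from the snippet.

```java
/* Hypothetical stand-in types; only the condition mirrors the snippet above. */
final class StopOutcomeSketch {
    /* Assumed, reduced status set; the real CommandResultStatus may have more values. */
    enum Status { SUCCESS, FAILED }

    /* Returns true if a Stop update would mark the attempt as failed. */
    static boolean isFailure(int index, int maxIdx, Status status) {
        return index < maxIdx || status != Status.SUCCESS;
    }

    public static void main(String[] args) {
        int maxIdx = 2; /* assumed: index of the final command in a three-command job */

        /* Stop reported before the final command: failure regardless of status. */
        System.out.println(isFailure(1, maxIdx, Status.SUCCESS)); /* prints "true" */

        /* Stop at the final command with a non-success status: failure. */
        System.out.println(isFailure(2, maxIdx, Status.FAILED));  /* prints "true" */

        /* Stop at the final command with SUCCESS: treated as a clean finish. */
        System.out.println(isFailure(2, maxIdx, Status.SUCCESS)); /* prints "false" */
    }
}
```

At the final index the first operand is always false, so the outcome there depends entirely on the status the agent reports.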

vs49688 commented 5 years ago

There's no real way to fix this as the CommandResult structures look the same regardless of the onerror status.

vs49688 commented 5 years ago

The current workaround is to add exec /bin/true as the final command in the planfile.
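
A rough sketch of why the workaround helps, assuming maxIdx is the index of the last command in the job: once a trailing command is appended, a Stop reported for the previous final command has index < maxIdx, so the scheduler condition quoted earlier counts it as a failure whatever status is reported. WorkaroundSketch, its methods, and the exec ./step1 and exec ./step2 commands are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/* Hypothetical illustration; not part of Nimrod/G. */
final class WorkaroundSketch {
    /* Same condition as the scheduler snippet, with the status reduced to a boolean. */
    static boolean isFailure(int index, int maxIdx, boolean statusIsSuccess) {
        return index < maxIdx || !statusIsSuccess;
    }

    /* Index of the final command; assumed to be what maxIdx holds. */
    static int maxIdx(List<String> commands) {
        return commands.size() - 1;
    }

    public static void main(String[] args) {
        List<String> plan = new ArrayList<>(List.of("exec ./step1", "exec ./step2"));
        int stoppedAt = 1; /* Stop reported for "exec ./step2" */

        /* Without the workaround, the index check cannot flag the final command. */
        System.out.println(isFailure(stoppedAt, maxIdx(plan), true)); /* prints "false" */

        /* With the trailing command, the same Stop now satisfies index < maxIdx. */
        plan.add("exec /bin/true");
        System.out.println(isFailure(stoppedAt, maxIdx(plan), true)); /* prints "true" */
    }
}
```

The status is fixed to success here purely to isolate the index check; it is not a claim about what the agent actually reports.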