Closed by SophieS9 2 years ago
Identical issue on run 220207_A00748_0220_BHMWC3DRXY with Sample 22M01708.
The job initially ran on node 5 and crashed after 10 minutes. It was then resubmitted on node 1, but failed because the analysis directory already existed. Both jobs had the same job ID: 176967.
sacct -j showed an identical ExitCode (13) to the run above.
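For anyone repeating this check, here is a minimal Python sketch of pulling the accounting records for one job ID and listing each attempt with its node and ExitCode. It assumes `sacct` is on the PATH. The chosen field list and the helper names (`sacct_command`, `parse_sacct`, `job_history`) are illustrative, not part of the pipeline. `--duplicates` is included on the assumption that the second attempt is recorded under the same job ID, as described above.

```python
import subprocess

# Columns requested from sacct; this selection is an assumption for illustration.
FIELDS = ["JobID", "State", "ExitCode", "NodeList", "Start", "End"]


def sacct_command(job_id):
    """Build the sacct call for one job ID, pipe-delimited and without a header."""
    return [
        "sacct",
        "-j", str(job_id),
        "--duplicates",   # also show earlier records that reuse the same job ID
        "--parsable2",    # '|' delimited, no trailing delimiter
        "--noheader",
        "--format=" + ",".join(FIELDS),
    ]


def parse_sacct(text):
    """Turn pipe-delimited sacct output into a list of dicts keyed by FIELDS."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        rows.append(dict(zip(FIELDS, line.split("|"))))
    return rows


def job_history(job_id):
    """Run sacct (requires a Slurm cluster) and return the parsed records."""
    result = subprocess.run(
        sacct_command(job_id), capture_output=True, text=True, check=True
    )
    return parse_sacct(result.stdout)
```

On a cluster login node, `job_history(176967)` would return one dict per accounting record, so the failed and the resubmitted attempt can be compared side by side.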
Node cs05 fixed. Issue resolved!
Run 220202_A00748_0218_BHMWC2DRXY Sample 22M01519
Script 2 for this sample was submitted as normal on node 5 and then stopped 2.5 hours later, as can be seen from the err and out file timestamps. The err file is empty and the out file has no clear error message, but the app did not run to completion. It stopped during or after the TrimFastq step (this step does appear to have finished):
The job then appears to have resubmitted itself with the same job ID on node 10. That attempt crashed immediately because the analysis directory for this sample already existed:
Checking the job accounting information on slurm shows that the job failed, was then resubmitted, and completed:
ExitCode 13 = Broken pipe: write to pipe with no readers (this is the signal(7) description of signal 13, SIGPIPE).
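The ExitCode column in sacct has the form `exit:signal`, where the part after the colon is the signal that terminated the process, if any. The thread does not show which half the 13 appeared in, so the sketch below decodes both halves rather than assuming. The function name `decode_exit_code` is hypothetical, and signal numbers are looked up on the machine where the script runs, which is assumed to match the compute nodes.

```python
import signal


def decode_exit_code(field):
    """Split a sacct ExitCode field ('exit:signal') and name the signal, if any."""
    exit_str, _, sig_str = field.partition(":")
    exit_status = int(exit_str)
    sig = int(sig_str) if sig_str else 0
    sig_name = None
    if sig:
        try:
            # Look up the symbolic name for this signal number on this platform.
            sig_name = signal.Signals(sig).name
        except ValueError:
            sig_name = "unknown"
    return exit_status, sig, sig_name
```

A field with 13 after the colon would decode to SIGPIPE on Linux, which matches the "Broken pipe" reading. A 13 before the colon would instead be the script's own exit status and would need a different explanation.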