Open wihobbs opened 1 month ago
I see these fail occasionally in GitHub CI as well. My guess is that each has some kind of race condition.
The first issue reported here should be fixed by #6187.
The second issue seems strange now that I look at it:
0.312s: job.exception type=exec severity=0 task 0 (host tioga16): start failed: sleep: No such file or directory
flux-job: task(s) exited with exit code 127
The test is not finding sleep in PATH? I'd assume we're just missing /usr/bin in PATH, but the report is that the test only sometimes fails, so there must be something else going on. Also, we use sleep jobs in a lot of places, so I'd assume we'd see failures elsewhere as well if missing sleep was really the cause.
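One way to rule the PATH theory in or out is to check it directly in the failing environment. The following is only a rough sketch, not anything from the test suite: the helper names `resolves` and `exit_code_with_path` and the command name `no-such-command-xyz` are made up for illustration. It shows the general POSIX shell convention that a command which cannot be found yields exit status 127, which matches the exit code in the log above but says nothing about how the exec system itself reports the failure.

```python
import shutil
import subprocess


def resolves(cmd, path):
    """Return True if `cmd` can be found in the given PATH string."""
    return shutil.which(cmd, path=path) is not None


def exit_code_with_path(cmd, path):
    """Run `cmd` through /bin/sh with PATH replaced by `path`.

    /bin/sh is invoked by absolute path, so the restricted PATH only
    affects how the shell looks up `cmd` itself.
    """
    proc = subprocess.run(
        ["/bin/sh", "-c", cmd],
        env={"PATH": path},
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode


if __name__ == "__main__":
    # Hypothetical PATH values, chosen only to illustrate the check.
    for path in ("/usr/bin:/bin", "/nonexistent"):
        print(path, "-> sleep found:", resolves("sleep", path))
    # A POSIX shell reports 127 when the command cannot be found.
    print(exit_code_with_path("no-such-command-xyz", "/nonexistent"))
```

If sleep resolves every time under the PATH the test actually uses, a missing /usr/bin is unlikely to explain an intermittent failure.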
Edit: Oh, I see we are hitting the timeout, then we get the start failed error a few milliseconds later. I wonder if sending the SIGALRM at an inopportune time could cause this error instead of something more sensible?
Sort of a head scratcher, for a few days now, t2406 job-exec: kill-timeout > original value has been failing:
And, a separate issue, t2900 fails inconsistently too: