Closed rasolca closed 7 months ago
cscs-ci run
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (ab2bb6f) 94.02% compared to head (dbafede) 94.02%. Report is 1 commits behind head on master.
cscs-ci run
I don't know if it helps, but it seems to also have happened on a non-codecov configuration: https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/4700071344751697/7514005670787789/-/jobs/5676897631.
As the timeout is showing up more often, I would try to upstream these changes.
@rasolca no objection to this. Do I understand correctly that setting `SLURM_WAIT=0` just means wait forever for the job to finish (https://slurm.schedmd.com/srun.html#OPT_wait)? And then we rely on the gitlab job timeout to kill the job instead if it hangs?
Not fully correct.
When one of the processes terminates, Slurm expects all other processes to terminate within the wait time; otherwise it kills them. `SLURM_WAIT=0` just disables this behaviour. If the job reaches the time limit, it is still killed.
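For illustration, a minimal sketch of how this could look in a job script. The test binary `./my_mpi_test`, the task count, and the time limit are hypothetical placeholders and are not taken from this PR. Per the srun documentation, `SLURM_WAIT` is the environment equivalent of `--wait`.

```shell
#!/usr/bin/env bash
# Sketch only: ./my_mpi_test, the task count and the time limit are
# hypothetical placeholders, not values used by this repository's CI.

# Same effect as passing --wait=0 to srun: once the first task exits,
# srun does not terminate the remaining tasks after the wait time.
export SLURM_WAIT=0

run_test() {
    # The --time limit still applies, so a job that hangs is killed
    # when it reaches the limit.
    srun --ntasks=4 --time=00:10:00 "$@"
}

# Launch only where srun is available (i.e. on a Slurm cluster).
if command -v srun >/dev/null 2>&1; then
    run_test ./my_mpi_test
else
    echo "srun not found; SLURM_WAIT=${SLURM_WAIT} would be used"
fi
```

With this setup, a rank that exits early no longer causes the other ranks to be killed after the wait time. The time limit remains the backstop for jobs that hang.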
cscs-ci run
Just trying to debug problems like: https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/4700071344751697/7514005670787789/-/jobs/5645709351