Closed dalg24 closed 5 months ago
For reference we use 6hrs on Kokkos, 3hrs on ArborX and Cabana
Cool. I've never seen a celeritas job take more than half an hour once it's started.
@dalg24 Can you explain why this CI job "took 2 hours" (and died) after 3 minutes into the build? https://cloud.cees.ornl.gov/jenkins-ci/job/celeritas/job/PR-1189/2/pipeline-console/?selected-node=14
Does 2 hours include the time the job spends waiting for Kokkos to do its multi-hour builds? 😅
The timeout includes the waiting in the queue. I am not aware of a way to configure it to be actual run time. In any case we do want some upper limit for the whole process. Feel free to increase it again to match what other projects do.
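One possible way to separate the two limits, sketched under assumptions rather than taken from this repository's Jenkinsfile: keep a generous pipeline-level `timeout` in `options` as the overall cap (it counts queue time, as described above), and wrap the actual build commands in a `timeout` step inside `steps`, which should only start counting once the stage's agent has been allocated and the step begins executing. The agent label, stage name, shell command, and durations below are hypothetical placeholders.

```groovy
// Sketch only: label, stage name, command, and durations are assumptions.
pipeline {
    agent none
    options {
        // Overall cap for the whole run, including time spent waiting
        // in the queue for an executor.
        timeout(time: 3, unit: 'HOURS')
    }
    stages {
        stage('Build') {
            // Hypothetical agent label.
            agent { label 'some-label' }
            steps {
                // Inner limit: starts when this step begins executing,
                // i.e. after the agent for this stage has been obtained.
                timeout(time: 30, unit: 'MINUTES') {
                    // Hypothetical build command.
                    sh 'cmake --build build'
                }
            }
        }
    }
}
```

With this arrangement the outer value can track how long other projects occupy the queue, while the inner value reflects how long the job itself should take once it starts.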
Arg. That means our job successes are directly linked to Kokkos' run times. Is there a way to resubmit the jobs that failed because they got stuck behind one or more Kokkos CI sets? (Besides pull requests, the develop branch will also experience failures.)
I have no experience with it, but you can look at https://www.jenkins.io/doc/pipeline/steps/workflow-basic-steps/#retry-retry-the-body-up-to-n-times. Obviously you'd need to come up with a condition that can clearly identify that the failure was a timeout.
Set a timeout period after which the Jenkins server will abort the pipeline run. Without it, a job that somehow gets stuck may run for days before it is manually killed by an admin. Feel free to adjust the time value.