Open maleadt opened 3 hours ago
I am slightly confused; the code block you show is for CPU tests, and the title is for CPU benchmarks.
The CPU test should take around 4 minutes.
The CPU benchmark should take around 6 minutes each (4 jobs running in parallel).
I can delete both if that is what you meant (to use Buildkite only for GPUs); I'm just trying to understand which one was causing problems.
Sorry for the inconvenience.
> I am slightly confused; the code block you show is for CPU tests, and the title is for CPU benchmarks.
Right, so why does it run on GPU workers?
agents:
  queue: "juliagpu"
It's probably better to use juliaecosystem for this.
> The CPU test should take around 4 minutes.
It doesn't, though: https://buildkite.com/julialang/komamri-dot-jl/builds/1220#0192be92-3897-497e-8654-6ffcdf6f7cc1
This ran for 40 minutes doing CPU benchmarks on a GPU worker before I canceled it due to some system maintenance.
Here's another instance: https://buildkite.com/julialang/komamri-dot-jl/builds/1220#0192be92-3417-48e3-8401-b2853349fa0f. It seems that they get stuck at some point?
I have just seen this; sorry for these last commits I made 😓
> It's probably better to use juliaecosystem for this.
Oh, my bad; we can move the CPU stuff to queue: "juliaecosystem" if that is fine and set timeout_in_minutes: 10 for all jobs. Does that sound reasonable?
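For reference, a minimal sketch of what that change could look like for one CPU step. The label and command here are hypothetical placeholders, not copied from the actual runtests.yml, and the command assumes julia is already available on the agent; only the agents queue and timeout_in_minutes values come from this thread.

```yaml
steps:
  # Hypothetical CPU step; label and command are illustrative only
  - label: "CPU tests"
    command: julia --project -e 'using Pkg; Pkg.test()'
    agents:
      queue: "juliaecosystem"   # general-purpose workers instead of "juliagpu"
    timeout_in_minutes: 10      # cancel the job if it runs longer than 10 minutes
```

With a timeout set on each step, a job that hangs gets canceled after 10 minutes instead of occupying a worker indefinitely.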
> Here's another instance: buildkite.com/julialang/komamri-dot-jl/builds/1220#0192be92-3417-48e3-8401-b2853349fa0f. It seems that they get stuck at some point?
This is surprising; it seems to get stuck during package precompilation.
Please don't use the GPU workers for these long-running jobs:
https://github.com/JuliaHealth/KomaMRI.jl/blob/00a8d8a531b96a95447faaa883d88617736a7da9/.buildkite/runtests.yml#L4-L33