elliottslaughter closed 4 months ago
Mike pointed out that all the failing jobs are on runner nv-legion-ci-03-2, so this is probably a runner-specific issue and will hopefully go away with a reboot.
@elliottslaughter The bad runner has been rebooted. Can you confirm if you're still seeing issues?
All my reruns look good so far.
We're seeing nondeterministic freezes in CUDA CI jobs, e.g.:
The freezing program is hello_world, which seems pretty basic. It seems to happen in just about any configuration, e.g., no network is required.