Closed d-netto closed 3 weeks ago
This test runs inside rr, so there might be a trace uploaded somewhere?
CC: @DilumAluthge who might know.
You missed the assertion text in your copy. It is this:
julia: /cache/build/builder-amdci5-0/julialang/julia-master/src/scheduler.c:452: ijl_task_get_next: Assertion `__extension__ ({ __auto_type __atomic_load_ptr = (&ptls->sleep_check_state); __typeof__ (*__atomic_load_ptr) __atomic_load_tmp; __atomic_load (__atomic_load_ptr, &__atomic_load_tmp, (memory_order_relaxed)); __atomic_load_tmp; }) == not_sleeping' failed.
When a signal causes a thread to resume, we need to also force it back into the not_sleeping state and increment nrunning. Similar to #54721, but needs to also happen when the signal response is to terminate the process directly (such as in jl_task_frame_noreturn) and not just when it throws an InterruptException. I am not entirely certain that we can keep the nrunning counter accurate in this case, but it probably shouldn't matter as we should be attempting to tear down the process fairly aggressively and not wait for nrunning to go to zero (though someone could trick it by calling wait() from their atexit hook such that it cannot exit).
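To make the idea above concrete, here is a rough, self-contained C sketch of it. It is not the actual code from scheduler.c: the type sketch_ptls_t, the counter sketch_nrunning, and both functions are hypothetical stand-ins, and the real runtime uses its own atomics wrappers and more states. It only illustrates flipping a resumed thread back to not_sleeping and bumping the running count before anything asserts on the state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the runtime's per-thread sleep state. */
enum { not_sleeping = 0, sleeping = 1 };

typedef struct {
    _Atomic int8_t sleep_check_state;
} sketch_ptls_t;

/* Hypothetical global count of threads that are currently running. */
static _Atomic int sketch_nrunning = 0;

/* Called on a thread that a signal has just resumed. If the thread was
 * still marked as sleeping, move it back to not_sleeping and count it
 * as running again. Returns 1 if the state was changed, 0 otherwise. */
static int sketch_wake_after_signal(sketch_ptls_t *ptls)
{
    int8_t expected = sleeping;
    if (atomic_compare_exchange_strong(&ptls->sleep_check_state,
                                       &expected, not_sleeping)) {
        atomic_fetch_add(&sketch_nrunning, 1);
        return 1;
    }
    return 0;
}

/* Mirrors the shape of the failing assertion: picking the next task
 * is only valid when the thread is not marked as sleeping. */
static void sketch_check_before_get_next(sketch_ptls_t *ptls)
{
    assert(atomic_load_explicit(&ptls->sleep_check_state,
                                memory_order_relaxed) == not_sleeping);
}
```

In this sketch, a signal path that skips sketch_wake_after_signal and goes straight to sketch_check_before_get_next would trip the assertion, which is the shape of the failure reported here.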
Ah, OK. Thanks for the clarification.
Suspect it's fine to close then?
This test runs inside rr, so there might be a trace uploaded somewhere?
Yeah, if you follow the link to Buildkite, you can click on the "Artifacts" tab, and then you can download the rr trace.
It might be split across multiple parts that you need to combine back together.
This error has been occurring at a high rate lately.
See https://buildkite.com/julialang/julia-master/builds/38431#0190e57f-77e8-461e-afd1-be9abc0297f8:
This happened in https://github.com/JuliaLang/julia/pull/55233, which is basically an NFC (no functional change) and doesn't change anything in the scheduler, so I think it's unlikely to be related to the PR.