cisco / ChezScheme

Chez Scheme
Apache License 2.0
6.97k stars 984 forks source link

constrain signal delivery to Scheme to the main thread #813

Closed mflatt closed 7 months ago

mflatt commented 7 months ago

This change is an attempt to address #809.

My guess here is that the test crashes when signal 14 is delivered to a thread that is not a Scheme thread. In particular, if a GC has happened previously while multiple threads were running, then GC worker threads may be waiting around. They haven't masked signals, and the thread-local variable for a thread context will be NULL in those worker threads.

In that scenario, I don't think the signal would get delivered to a non-main thread on macOS, because sigprocmask and the automatic masking of signals during delivery is process-wide there. On Linux, however, the signal mask is thread-specific. That difference would explain why I see the crash rarely and (in retrospect, as far as I can remember) only when trying out different Linux systems and not when working in my main macOS development environment.

Before this change, I can force a crash by ensuring that GC threads have been created, disabling signals in the main thread in Linux, and then having a handled signal delivered to the Scheme process. So, maybe the test was sometimes crashing when the main thread happens to have signals blocked in the main thread when signal 14 is delivered to the process. That particular problem could not have happened before parallelism was added to the GC, and so maybe it would never have happened in the test before. In practice, there are often extra threads running in a process, and that's why this patch adjusts the signal handler to redirect to the main thread instead of setting the signal mask in GC worker threads.