crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.36k stars 1.62k forks source link

Fix: setup signal handlers in interpreted code #14766

Open ysbaddaden opened 3 months ago

ysbaddaden commented 3 months ago

The interpreter won't receive signals anymore, since the interpreted code will receive them, but the interpreter's signal loop fiber won't be resumed and it won't handle signals anymore.

Fixes the default handler for SIGCHLD into the interpreted code, so Process.run now works as expected and don't hang forever.

closes #12241 Alternative to / closes #14019

TODO:

I already marked some specs as pending (mostly those about segfaults), but there seems to be another one failing, or maybe it's a more common occurrence.

Maybe the new segfault is related to the support of fork in the interpreter?

ysbaddaden commented 3 months ago

Hypothesis A: the signal handler is running on the current stack, and that's messing with the interpreter. Maybe registering an alternate stack with sigaltstack() and pass SA_ONSTACK to sigaction.

Hypothesis B: the signal handler is replaced by interpreted code, which may do things that aren't thread safe, for example call into GC from the signal handler (not the signal loop). We might want the interpreted code to change the writer fd of the interpreter, so the signal handler runs in the interpreter but will be received by the interpreted code.

ysbaddaden commented 3 months ago

Hypothesis B was the issue.

We needed a couple interpreter intrinsics after all. The interpreter is registering the handlers (can't run interpreted code in C signal handlers), so it receives the signals and forwards them to the interpreted signal pipe, and the signal loop of the interpreted code will invoke the interpreted handlers :+1:

Note: the interpreter won't handle signals anymore, but it already didn't handle them after starting to interpret code, since the interpreter bypasses the fiber scheduler (it directly resumes fibers) so the signal loop fiber was never resumed.

ysbaddaden commented 3 months ago

With this patch, I can run the full minitest.cr test suite using the interpreter :heart:

ysbaddaden commented 3 months ago

Running the std specs with the interpreter was painfully slow, so I compiled crystal in release mode and then tried to run the specs, but it segfaulted after a bunch of forks, and before reporting any progress. I tried running:

$ SPEC_SPLIT="3%4" bin/crystal i spec/std_spec.cr

A gdb session shows that the interpreter eventually segfaults:

Program received signal SIGSEGV, Segmentation fault.
0x0000555555e2181e in interpret ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/interpreter.cr:354
354           {% begin %}
(gdb) bt
#0  0x0000555555e2181e in interpret ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/interpreter.cr:354
#1  0x0000555555c4a753 in interpret ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/interpreter.cr:252
#2  0x00005555561ca274 in interpret ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/repl.cr:92
#3  interpret_and_exit_on_error ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/repl.cr:96
#4  0x00005555561c6587 in run_file ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/repl.cr:65
#5  repl () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/command/repl.cr:47
#6  0x0000555556126ea4 in run () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/command.cr:104
#7  0x000055555574671c in run () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/command.cr:55
#8  run () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/command.cr:54
#9  __crystal_main () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal.cr:11
#10 0x000055555574cf59 in main_user_code () at /home/julien/work/crystal-lang/crystal/src/crystal/main.cr:118
#11 main () at /home/julien/work/crystal-lang/crystal/src/crystal/main.cr:104
#12 main () at /home/julien/work/crystal-lang/crystal/src/crystal/main.cr:130
ysbaddaden commented 3 months ago

Interestingly the specs that failed when calling out to crystal i can still fail when calling out crystal build on CI, meaning the interpretation ain't the issue, but something else (subprocess std in/out/err communication?) :thinking:

ysbaddaden commented 3 months ago

I think the problem, now, is support for fork.

It's likely unsafe to fork/exec in the interpreted code, and we should fork and exec from the interpreter directly... or maybe we should use posix_spawn instead for interpreted code on UNIX? Or we should have an intrinsic for spawning a process in the interpreter (i.e. call fork/exec out of the interpreted code) :thinking: