crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

Stop & start the world (undocumented API) #14729

Open ysbaddaden opened 1 week ago

ysbaddaden commented 1 week ago

Add GC.stop_world and GC.start_world methods to be able to stop and restart the world at will from within Crystal.

My use case is a perf-tools feature for RFC 2 that must stop the world to print out runtime information of each ExecutionContext with their schedulers and fibers. See https://github.com/crystal-lang/perf-tools/pull/18

Notes:

  1. I tested the behavior using this simple program;
  2. Darwin has the thread_suspend, thread_resume and thread_get_state syscalls that could be used instead of using signals;
  3. I'm having a hard time articulating the relationship between GC and Thread for this feature. Thread#suspend feels pretty neat, but we need a couple of signals on UNIX. For now I expose GC.sig_suspend and GC.sig_resume, but they feel out of place :disappointed: Update: the entrypoints are now Thread.start_world and Thread.stop_world, and sig_suspend/sig_resume are only defined on Crystal::System::Thread for UNIX.
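
For readers unfamiliar with why "a couple of signals" are needed on UNIX, here is a minimal sketch in C of a signal-based stop/start handshake. It is not the code from this PR. SIGUSR1 and SIGUSR2 are assumed stand-ins for the suspend and resume signals, and every name in it (`stop_world`, `start_world`, `stw_demo`, the counters) is hypothetical. One signal parks each thread inside a handler, and the second one wakes it from `sigsuspend`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Assumption: SIGUSR1/SIGUSR2 stand in for the suspend/resume signals. */
#define SIG_SUSPEND SIGUSR1
#define SIG_RESUME  SIGUSR2
#define WORKERS 2

static pthread_t workers[WORKERS];
static atomic_int parked;         /* threads currently parked in on_suspend */
static atomic_int world_stopped;  /* 1 between stop_world and start_world */
static atomic_int quit;
static atomic_ulong progress;     /* bumped only by running workers */

static void on_suspend(int sig) {
  (void)sig;
  int saved_errno = errno;        /* sigsuspend sets errno to EINTR */
  sigset_t wait_mask;
  sigfillset(&wait_mask);
  sigdelset(&wait_mask, SIG_RESUME);
  atomic_fetch_add(&parked, 1);   /* acknowledge the suspension */
  while (atomic_load(&world_stopped))
    sigsuspend(&wait_mask);       /* sleep until SIG_RESUME is delivered */
  atomic_fetch_sub(&parked, 1);
  errno = saved_errno;
}

/* Empty on purpose: delivering it is enough to interrupt sigsuspend. */
static void on_resume(int sig) { (void)sig; }

static void install_handlers(void) {
  struct sigaction sa = {0};
  sigemptyset(&sa.sa_mask);
  /* Keep SIG_RESUME blocked inside on_suspend so it can only be
     delivered atomically by sigsuspend, never lost just before it. */
  sigaddset(&sa.sa_mask, SIG_RESUME);
  sa.sa_handler = on_suspend;
  int rc = sigaction(SIG_SUSPEND, &sa, NULL);
  assert(rc == 0);
  sigemptyset(&sa.sa_mask);
  sa.sa_handler = on_resume;
  rc = sigaction(SIG_RESUME, &sa, NULL);
  assert(rc == 0);
  (void)rc;
}

static void stop_world(void) {
  atomic_store(&world_stopped, 1);
  for (int i = 0; i < WORKERS; i++) pthread_kill(workers[i], SIG_SUSPEND);
  while (atomic_load(&parked) < WORKERS) sched_yield();  /* wait for acks */
}

static void start_world(void) {
  atomic_store(&world_stopped, 0);
  for (int i = 0; i < WORKERS; i++) pthread_kill(workers[i], SIG_RESUME);
  while (atomic_load(&parked) > 0) sched_yield();  /* wait until all left */
}

static void *worker(void *arg) {
  (void)arg;
  while (!atomic_load(&quit)) atomic_fetch_add(&progress, 1);
  return NULL;
}

/* Returns 0 if the workers made no progress while the world was stopped
   and made progress again after it was restarted. */
int stw_demo(void) {
  struct timespec pause = {0, 50 * 1000 * 1000};  /* 50 ms */
  install_handlers();
  for (int i = 0; i < WORKERS; i++)
    if (pthread_create(&workers[i], NULL, worker, NULL) != 0) return -1;
  nanosleep(&pause, NULL);

  stop_world();
  unsigned long before = atomic_load(&progress);
  nanosleep(&pause, NULL);
  unsigned long during = atomic_load(&progress);
  start_world();

  nanosleep(&pause, NULL);
  unsigned long after = atomic_load(&progress);

  atomic_store(&quit, 1);
  for (int i = 0; i < WORKERS; i++) pthread_join(workers[i], NULL);
  return (before == during && after > during) ? 0 : 1;
}
```

To try it, call `stw_demo()` from a `main` and build with `cc -pthread`. The sketch also shows why the signal numbers have to live somewhere: both handlers must be installed before any thread can be suspended, so whichever of GC or Thread owns the handshake also ends up owning the two signals.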

ysbaddaden commented 1 week ago

Maybe it should be Thread.stop_world and Thread.start_world, and they'd call into GC (like for creating a thread)? But that still doesn't say where sig_suspend and sig_resume should be defined.

beta-ziliani commented 1 week ago

> Maybe it should be Thread.stop_world and Thread.start_world, and they'd call into GC (like for creating a thread)?

I think this makes sense, yes.

> But that still doesn't say where sig_suspend and sig_resume should be defined.

These are the Boehm-specific functions, right? Why would they not be defined there?

ysbaddaden commented 1 week ago

There are weird CI failures on AArch64 with no error (the run was just cancelled), but I can't replicate them :raised_eyebrow:

And I can't download the crystal binary artifact to try it out. The GNU test finally finished compilation and is now running, but the musl one keeps failing. Maybe the VMs still have some issue.

One VM on the AArch64 CI server was acting up.

ysbaddaden commented 4 days ago

Rebased on master to pick up #14733, and fixed the calls to Crystal::System.panic.