Compiler: enable parallel codegen with MT

Implements parallel codegen of object files when MT is enabled in the compiler (-Dpreview_mt).

It only impacts codegen for compilations with more than one compilation unit (module), that is when neither of --single-module, --release or --cross-compile is specified. This behavior is identical to the fork based codegen.

Advantages:

allows parallel codegen on Windows (untested);
no need to fork many processes;
no repeated GC collections in each forked processes;
a simple Channel to distribute work efficiently (no need for IPC).

The main points are increased portability and simpler logic, despite having to take care of LLVM thread safety quirks (see comments).

Issues:

The threads arg actually depicts the number of fibers, not threads, which is confusing and problematic: increasing threads but not CRYSTAL_WORKERS will lead to more fibers than threads, with fibers being sheduled on the same threads, which won't bring any improvement.

In fact CRYSTAL_WORKERS defaults to 4, when threads defaulted to 8. With this patch it defaults to CRYSTAL_WORKERS, so MT can end up being slower if we don't specify CRYSTAL_WORKERS=8.
This is still not as efficient as it could be. The main fiber (that feeds the worker fibers) can get blocked by a worker fiber doing codegen, leading the other workers to starve. This is easily noticeable when compiling with -O1 for example.

Both issues will be fixable with RFC 2 where we can start an explicit context to run the worker fibers or start N isolated contexts (maybe a better idea). Until then, one should increase CRYSTAL_WORKERS.

Supersedes #14227 and doesn't segfault (so far) with LLVM 18.1 :crossed_fingers:

TODO:

[ ] wait for #14760
[ ] cleanup
[ ] rename the method as mt_parallel(units, n_threads)
[ ] consider increasing the channel size (until we can use ExecutionContext)
[ ] consider a CRYSTAL_CONFIG_WORKERS to configure the default number of workers at compile time instead of the hardcoded 4 (in a distinct PR)

crystal-lang / crystal

Compiler: enable parallel codegen with MT #14748