Open ysbaddaden opened 6 days ago
This looks great. But it also seems to be a mix of different changes. Could we extract the independent refactorings (such as extracting sequential_codegen
and fork_codegen
, memoization of some methods) to their own PRs?
Implements parallel codegen of object files when MT is enabled in the compiler (
-Dpreview_mt
).It only impacts codegen for compilations with more than one compilation unit (module), that is when neither of
--single-module
,--release
or--cross-compile
is specified. This behavior is identical to the fork based codegen.Advantages:
The main points are increased portability and simpler logic, despite having to take care of LLVM thread safety quirks (see comments).
Issues:
The
threads
arg actually depicts the number of fibers, not threads, which is confusing and problematic: increasingthreads
but notCRYSTAL_WORKERS
will lead to more fibers than threads, with fibers being sheduled on the same threads, which won't bring any improvement.In fact
CRYSTAL_WORKERS
defaults to 4, whenthreads
defaulted to 8. With this patch it defaults toCRYSTAL_WORKERS
, so MT can end up being slower if we don't specifyCRYSTAL_WORKERS=8
.This is still not as efficient as it could be. The main fiber (that feeds the worker fibers) can get blocked by a worker fiber doing codegen, leading the other workers to starve. This is easily noticeable when compiling with
-O1
for example.Both issues will be fixable with RFC 2 where we can start an explicit context to run the worker fibers or start N isolated contexts (maybe a better idea). Until then, one should increase
CRYSTAL_WORKERS
.Supersedes #14227 and doesn't segfault (so far) with LLVM 18.1 :crossed_fingers:
TODO:
mt_parallel(units, n_threads)
CRYSTAL_CONFIG_WORKERS
to configure the default number of workers at compile time instead of the hardcoded 4 (in a distinct PR)