Open HertzDevil opened 9 months ago
Also this would work with 128-bit integers:
(to be fair most specs in that file should be moved to `spec/primitives/*`)
And also OrcV2 LLJIT (#14856) doesn't have an `LLVM::GenericValue` equivalent, so it too requires strongly typed codegen specs.
OrcV2 supposedly supports linking to symbols in the current process (apparently this is necessary for even `malloc`), but specs using the prelude currently fail because `__emutls_get_address` is undefined. Apparently the fix is to enable emulated TLS (`-femulated-tls`) while building the spec binary, or disable emulated TLS while building the JIT'ed target machine. Neither is supported by the C API right now.
EDIT: Trying to forcibly disable emulated TLS:
void LLVMExtDisableEmulatedTLS(LLVMOrcJITTargetMachineBuilderRef Builder) {
  auto *JTMB = reinterpret_cast<orc::JITTargetMachineBuilder *>(Builder);
  JTMB->getOptions().EmulatedTLS = false;
}
lib LibLLVM
  alias OrcJITTargetMachineBuilderRef = Void*
  fun orc_jit_target_machine_builder_detect_host = LLVMOrcJITTargetMachineBuilderDetectHost(result : OrcJITTargetMachineBuilderRef*) : ErrorRef
  fun orc_lljit_builder_set_jit_target_machine_builder = LLVMOrcLLJITBuilderSetJITTargetMachineBuilder(builder : OrcLLJITBuilderRef, jtmb : OrcJITTargetMachineBuilderRef)
end

lib LibLLVMExt
  fun disable_emulated_tls = LLVMExtDisableEmulatedTLS(LibLLVM::OrcJITTargetMachineBuilderRef)
end

lljit_builder = LLVM::Orc::LLJITBuilder.new
LLVM.assert LibLLVM.orc_jit_target_machine_builder_detect_host(out jtmb)
LibLLVMExt.disable_emulated_tls(jtmb)
LibLLVM.orc_lljit_builder_set_jit_target_machine_builder(lljit_builder, jtmb)
lljit = LLVM::Orc::LLJIT.new(lljit_builder)
# ...
now gives me this cryptic error:
dyld[23438]: _tlv_bootstrap called
Program received and didn't handle signal ABRT (6)
On the other hand, enabling emulated TLS:
void LLVMExtEnableEmulatedTLS(LLVMTargetMachineRef M) {
  reinterpret_cast<TargetMachine *>(M)->Options.EmulatedTLS = true;
}
class Crystal::Codegen::Target
  def to_target_machine(...)
    # ...
    target = LLVM::Target.from_triple(self.to_s)
    machine = target.create_target_machine(...).not_nil!
    machine.enable_global_isel = false
    LibLLVMExt.enable_emulated_tls(machine)
    machine
  end
end
breaks `@[ThreadLocal]`:
Undefined symbols for architecture arm64:
"_Crystal::System::Thread::current_thread", referenced from:
_*Crystal::System::Thread::current_thread:Thread in C-rystal5858S-ystem5858T-hread.o0.o
_*Crystal::System::Thread::current_thread:Thread in C-rystal5858S-ystem5858T-hread.o0.o
_*Crystal::System::Thread::current_thread:Thread in C-rystal5858S-ystem5858T-hread.o0.o
_*Crystal::System::Thread::current_thread:Thread in C-rystal5858S-ystem5858T-hread.o0.o
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Crystal uses two strategies to run codegen specs. If the snippet requires the prelude, then `::run` injects a `print` call, builds the code to an actual temporary executable, then inspects its output via `Process.run`. Otherwise, Crystal uses LLVM's JIT compiler to compile and call an extra wrapper function that, more or less, forwards the result of `__crystal_main` with an empty `argc` and `argv`:
https://github.com/crystal-lang/crystal/blob/0b251d4859ef07534d4f1c4df08cdac2e990791f/src/compiler/crystal/codegen/codegen.cr#L45-L53
Here we focus on this case where `::run` would return an `LLVM::GenericValue`. We want to extract a typed value that our specs can actually use, but it turns out only primitive integers, floats, and pointers can be returned:
https://github.com/crystal-lang/crystal/blob/0b251d4859ef07534d4f1c4df08cdac2e990791f/src/llvm/lib_llvm/execution_engine.cr#L22-L24
This makes returning multiple values, such as in https://github.com/crystal-lang/crystal/pull/14087#discussion_r1425918400, rather inconvenient; structs and tuples cannot be returned by value, and must go through the heap. (Heap contents are preserved across the JIT function call, stack contents are not, so `pointerof` on a local variable inside the snippet will fail.)

Here is a way around that. First, the wrapper function will accept an extra output parameter, rather than returning a value:
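As a rough analogy outside the Crystal compiler, the output-parameter idea can be sketched in C++. This is only an illustration under stated assumptions, not the actual codegen: `RawFn`, `evaluate_wrapper_demo`, `demo_address`, and `call_typed` are hypothetical names, and an ordinary function stands in for the JIT-compiled wrapper. The caller reserves space for the result, calls the wrapper through a typed function pointer, and reads the value back, so no generic value container is involved.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Untyped function address, similar to what a JIT symbol lookup might hand back.
// (Hypothetical alias, for illustration only.)
using RawFn = void (*)();

// Stand-in for the compiled wrapper: it writes its result through an
// output pointer instead of returning it.
void evaluate_wrapper_demo(std::pair<std::int32_t, std::int32_t> *out) {
  *out = {1, 2};
}

// Hands out the wrapper's address with its type erased.
inline RawFn demo_address() {
  return reinterpret_cast<RawFn>(&evaluate_wrapper_demo);
}

// Caller side: reserve space for T, cast the address back to a function
// pointer that takes T*, call it, and return the filled-in value.
// The cast is only valid if T matches what the wrapper really writes.
template <typename T>
T call_typed(RawFn address) {
  T result{};  // space owned by the caller, so it outlives the call
  auto fn = reinterpret_cast<void (*)(T *)>(address);
  fn(&result);
  return result;
}
```

Calling `call_typed<std::pair<std::int32_t, std::int32_t>>(demo_address())` yields the pair `{1, 2}` directly. The sketch relies on the caller and the wrapper agreeing on the layout of `T`.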
We also reserve space for the return type `T` we are interested in. After that, we obtain `__evaluate_wrapper`'s address, cast it to an appropriate `Proc` now that we have access to `T`, and bypass `LLVM::GenericValue` entirely:

With the appropriate forwarding for `type`, we should be able to write specs like this:

Note that there is no `to_i` after the first `run`. The second `run` assumes `{Int32, Int32}` is binary-compatible between the spec runner itself and the compiled snippet, but this should hold true for all primitive values, regardless of the current compiler version. (Actually, we are already assuming the same for `String` every time a prelude-less codegen spec returns one.) Apart from grouping related specs in one `run`, we could also avoid the error-prone use of `&+` in scenarios such as this:
https://github.com/crystal-lang/crystal/blob/0b251d4859ef07534d4f1c4df08cdac2e990791f/spec/compiler/codegen/block_spec.cr#L1435-L1446
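To make the concern about `&+` concrete, here is a small C++ sketch. It assumes the linked spec folds several observed values into a single integer with wrapping addition so that one `Int32` can be returned; that is a reading of the text above, not something verified against the linked lines. The helper names `wrapping_add`, `folded`, and `separate` are hypothetical. Folding can make different results indistinguishable, while returning the values separately keeps them apart.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Wrapping 32-bit addition, analogous to Crystal's &+ operator.
// The sum is computed in unsigned arithmetic to avoid signed-overflow UB.
inline std::int32_t wrapping_add(std::int32_t a, std::int32_t b) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) +
                                   static_cast<std::uint32_t>(b));
}

// Style 1: fold two observed values into one integer so that a single
// primitive can be returned. Information about each value is lost.
inline std::int32_t folded(std::int32_t x, std::int32_t y) {
  return wrapping_add(x, y);
}

// Style 2: return both values, which a typed return makes possible.
inline std::tuple<std::int32_t, std::int32_t> separate(std::int32_t x,
                                                       std::int32_t y) {
  return {x, y};
}
```

With these helpers, `folded(1, 2)` and `folded(2, 1)` are equal, so a spec comparing the folded value would not notice swapped results, whereas `separate(1, 2)` and `separate(2, 1)` compare as different.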