crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

Strongly typed code generation specs #14090

Open HertzDevil opened 9 months ago

HertzDevil commented 9 months ago

Crystal uses two strategies to run codegen specs. If the snippet requires the prelude, ::run injects a print call, builds the code into an actual temporary executable, and inspects its output via Process.run. Otherwise, Crystal uses LLVM's JIT compiler to compile and call an extra wrapper function that, more or less, forwards the result of __crystal_main, called with a zero argc and a null argv:

https://github.com/crystal-lang/crystal/blob/0b251d4859ef07534d4f1c4df08cdac2e990791f/src/compiler/crystal/codegen/codegen.cr#L45-L53
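For readers unfamiliar with the prelude-based path, here is a rough sketch of the idea only. It is not the spec helper's actual code: `run_with_prelude` is a hypothetical name, the way the print call is injected is an assumption, and it shells out to `crystal eval` (assumed to be on the PATH) instead of driving the compiler API directly.

```crystal
# Hypothetical helper: wraps the snippet in a `print` call, compiles and runs
# it as a separate process, and returns whatever the process wrote to stdout.
def run_with_prelude(code : String) : String
  # Assumed form of the injected print call; the real helper may differ.
  source = "print(begin\n#{code}\nend)\n"

  output = IO::Memory.new
  status = Process.run("crystal", ["eval", source], output: output)
  raise "snippet failed to build or run" unless status.success?

  # The result only ever comes back as text.
  output.to_s
end

puts run_with_prelude("1 + 2")
```

The point of the sketch is that this path only ever yields a string, so the spec has to parse or compare text. The JIT-based path is different: there the result comes back as a value rather than as text.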

Here we focus on the second case, where ::run returns an LLVM::GenericValue. We want to extract a typed value that our specs can actually use, but it turns out that only primitive integers, floats, and pointers can be returned:

https://github.com/crystal-lang/crystal/blob/0b251d4859ef07534d4f1c4df08cdac2e990791f/src/llvm/lib_llvm/execution_engine.cr#L22-L24

This makes returning multiple values, such as in https://github.com/crystal-lang/crystal/pull/14087#discussion_r1425918400, rather inconvenient; structs and tuples cannot be returned by value, and must go through the heap. (Heap contents are preserved across the JIT function call, stack contents are not, so pointerof on a local variable inside the snippet will fail.)

Here is a way around that. First, the wrapper function will accept an extra output parameter, rather than returning a value:

# void (*__evaluate_wrapper)(void*)
wrapper_type = LLVM::Type.function([llvm_context.void_pointer], llvm_context.void)
wrapper = llvm_mod.functions.add("__evaluate_wrapper", wrapper_type) do |func|
  func.basic_blocks.append "entry" do |builder|
    argc = llvm_context.int32.const_int(0)
    argv = llvm_context.void_pointer.pointer.null
    ret = builder.call(main.type, main.func, [argc, argv])
    builder.store(ret, func.params[0]) unless node.type.void? || node.type.nil_type?
    builder.ret
  end
end

We also reserve space for the return type T we are interested in. After that, we obtain __evaluate_wrapper's address, cast it to an appropriate Proc now that we have access to T, and bypass LLVM::GenericValue entirely:

lib LibLLVM
  fun get_function_address = LLVMGetFunctionAddress(ee : ExecutionEngineRef, name : Char*) : UInt64
end

class Crystal::Program
  def evaluate2(node, type : T.class, debug = Debug::Default) forall T
    # ...
    ret = uninitialized T
    LLVM::JITCompiler.new(llvm_mod) do |jit|
      func_ptr = LibLLVM.get_function_address(jit, "__evaluate_wrapper")
      func = Proc(T*, Nil).new(Pointer(Void).new(func_ptr), Pointer(Void).null)
      func.call(pointerof(ret))
    end
    ret
  end
end
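The out-parameter trick can be tried without LLVM at all. The following is a minimal sketch under stated assumptions: `writer` is a stand-in for the JIT-compiled `__evaluate_wrapper`, and `call_with_out_param` is a hypothetical helper that mirrors the body of `evaluate2` above, using a raw function address taken from `Proc#pointer` instead of one returned by `LLVMGetFunctionAddress`.

```crystal
# Stand-in for the JIT-compiled wrapper: a non-closure proc that writes its
# result through an output pointer instead of returning it.
writer = ->(out_ptr : Pointer(Tuple(Int32, Int32))) do
  out_ptr.value = {8, 16}
  nil
end

# Hypothetical helper mirroring `evaluate2`: reserve space for a T, rebuild a
# typed Proc from a raw function address, and let the callee fill in the value.
def call_with_out_param(address : Void*, type : T.class) : T forall T
  ret = uninitialized T
  func = Proc(T*, Nil).new(address, Pointer(Void).null)
  func.call(pointerof(ret))
  ret
end

result = call_with_out_param(writer.pointer, Tuple(Int32, Int32))
p result
```

Nothing checks that T matches what the callee actually stores, so the caller is responsible for passing the right type, just as the spec author would be when writing `run(code, T)`.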

With the appropriate forwarding for type, we should be able to write specs like this:

run("1", Int32).should eq(1)

run(<<-CRYSTAL, Tuple(Int32, Int32)).should eq({8, 16})
  class Foo
    def initialize(@x : Int32, @y : Int32, @z : Int32)
    end
  end

  {sizeof(Foo), instance_sizeof(Foo)}
  CRYSTAL

Note that there is no to_i after the first run. The second run assumes {Int32, Int32} is binary-compatible between the spec runner itself and the compiled snippet, but this should hold true for all primitive values, regardless of the current compiler version. (Actually, we are already assuming the same for String every time a prelude-less codegen spec returns one.) Apart from grouping related specs in one run, we could also avoid the error-prone use of &+ in scenarios such as this:

https://github.com/crystal-lang/crystal/blob/0b251d4859ef07534d4f1c4df08cdac2e990791f/spec/compiler/codegen/block_spec.cr#L1435-L1446
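As an illustration of why folding several results into one integer with &+ is fragile, consider the plain Crystal sketch below. The values are made up for the example and are not taken from the linked spec.

```crystal
# Two hypothetical snippet results; the second has its values swapped (a bug).
expected = {1, 2}
buggy = {2, 1}

# Folding each result into a single integer with wrapping addition.
folded_expected = expected[0] &+ expected[1]
folded_buggy = buggy[0] &+ buggy[1]

# The folded values are equal, so a spec on the sum cannot see the bug.
puts folded_expected == folded_buggy

# Comparing the tuples directly does catch it.
puts expected == buggy
```

With a typed tuple return, each component is checked on its own and a failure message shows which one was wrong.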

HertzDevil commented 9 months ago

Also this would work with 128-bit integers:

https://github.com/crystal-lang/crystal/blob/8fe3c70b3dc2078b7a03d3fc7c6736dec4f42405/spec/compiler/codegen/primitives_spec.cr#L17-L25

(To be fair, most specs in that file should be moved to spec/primitives/*.)

HertzDevil commented 1 month ago

Also, OrcV2 LLJIT (#14856) doesn't have an LLVM::GenericValue equivalent, so it too requires strongly typed codegen specs.

HertzDevil commented 1 month ago

OrcV2 supposedly supports linking to symbols in the current process (apparently this is necessary even for malloc), but specs using the prelude currently fail because __emutls_get_address is undefined. The fix appears to be either enabling emulated TLS (-femulated-tls) while building the spec binary, or disabling emulated TLS while building the JIT'ed target machine. Neither is supported by the C API right now.

EDIT: Trying to forcibly disable emulated TLS:

void LLVMExtDisableEmulatedTLS(LLVMOrcJITTargetMachineBuilderRef Builder) {
  auto *JTMB = reinterpret_cast<orc::JITTargetMachineBuilder *>(Builder);
  JTMB->getOptions().EmulatedTLS = false;
}

lib LibLLVM
  alias OrcJITTargetMachineBuilderRef = Void*

  fun orc_jit_target_machine_builder_detect_host = LLVMOrcJITTargetMachineBuilderDetectHost(result : OrcJITTargetMachineBuilderRef*) : ErrorRef
  fun orc_lljit_builder_set_jit_target_machine_builder = LLVMOrcLLJITBuilderSetJITTargetMachineBuilder(builder : OrcLLJITBuilderRef, jtmb : OrcJITTargetMachineBuilderRef)
end

lib LibLLVMExt
  fun disable_emulated_tls = LLVMExtDisableEmulatedTLS(LibLLVM::OrcJITTargetMachineBuilderRef)
end

lljit_builder = LLVM::Orc::LLJITBuilder.new

LLVM.assert LibLLVM.orc_jit_target_machine_builder_detect_host(out jtmb)
LibLLVMExt.disable_emulated_tls(jtmb)
LibLLVM.orc_lljit_builder_set_jit_target_machine_builder(lljit_builder, jtmb)

lljit = LLVM::Orc::LLJIT.new(lljit_builder)
# ...

now gives me this cryptic error:

dyld[23438]: _tlv_bootstrap called
Program received and didn't handle signal ABRT (6)

On the other hand, enabling emulated TLS:

void LLVMExtEnableEmulatedTLS(LLVMTargetMachineRef M) {
  reinterpret_cast<TargetMachine *>(M)->Options.EmulatedTLS = true;
}

class Crystal::Codegen::Target
  def to_target_machine(...)
    # ...
    target = LLVM::Target.from_triple(self.to_s)
    machine = target.create_target_machine(...).not_nil!
    machine.enable_global_isel = false
    LibLLVMExt.enable_emulated_tls(machine)
    machine
  end
end

breaks @[ThreadLocal]:

Undefined symbols for architecture arm64:
  "_Crystal::System::Thread::current_thread", referenced from:
      _*Crystal::System::Thread::current_thread:Thread in C-rystal5858S-ystem5858T-hread.o0.o
      _*Crystal::System::Thread::current_thread:Thread in C-rystal5858S-ystem5858T-hread.o0.o
      _*Crystal::System::Thread::current_thread:Thread in C-rystal5858S-ystem5858T-hread.o0.o
      _*Crystal::System::Thread::current_thread:Thread in C-rystal5858S-ystem5858T-hread.o0.o
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)