alan-j-hu / llvm-dune

The official LLVM OCaml binding but built using dune
Other
26 stars 3 forks source link

OCaml bindings assume LLVM objects are at least 2-byte aligned #16

Closed tanapthetimid closed 1 year ago

tanapthetimid commented 1 year ago

I am not quite sure where to report this, as this seems to be an issue in the backported llvm 15 with ocaml 5.0.0 support. I am getting assertion fail in to_val, OCaml bindings assume LLVM objects are at least 2-byte aligned using the latest llvm-15 ocaml binding. There seems to be some discussion here related to this.

tanapthetimid commented 1 year ago

Sorry, this was my mistake integrating with another component.

tanapthetimid commented 1 year ago

I have resolved all the issues with my other component and this bug still exists. I was able to fix this issue by using an wrapper Ocaml block with Custom_tag and custom comparison function.

The patch is as follows

diff --git a/llvm/bindings/ocaml/llvm/llvm_ocaml.c b/llvm/bindings/ocaml/llvm/llvm_ocaml.c
index 5ab1cde0dd..1de9f11027 100644
--- a/llvm/bindings/ocaml/llvm/llvm_ocaml.c
+++ b/llvm/bindings/ocaml/llvm/llvm_ocaml.c
@@ -26,6 +26,31 @@
 #include "caml/callback.h"
 #include "llvm_ocaml.h"

+#define Llvm_val(v) (*((void **) Data_custom_val(v)))
+
+int mycompare(value v1, value v2) {
+  uintptr_t x1 = (uintptr_t) Llvm_val(v1);
+  uintptr_t x2 = (uintptr_t) Llvm_val(v2);
+  if (x1 < x2) {
+    return -1;
+  } else if (x1 > x2) {
+    return 1;
+  } else {
+    return 0;
+  }
+}
+
+static struct custom_operations llvm_value_ops = {
+  "custom.custom.caml.llvm_value",
+  custom_finalize_default,
+  mycompare,
+  custom_hash_default,
+  custom_serialize_default,
+  custom_deserialize_default,
+  custom_compare_ext_default,
+  custom_fixed_length_default
+};
+
 #if OCAML_VERSION < 41200
 value caml_alloc_some(value v) {
   CAMLparam1(v);
@@ -44,16 +69,13 @@ value caml_alloc_tuple_uninit(mlsize_t wosize) {
 }

 value to_val(void *ptr) {
-  assert((((value)ptr) & 1) == 0 &&
-         "OCaml bindings assume LLVM objects are at least 2-byte aligned");
-  return ((value)ptr) | 1;
+  value v = caml_alloc_custom(&llvm_value_ops, sizeof(void *), 0, 1);
+  Llvm_val(v) = ptr;
+  return v;
 }

 void *from_val(value v) {
-  assert(Is_long(v) && "OCaml values representing LLVM objects should have the "
-                       "low bit set so that the OCaml GC "
-                       "treats them as tagged integers");
-  return (void *)(v ^ 1);
+  return Llvm_val(v);
 }

 value llvm_string_of_message(char *Message) {

@alan-j-hu are you sure the 2 byte alignment guarantee is certain?

alan-j-hu commented 1 year ago

Thank you for reporting this. What computer architecture are you running this on, and do you know which LLVM type specifically triggered this assertion failure?

tanapthetimid commented 1 year ago

I am on x86_64. I don't know how to print out the LLVM type, but from gdb backtrace is looks like it might be the function llvm_instr_parent so I think the culprit is the value LLVMBasicBlockRef BB = LLVMGetInstructionParent(Value_val(Inst));.

alan-j-hu commented 1 year ago

I also use x86_64. Do you happen to have a minimal reproducible example for me to test?

alan-j-hu commented 1 year ago

I made a toy program using Llvm.instr_parent and was not able to reproduce this. This is the definition of BasicBlock in LLVM 15: https://github.com/llvm/llvm-project/blob/8dfdcc7b7bf66834a761bd8de445840ef68e4d1a/llvm/include/llvm/IR/BasicBlock.h#L55 It contains a pointer as a member, so it should be at least two-bit aligned. Do you have a reproducible example?

tanapthetimid commented 1 year ago

Could you please try if you could reproduce with this?

let main () =
  let x = Llvm.of_file "small_min_cut.ll" in
  let modi_phi llb phi =
    Llvm.incoming phi |> List.iter (fun (a,b) -> (
            Llvm.dump_value a;
            print_newline ();
            let y = Llvm.instr_parent a in
            let compar = Llvm.equal_llbasicblock b y in
            (if compar then 1 else 0) |> print_int;
            print_newline ();
            ()
    )) in
  let convert llb =
    let fst_inst = llb |> Llvm.first_instr_of_block in
    if Llvm.is_phi fst_inst then
      let phi_list = Llvm.all_next_phi fst_inst in
      List.iter (modi_phi llb) phi_list in
  Llvm.iter_blocks_all convert x

small_min_cut.ll

alan-j-hu commented 1 year ago

Your code has a bunch of functions that don't belong to the Llvm module: of_file, equal_llbasicblock, first_instr_of_block, is_phi, all_next_phi, and iter_blocks_all. Is there supplemental code?

tanapthetimid commented 1 year ago

My bad. How about this one?

let llctx = Llvm.create_context ()

let of_file filename = Llvm.MemoryBuffer.of_file filename |> Llvm_irreader.parse_ir llctx

let instr_of_pos = function Llvm.Before i -> Some i | Llvm.At_end _ -> None

let first_instr_of_block b = Llvm.instr_begin b |> instr_of_pos |> Option.get

let is_phi instr =
  match Llvm.instr_opcode instr with Llvm.Opcode.PHI -> true | _ -> false

let iter_blocks_all f = f |> Llvm.iter_blocks |> Llvm.iter_functions

let main () =
  let x = of_file "small_min_cut.ll" in
  let modi_phi llb phi =
    Llvm.incoming phi |> List.iter (fun (a,b) -> (
            Llvm.dump_value a;
            print_newline ();
            let y = Llvm.instr_parent a in
            let compar = b = y in
            (if compar then 1 else 0) |> print_int;
            print_newline ();
            ()
    )) in
  let convert llb =
    let fst_inst = llb |> first_instr_of_block in
    if is_phi fst_inst then
    let phi_list = [fst_inst] in
      List.iter (modi_phi llb) phi_list in
  iter_blocks_all convert x

edit: i removed the Filename thing

alan-j-hu commented 1 year ago

Thank you, I got the assertion failure and am looking into it.

tanapthetimid commented 1 year ago

Regarding the alignment assumption, as far as I know, C++ alignment is implementation dependent with the exception of a few constraints, so a compiler could theoretically make every type have alignment of 1 byte (even on x86 since there's no hard requirement for memory access in x86). So I think this assumption breaks portability even if it works well on clang/gcc on x86.

alan-j-hu commented 1 year ago

cc @jberdine

alan-j-hu commented 1 year ago

The following program:

#include <iostream>
#include <llvm/IR/BasicBlock.h>

int main()
{
  std::cout << alignof(llvm::BasicBlock) << std::endl;
  return 0;
}

compiled with c++ test.cpp `llvm-config-15 --cxxflags --ldflags --libs core` outputs 8. This means that the three least significant bits of a pointer to a basic block should all be 0, right?

alan-j-hu commented 1 year ago

I made a debug branch from my opam branch fork of LLVM 15.x, and I changed the instr_parent function to:

value llvm_instr_parent(value Inst) {
  LLVMBasicBlockRef BB = LLVMGetInstructionParent(Value_val(Inst));
  fprintf(stderr, "%p\n", BB);
  return to_val(BB);
}

and installed my fork, and I got 0x51 printed to stderr before the program crashed. Is the BasicBlock alignment not being respected?

alan-j-hu commented 1 year ago

I found the issue. It's your code, not the bindings. First of all, when I run your code, when Llvm.dump_value a; gets reached just before the crash, I get i1 false dumped.

Here is a minimal reproducible example that recreates this crash:

let llctx = Llvm.create_context ()

let () =
  let i1false = Llvm.const_int (Llvm.i1_type llctx) 0 in
  Llvm.dump_value i1false;
  let _ = Llvm.instr_parent i1false in
  ()

i1 false gets dumped, and then the assertion is reached and fails.

Here are the relevant functions:

The LLVM API is full of pitfalls and can lead to crashes if you misuse it. I'm not sure how much the OCaml bindings can or should do to protect the user from the lack of memory safety in the underlying API.

Please let me know if you have any questions or concerns.

tanapthetimid commented 1 year ago

While I agree that it's not OCaml bindings' job to prevent misuse of the LLVM API, there should still at least be consistency between what LLVM directly provides and what the bindings provide. In this example, I do not see anywhere in the LLVM documentation that says that a LLVMValueRef of another type cannot be passed into GetParent. The code would run fine when using the LLVM API directly from c++, and it had been running fine using earlier versions of the bindings before the garbage collection requirements of OCaml 5. If you think passing a constant into Instruction's GetParent is invalid that should be reflected in LLVM or documented somewhere.

The issue of portability hasn't been addressed either. The 2 byte alignment guarantee doesn't exist and is platform/compiler dependent. I understand you're trying to preserve the binding's performance characteristic transitioning into OCaml 5, but what you're doing here is extremely fragile and there's not any benchmark backing it up either.

alan-j-hu commented 1 year ago

The code that caused the assertion failure was never correct. It always incorrectly casted some LLVM values, which were not LLVM instructions, to LLVM instructions (i.e. llvm::Instruction inherits from llvm::Value, so casting from values to instructions is a downcast, which is not type-safe). As a result, the code accessed memory in ways that were undefined behavior and a garbage value was interpreted as a pointer to a basic block. If the code happened to run correctly, it was only by accident. Indeed, the following change results in a segfault (and I'm testing this on LLVM 14 with OCaml 4.14.1, so this has nothing to do with the OCaml 5 changes):

@@ -18,6 +18,7 @@ let main () =
             Llvm.dump_value a;
             print_newline ();
             let y = Llvm.instr_parent a in
+            let _ = Llvm.instr_begin y in
             let compar = b = y in
             (if compar then 1 else 0) |> print_int;
             print_newline ();

The reason is that you are incorrectly downcasting non-instruction values to instructions and then reading garbage data.

tanapthetimid commented 1 year ago

I know the code I provided is wrong. I was trying to make a point about consistency between the API and the bindings. That aside, you still haven't addressed the portability issue, and the fragility of using pointer tagging. But if llvm people just want to ignore that then I guess there's not much more for me to say.

alan-j-hu commented 1 year ago

I want to make clear that the OCaml bindings are a thin wrapper over the C API. The corresponding C function for Llvm.instr_parent is LLVMGetInstructionParent, as can be seen at the definition of the OCaml wrapper function at https://github.com/llvm/llvm-project/blob/810bca56f03ad1408fcca9226fa2e5c502ad9917/llvm/bindings/ocaml/llvm/llvm_ocaml.c#L1869. It is the C function LLVMGetInstructionParent that contains a potentially unsafe downcast from a value to an instruction, as can be seen at its definition at https://github.com/llvm/llvm-project/blob/810bca56f03ad1408fcca9226fa2e5c502ad9917/llvm/lib/IR/Core.cpp#L2773. Therefore, the C API function is not type-safe, in the sense that it's possible to pass an LLVMValueRef and get back garbage data that is not actually an LLVMBasicBlockRef.

The decision to use the 2-bit alignment assumption was discussed at https://reviews.llvm.org/D136400#inline-1315218. I again want to emphasize again that what happened here was not that the 2-bit alignment assumption was broken, but that incorrect use of the underlying C API (which has unsafe downcasts) led to garbage data being treated as a pointer to an llvm::BasicBlock.

tanapthetimid commented 1 year ago

You can emphasize as much as you like that there is no 2-bit alignment assumption broken in this issue, but that is not a validation of your assumption. Your alignment assumption is wrong based on the c++ standards, and your code is fragile and heavily dependent on c compiler implementation.

jberdine commented 1 year ago

Note that LLVM itself internally uses the same alignment-based reasoning to enable compact representations in some cases. See e.g. https://llvm.org/doxygen/PointerSumType_8h_source.html where the least significant 2 bits of pointers are used to encode the tag of a discriminated union. The toolchain dependency relied on by the OCaml bindings is not stronger than what LLVM already assumes. If at some point this assumption is no longer viable, then to_val and from_val can be changed to allocate wrapper objects for pointers that are not aligned. But that will require auditing all their use sites and ensuring that it is safe to access the OCaml runtime system at the point.