alan-j-hu / llvm-dune

The official LLVM OCaml binding but built using dune
Other
26 stars 3 forks source link

Add LLVMContextSetOpaquePointers bindings #9

Closed alan-j-hu closed 1 year ago

alan-j-hu commented 1 year ago

Update the Git submodule to include the LLVMContextSetOpaquePointers binding I added.

kit-ty-kate commented 1 year ago

Looking at https://github.com/ocaml/opam-repository/pull/24162 it looks like this is needed because Llvm.build_call would not work at all with the OCaml binding since you can't build an opaque pointer with the current API (or i missed it?).

If so, why not, instead of creating this new function, simply always call LLVMContextSetOpaquePointers whenever an llcontext is returned?

alan-j-hu commented 1 year ago

Thank you for catching this. I did not realize that I didn't add the bindings to construct opaque pointers. I have added them to my local copy of LLVM 15.

@jberdine wanted my patches to help transition from typed to opaque pointers. His intention was to make sure that code could build with two consecutive release (LLVM 14 and LLVM 15, or LLVM 15 and LLVM 16). In LLVM 15, opaque pointer mode is on by default: https://releases.llvm.org/15.0.0/docs/OpaquePointers.html However, I would assume @jberdine would want the opaque pointer mode to be off if it is required for LLVM 15 to work with LLVM 14 code.

However, I have made the changes on my local copy of LLVM, and now the tests do not pass: Opaque pointer mode being on or off results in different IR being produced, and the tests check against the IR. Is this acceptable? What should the proper course of action be?

kit-ty-kate commented 1 year ago

If you prefer not to set opaque pointers off by default, an alternative paths would be to check at runtime, in every functions that only work with opaque pointer on or off, that it is set correctly, and raise an exception if it is not. That would avoid the hard to debug segfault.

Given that llvm 16 does not support non-opaque ptr types, it seems counter-productive to try to use phantom types to encode this instead, so i think exceptions are the way to go here.

alan-j-hu commented 1 year ago

I did some experimenting. Having turned opaque pointers off by default in the bindings, I then added code to the tests that turned opaque pointers back on. When I ran the tests, I found out there is an assertion error if you try to turn off opaque pointers, then turn them back on:

lvm-project/llvm/lib/IR/LLVMContextImpl.cpp:262: void llvm::LLVMContextImpl::setOpaquePointers(bool): Assertion `(!OpaquePointers || OpaquePointers.value() == OP) && "Cannot change opaque pointers mode once set"' failed.

I looked in the code and OpaquePointers member of the LLVMContextImpl class is an Optional<bool>. The optional starts out as none, and the class will let you change it to some true or some false. However, once the optional holds some bool, it can't be set again. Therefore, automatically turning opaque pointers off in the binding code deprives clients of the library of the choice to turn them on. So, I must not change this option.

However, I also found out that the LLVM binding test code makes use of the deprecated typed-pointer functions, and the code does not crash. I read in the documentation page I linked that "Additionally, opaque pointer mode is automatically disabled for IR and bitcode files that explicitly mention i8* style typed pointers," but I don't completely understand the circumstances in which the typed pointer functions can be used. Evidently, even though Kakadu's code and my test code crashed, the LLVM test suite does not?

alan-j-hu commented 1 year ago

My assumption is that code like pointer_type i8_type is fine (they appear in the test suite code), but code like build_call cause crashes without turning opaque pointers off (the test suite code only uses build_call2 et. al.). Josh told me to add deprecation warnings, but I had messed them up by using @@@ instead of @@, which is why @Kakadu did not receive any warning.

I feel like the deprecation warnings are enough and adding machinery to detect the mode is too much. Should I add clarification to the warning telling people that they need to explicitly turn off opaque pointer mode to use the deprecated functions? In any case, I feel that because of how LLVM made the opaque pointer mode work, trying to have code build with both LLVM 14 and 15 is not possible without disrupting the transition to opaque pointers. Requiring the client take some explicit action to acknowledge the deprecation (either adding code to turn off the opaque pointer mode, or replacing the deprecated functions) might be good.

kit-ty-kate commented 1 year ago

I feel like the deprecation warnings are enough and adding machinery to detect the mode is too much.

I wholeheartedly disagree, dune's release-mode (includes anything in opam) and the default compiler settings do not stop compilation if the deprecated functions are used. Leading users to believe it works but only resulting in an obscure segfault.

Segfaults in OCaml should never happen and the LLVM binding is already riddled with them in normal times, adding more of them instead of trying to remove them is the wrong thing to do IMO.

kit-ty-kate commented 1 year ago

I've tried this branch on https://github.com/kit-ty-kate/labrys/pull/11 but it's still segfaults

alan-j-hu commented 1 year ago

Thank you for testing this patch. I discovered that I did not properly convert the LLVMBuildGEP bindings to OCaml 5. (This was a bug present in the fork, not the upstream code.)

I pushed a fix to my fork, and then updated the Git submodule of this PR branch's. However, I don't see the commit event on this PR page (I see the commit at https://github.com/kit-ty-kate/llvm-dune/tree/add-set-opaque), so I don't know if I did something wrong.

kit-ty-kate commented 1 year ago

I pushed a fix to my fork, and then updated the Git submodule of this PR branch's. However, I don't see the commit event on this PR page (I see the commit at https://github.com/kit-ty-kate/llvm-dune/tree/add-set-opaque), so I don't know if I did something wrong.

don't worry it happens. Sometime the Github UI freezes. It should appear in an hour or so.

kit-ty-kate commented 1 year ago

Thanks a lot for this fix. It now goes further but still segfaults, this time in llvm.linker:

Compiling Basic
Linking Basic
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
==85882== 
==85882== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==85882==  Access not within mapped region at address 0x78
==85882==    at 0x59E1620: llvm::Function::eraseFromParent() (in /usr/lib/libLLVM-15.so)
==85882==    by 0x6FF479F: llvm::IRMover::move(std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, llvm::ArrayRef<llvm::GlobalValue*>, llvm::unique_function<void (llvm::GlobalValue&, std::function<void (llvm::GlobalValue&)>)>, bool) (in /usr/lib/libLLVM-15.so)
==85882==    by 0x6FFB503: ??? (in /usr/lib/libLLVM-15.so)
==85882==    by 0x6FFC1BB: llvm::Linker::linkInModule(std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, unsigned int, std::function<void (llvm::Module&, llvm::StringSet<llvm::MallocAllocator> const&)>) (in /usr/lib/libLLVM-15.so)
==85882==    by 0x6FFC3F7: llvm::Linker::linkModules(llvm::Module&, std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, unsigned int, std::function<void (llvm::Module&, llvm::StringSet<llvm::MallocAllocator> const&)>) (in /usr/lib/libLLVM-15.so)
==85882==    by 0x6FFC613: LLVMLinkModules2 (in /usr/lib/libLLVM-15.so)
==85882==    by 0x6F494B: llvm_link_modules (linker_ocaml.c:30)
==85882==    by 0x72DE6F: caml_c_call (in /tmp/labrys/_build/default/src/main.exe)
==85882==  If you believe this happened as a result of a stack
==85882==  overflow in your program's main thread (unlikely but
==85882==  possible), you can try to increase the size of the
==85882==  main thread stack using the --main-stacksize= flag.
==85882==  The main thread stack size used in this run was 8388608.

this time it actually looks similar to https://github.com/kit-ty-kate/labrys/pull/10

kit-ty-kate commented 1 year ago

this time it actually looks similar to kit-ty-kate/labrys#10

sure enough, if i cherry-pick https://github.com/kit-ty-kate/labrys/pull/10 on top of the change on https://github.com/kit-ty-kate/labrys/pull/11, it works.

I guess this means this PR is in a mergeable state as this has been broken since LLVM 13, but still it would be nice to understand why llvm.linker segfaults out-of-the-box. @alan-j-hu Do you want to release this as-is or look into it first?

@Kakadu could you try installing this branch and see if it works with your package(s) that use LLVM?

./setup.sh llvm-config-15 # depends on the name of llvm-config on your system
dune build --release
./install "$(opam var prefix)"
Kakadu commented 1 year ago

I have troubles with migrating to LLVM 15. From the presented API it looks like LLVM no longer know which type pointer is pointing to. Isn't that suspicious? Also, references to C++ methods without referencing OCaml counterparts looks unfinished.

(** [pointer_type ty] returns the pointer type referencing objects of type
    [ty] in the default address space (0).
    See the method [llvm::PointerType::getUnqual]. *)
val pointer_type : lltype -> lltype
[@@ocaml.deprecated
  "pointer_type is deprecated in LLVM 15, use pointer_type2 that constructs \
   an opaque pointer. In LLVM 16, pointer_type is an alias for pointer_type2."]

(** [pointer_type2 c] returns the opaque pointer type in the default address
    space (0) in context [c]. See the method [llvm::PointerType::getUnqual]. *)
val pointer_type2 : llcontext -> lltype
Kakadu commented 1 year ago

Also, I have rather offtopic question. Could you point any demos where a small language with GC is implemented via LLVM's GC interface? (shadow stack or statepoints, something like these) The only thing I found is https://github.com/xp10rd/coolc/tree/main/coolc in C++

kit-ty-kate commented 1 year ago

I have troubles with migrating to LLVM 15. From the presented API it looks like LLVM no longer know which type pointer is pointing to. Isn't that suspicious?

I also find this really annoying but LLVM devs have their reasons i guess (https://llvm.org/docs/OpaquePointers.html). Anyway, you can use dune build --release or add -w 3 to the list of flags and it'll just compile. Those are just warnings, the functions do work otherwise (provided that you used Llvm.set_opaque_pointers c false on the llcontexts that you have). See https://github.com/kit-ty-kate/labrys/pull/11 as an example.

Also, I have rather offtopic question. Could you point any demos where a small language with GC is implemented via LLVM's GC interface? (shadow stack or statepoints, something like these) The only thing I found is https://github.com/xp10rd/coolc/tree/main/coolc in C++

Harrop's miniML example should be one: https://github.com/kit-ty-kate/harrop-minml (though it being 16 years old code ported to more modern OCaml 6 years ago, it does suffer from being hard to compile with OCaml >= 5.0, but 4.14 should work, modulo Llvm.set_opaque_pointers c false)

Kakadu commented 1 year ago

Also, I have rather offtopic question. Could you point any demos where a small language with GC is implemented via LLVM's GC interface? (shadow stack or statepoints, something like these) The only thing I found is https://github.com/xp10rd/coolc/tree/main/coolc in C++

Harrop's miniML example should be one: https://github.com/kit-ty-kate/harrop-minml (though it being 16 years old code ported to more modern OCaml 6 years ago, it does suffer from being hard to compile with OCaml >= 5.0, but 4.14 should work, modulo Llvm.set_opaque_pointers c false)

I don't see any code relevant to GC there....

Kakadu commented 1 year ago

I'm in process of changing the code, and got runtime errors after successful compilation. It looks like I specfied a function type in build_call2, but it looks like it expected (or printed) only the type of the result. I will continue investigating...

--- a/_build/.sandbox/f18019c7de06c9c8aee006dc128be08e/default/tests/llvm/simple1.t
+++ b/_build/.sandbox/f18019c7de06c9c8aee006dc128be08e/default/tests/llvm/simple1.t.corrected
@@ -14,32 +14,19 @@
vb id
formal parameter 0: i64 %0
vb main
+  Called function is not the same type as the call!
+    %0 = call i64 (i64) @id(i64 4)
+  Function return type does not match operand type of return inst!
+    ret i64 (i64) %0
+   i64LLVM ERROR: Broken function found, compilation aborted!
+  Aborted (core dumped)
+  [134]
$ cat fack.ll | grep 'target datalayout' --invert-match
-  ; ModuleID = 'main'
-  source_filename = "main"
-  
-  declare void @myputc(i64)
-  
-  declare i64 @rukaml_applyN(i64, i64, ...)
-  
-  declare i64 @rukaml_alloc_closure(i64, i64)
-  
-  declare i64 @rukaml_alloc_pair(i64, i64)
-  
-  declare i64 @rukaml_field(i64, i64)
-  
-  define i64 @id(i64 %0) {
-  entry:
-    ret i64 %0
-  }
-  
-  define i64 @main() {
-  entry:
-    %0 = call i64 @id(i64 4)
-    ret i64 %0
-  }
+  cat: fack.ll: No such file or directory
+  [1]
kit-ty-kate commented 1 year ago

I don't see any code relevant to GC there....

oops, sorry i mixed two projects by Jon Harrop. I was thinking of HLVM, which has a shadow-stack implementation: https://github.com/jdh30/hlvm

kit-ty-kate commented 1 year ago

I'm in process of changing the code, and got runtime errors after successful compilation. It looks like I specfied a function type in build_call2, but it looks like it expected (or printed) only the type of the result. I will continue investigating...

did it work without changing the code to use the new build_*2 functions?

Kakadu commented 1 year ago

I'm in process of changing the code, and got runtime errors after successful compilation. It looks like I specfied a function type in build_call2, but it looks like it expected (or printed) only the type of the result. I will continue investigating...

did it work without changing the code to use the new build_*2 functions?

Yes, it worked with LLVM14 (you can see the previous bitcode in the diff)

kit-ty-kate commented 1 year ago

Yes, it worked with LLVM14 (you can see the previous bitcode in the diff)

I meant with LLVM 15

alan-j-hu commented 1 year ago

Do you still want me to put some extra code to detect when the deprecated functions are being used with opaque pointers? I personally feel it will be a complication, and I'm not sure how I would accomplish it yet.

Also, references to C++ methods without referencing OCaml counterparts looks unfinished.

What do you mean? The counterpart is the function that the comment is documenting.

kit-ty-kate commented 1 year ago

Do you still want me to put some extra code to detect when the deprecated functions are being used with opaque pointers? I personally feel it will be a complication, and I'm not sure how I would accomplish it yet.

ok what about a compromise: should we remove set_opaque_pointers but instead changing the type of global_context and create_context by adding a new ~opaque_pointers:bool argument. This way we don't have packages in the wild that compiles with LLVM 15 only to segfault at runtime.

For people who use the LLVM 14 compatible interface, they would have to change their code anyway to add set_opaque_pointers with the current state of this PR. And for people who use the new opaque pointers interface, they would also have to change their code anyway.

As far as I understand LLVM 16 also changes the API heavily so it shouldn't be no big deal to simply have that argument removed in that version.

Kakadu commented 1 year ago

I'm in process of changing the code, and got runtime errors after successful compilation. It looks like I specfied a function type in build_call2, but it looks like it expected (or printed) only the type of the result. I will continue investigating...

did it work without changing the code to use the new build_*2 functions?

Yes, in combination with Llvm.set_opaque_pointers context false;

kit-ty-kate commented 1 year ago

Yes, in combination with Llvm.set_opaque_pointers context false;

awesome!

alan-j-hu commented 1 year ago

Hi,

I just heard back from @jberdine. He told me, "Have I understood correctly that with the extra labeled argument, that users will be able to create a context with opaque_pointers on and then call build_call and crash? That makes me nervous to be honest."

To summarize, this is the situation:

Opaque Pointers Off (a) Opaque Pointers On (b)
build_call Works Crashes
build_call2 Works Works

Josh has suggested releasing two versions of LLVM 15, a and b:

a) "Have the bindings change the mode to opaque pointer mode off, and include bindings for both old and new functions like LLVMBuildCall and LLVMBuildCall2."

b) "Leave the opaque pointer mode on, and remove bindings for functions that crash."

So, instead of changing the OCaml API to have the user specify opaque pointer mode, the user would specify the mode by pulling in the desired version. This is better because adding the opaque_pointers:bool parameter (or even adding a separate set_opaque_pointers function) gets in the way of API compatibility between LLVM 15 and 16. Additionally, rather than migrating from LLVM 14 to LLVM 15 to LLVM 16, the user would migrate from LLVM 14 to LLVM 15a to LLVM 15b.

I am strongly in favor of the LLVM 15a and 15b solution that Josh suggested. Please let me know if I made anything unclear.

alan-j-hu commented 1 year ago

I now have two patched versions of LLVM 15.x, one that turns opaque pointer mode on and removes the deprecated functions and the other which turns opaque pointer mode off.

cc @jberdine

kit-ty-kate commented 1 year ago

Having two different versions seems extremely annoying for users and opam-maintainers alike. Why not just have the typed version (opaque pointer off) + set_opaque_pointer_true if users really want to try the new feature, and leave it to LLVM 16 to actually have opaque pointers on by default and remove/rename those functions?

alan-j-hu commented 1 year ago

Opaque pointer mode is on by default and the OCaml bindings would have to explicitly turn it off by calling LLVMContextSetOpaquePointers. Then, once the opaque pointer mode has been explicitly set (either on or off), it cannot be changed again (the function will crash). So, the user cannot turn opaque pointer mode back on.

I also feel that having the opaque_pointers:bool parameter or set_opaque_pointers function just in LLVM 15 disturbs migration from LLVM 14 and migration to LLVM 15, while just pulling in either LLVM 15+typed or LLVM15+opaque as separate package versions makes the code compatibility much cleaner.

kit-ty-kate commented 1 year ago

Is there really a need for (non-API driven) opaque pointers in this release? I'm not seeing any way of having two different versions in opam that wouldn't be extremely messy for everyone involved.

Why not just release https://github.com/alan-j-hu/llvm-project/tree/release/15.x%2Btyped and leave the opaque pointer change for LLVM16?

alan-j-hu commented 1 year ago

Considering that the "typed" version does still support the new opaque pointer API functions (since opaque pointers purely have less information), I am satisfied with this solution. I'm curious what @jberdine thinks.

Should I try to reconstruct a commit history on the release/15.x-typed branch to separate the various changes (e.g. backporting the naked pointer fix, adding missed functions, adding deprecation alerts, etc)?

kit-ty-kate commented 1 year ago

Should I try to reconstruct a commit history on the release/15.x-typed branch to separate the various changes (e.g. backporting the naked pointer fix, adding missed functions, adding deprecation alerts, etc)?

I'm fine with whatever commit history you think is best. If you have the commits on hand then sure why not, but if it requires more unnecessary work for you then let's not do that and let's leave it as one commit.

alan-j-hu commented 1 year ago

Okay, I was able to create a clean Git history at https://github.com/alan-j-hu/llvm-project/tree/release/15.x%2Bopam, and have changed the Git submodules on this PR's branch to point to it. Opaque pointer mode is off.

kit-ty-kate commented 1 year ago

LGTM. I've just tried it on labrys and it works after applying https://github.com/kit-ty-kate/labrys/pull/11 (expected) and https://github.com/kit-ty-kate/labrys/pull/10 (sadly still segfaults otherwise)

This is good enough for me as it was more or less the state since LLVM 13. Let's have another go at releasing this.

Many thanks for all your work!