Octachron / olivine

Ocaml binding generator for vulkan

Segfault in keep_alive for Pipeline depth stencil create info #10

Open hja3 opened 1 week ago

hja3 commented 1 week ago

Please see the supporting info from the coredump below.

The function camlVk__Types__Pipeline_depth_stencil_state_create_info.fun_1421 is the keep_alive code generated in aster/structured.ml:403, called from the construct function on lines 471-472. It maps Obj.repr over the fields of the struct and places them into an array. I don't understand how Obj works exactly and am only prodding at it from the outside, but what is happening, I think, is that a number of the fields of Pipeline_depth_stencil_state_create_info are immediate values, not pointers, and when caml_make_array tries to dereference those values, it crashes. The crash can be reproduced easily with a subset of the depth stencil data:

let a = [|Obj.repr 0.0; Obj.repr false|];;

which segfaults at the same instruction.
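
For comparison, here is a minimal sketch of one way to build such a mixed array without going through an array literal. The helper name keep_alive_array is hypothetical and not part of olivine. It assumes that seeding the array with an immediate value is enough to get a regular (non-float) block, so that later stores keep each field as-is.

```ocaml
(* Hypothetical helper, not part of olivine: build an Obj.t array from
   heterogeneous fields without an array literal. *)
let keep_alive_array (fields : Obj.t list) : Obj.t array =
  match fields with
  | [] -> [||]
  | _ ->
    let n = List.length fields in
    (* Seeding with an immediate value gives a regular block (tag 0),
       even when the first field is a boxed float. *)
    let a = Array.make n (Obj.repr 0) in
    (* Stores into a regular block keep each value untouched. *)
    List.iteri (fun i v -> a.(i) <- v) fields;
    a

(* Same data as the reproducer above. *)
let a = keep_alive_array [Obj.repr 0.0; Obj.repr false]
```

Array.of_list is avoided on purpose here: it seeds the new array with the first list element, so a leading float would pick the float representation again.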

Coredump info for the application:

(gdb) where
#0  0x00000000006cbe33 in caml_make_array ()
#1  <signal handler called>
#2  0x0000000000583820 in camlVk__Types__Pipeline_depth_stencil_state_create_info.fun_1421 ()
#3  <signal handler called>
#4  0x00000000006d0cb8 in caml_callback_exn ()
#5  0x00000000006d8a20 in caml_final_do_calls_exn ()
#6  0x00000000006ed9d7 in caml_do_pending_actions_exn ()
#7  0x00000000006edaaf in caml_process_pending_actions_with_root_exn ()
#8  0x00000000006edad9 in caml_process_pending_actions_with_root ()
#9  <signal handler called>
#10 0x00000000006a49bd in camlStdlib__Format.advance_left_670 ()
#11 0x00000000006a53ad in camlStdlib__Format.pp_flush_queue_770 ()
#12 0x00000000006a5675 in camlStdlib__Format.pp_print_newline_929 ()
#13 0x00000000005294a6 in camlLve__First_app.run_2598 () at lib/first_app.ml:213
#14 0x0000000000694e9b in camlStdlib__Fun.protect_326 ()
#15 0x000000000051d1db in camlDune__exe__Main.entry () at bin/main.ml:6
#16 0x00000000005118e7 in caml_program ()
#17 <signal handler called>
#18 0x00000000006f443d in caml_startup_common ()
#19 0x00000000006f4489 in caml_startup ()
#20 0x000000000050f92c in main ()

(gdb) info registers
rax            0x20                32
rbx            0x7f091e14eed0      139677136121552
rcx            0x0                 0
rdx            0x1                 1
rsi            0x0                 0
rdi            0x0                 0
rbp            0x50                0x50
rsp            0x7ffe7e614450      0x7ffe7e614450
r8             0x0                 0
r9             0x0                 0
r10            0x27c72600          667362816
r11            0x7ffe7e614530      140731018724656
r12            0x27c61b70          667294576
r13            0x7ffe7e614660      140731018724960
r14            0x27c61b70          667294576
r15            0x7f091e14eec8      139677136121544
rip            0x6cbe33            0x6cbe33 <caml_make_array+355>
eflags         0x10206             [ PF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

(gdb) disassemble
   0x00000000006cbe28 <+344>:   mov    0x8(%rsp),%rdx
   0x00000000006cbe2d <+349>:   add    %rax,%rdx
   0x00000000006cbe30 <+352>:   mov    (%rdx),%rdx
=> 0x00000000006cbe33 <+355>:   movsd  (%rdx),%xmm0
   0x00000000006cbe37 <+359>:   mov    0x18(%rsp),%rdx
   0x00000000006cbe3c <+364>:   movsd  %xmm0,(%rdx,%rax,1)
   0x00000000006cbe41 <+369>:   add    $0x8,%rax
   0x00000000006cbe45 <+373>:   cmp    %rax,%rbp
   0x00000000006cbe48 <+376>:   jne    0x6cbe28 <caml_make_array+344>

Octachron commented 1 week ago

This looks like the flat float array optimization going awry: an array that contains floats is optimized to store them unboxed, based on a runtime check. Here, your code constructs an array whose first value is a float, which may trigger this optimization. That is wrong here because the rest of the array doesn't contain floats.

If you have the time, you could test this hypothesis by running your code in an opam switch with the ocaml-option-no-flat-float-array configuration package enabled.
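
To confirm which configuration a given switch actually uses, a small check like the following may help. It is only a sketch: the function name flat_float_array_enabled is hypothetical, and it relies on the fact that a float array carries Obj.double_array_tag only when the flat representation is in use.

```ocaml
(* Hypothetical helper: true when this runtime stores float arrays
   unboxed, i.e. the flat float array optimization is enabled. *)
let flat_float_array_enabled () : bool =
  Obj.tag (Obj.repr (Array.make 1 1.0)) = Obj.double_array_tag

let () =
  Printf.printf "flat float arrays: %b\n" (flat_float_array_enabled ())
```

On a switch built with the option package, this should print false.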

hja3 commented 6 days ago

Ah yes I see, it is attempting to load a float at the time of the crash. That explanation makes sense. Thank you for the suggestion, I will certainly test it, hopefully this evening, and report back the result.

hja3 commented 5 days ago

Yes, the float array optimisation appears to have been the problem. With the optimisation disabled, I ran the application for 10 minutes before killing it manually. Before that, it could not survive even 100 frames!

Please note, a small patch to the olivine opam package was necessary, included below. The additional build step was required to make the olivine library visible to ocamlfind, and the dependencies have been updated to reflect the move to ppxlib.

diff --git a/olivine.opam b/olivine.opam
index 5432d04..fca4f14 100644
--- a/olivine.opam
+++ b/olivine.opam
@@ -10,13 +10,13 @@ bug-reports: "https://github.com/Octachron/olivine/issues"
 build:[
  ["./configure.sh"]
  [make "vk"]
+ ["dune" "build" "-p" name "-j" jobs "@install"]
 ]

 depends: [
   "dune" {build}
-  "ppx_tools_versioned"
-  "ocaml-migrate-parsetree"
+  "ppxlib"
   "xmlm"
   "fmt"
   "menhir" {build}