garrigue / lablgtk

LablGTK 2 and 3: an interface to the GIMP Tool Kit
https://garrigue.github.io/lablgtk

lablgtk fails on multicore due to use of naked pointers #133

Open talex5 opened 3 years ago

talex5 commented 3 years ago

Here is a simplified version of lablgtk's gpointer.ml file, showing the problem:

module Gpointer = struct
  let raw_null = snd (Obj.magic Nativeint.zero)
end

let () =
  Gc.full_major ()

$ ocaml test.ml
fish: “ocaml test.ml” terminated by signal SIGSEGV (Address boundary error)

I think this is the cause of https://github.com/ocaml-multicore/ocaml-multicore/issues/609. There is some more information about naked pointers at https://discuss.ocaml.org/t/ann-a-dynamic-checker-for-detecting-naked-pointers/5805.
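To see why the value above is a naked pointer, it helps to look at how the runtime represents the two kinds of value involved. The sketch below is not lablgtk code. It uses only the standard `Obj` module and deliberately avoids reading the fields of the block, so it should be safe to run on any OCaml version.

```ocaml
(* An immediate integer has its low bit set, so the GC never follows it. *)
let immediate_is_int = Obj.is_int (Obj.repr 0)

(* A nativeint is boxed: a custom block whose field 0 holds a pointer to
   its custom operations and whose field 1 holds the raw machine word. *)
let boxed = Obj.repr (Nativeint.of_int 0)
let nativeint_is_block = Obj.is_block boxed
let nativeint_is_custom = Obj.tag boxed = Obj.custom_tag

let () =
  Printf.printf "immediate: %b, boxed: %b, custom: %b\n"
    immediate_is_int nativeint_is_block nativeint_is_custom
```

`snd` compiles to a read of field 1, so `snd (Obj.magic Nativeint.zero)` returns the raw word 0. Its low bit is clear, so the runtime treats it as a pointer, yet it refers to no heap block: this is the naked null pointer that the GC trips over.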

garrigue commented 3 years ago

This is a known problem in lablgtk, and I plan to address it. The null pointer case is just an instance, and it is relatively easy to solve. The main problem is with the generated translation tables, which are static C data, and where it is a bit difficult to add headers. Of course, I would welcome a compact patch :-)
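One way the null pointer case could be handled, sketched here as an assumption rather than as the fix lablgtk adopted, is to keep the address inside a boxed `nativeint`, so the GC only ever sees a well-formed custom block. The module name `Safe_pointer` and its functions are hypothetical and exist only for this illustration.

```ocaml
(* Hypothetical module, not part of lablgtk: a C address kept in a boxed
   nativeint instead of being exposed to the GC as a raw word. *)
module Safe_pointer : sig
  type t
  val null : t
  val of_address : nativeint -> t
  val is_null : t -> bool
end = struct
  type t = nativeint
  let null = Nativeint.zero
  let of_address a = a
  let is_null p = Nativeint.equal p Nativeint.zero
end

let () =
  (* A full major collection is harmless here: [null] is a real block. *)
  Gc.full_major ();
  assert (Safe_pointer.is_null Safe_pointer.null);
  assert (not (Safe_pointer.is_null (Safe_pointer.of_address 0x1000n)))
```

The cost is one extra indirection per pointer, and this does nothing for the static translation tables mentioned above, which live in C data.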

ejgallego commented 2 years ago

@kit-ty-kate raised this issue again, so I proposed we coordinate a possible effort here.

garrigue commented 2 years ago

OK, I need to do something about that. IIRC, most pointers are already properly wrapped, but translation tables generated by varcc do not contain the required headers. This is a bit painful to do, as the header size is different from the other contents size...

garrigue commented 2 years ago

See #144 and #145 for fixes for lablgtk3 and lablgtk2 respectively.

garrigue commented 2 years ago

We tried testing #145 (i.e. the lablgtk2 version) on multicore/5.00, but it doesn't seem to work, and we don't know why. If somebody could have a look at it, that would be nice.

talex5 commented 2 years ago

I tried the lablgtk3 version on 4.12+domains. I used this code:

let () = print_endline @@ GMain.init ()

But it fails for me:

$ opam pin add lablgtk3 "git+https://github.com/garrigue/lablgtk.git#0ae631f3a0dd153c2d8e05e9ee3cc906c8503bb1"
$ ocamlfind ocamlopt -thread -package lablgtk3 -linkpkg -o test.exe test.ml
$ ./test.exe 
Fatal error: exception Failure("Obj.truncate not supported")

I also tried building it with dune, with the same result.
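The failure above shows that this multicore switch rejects `Obj.truncate`, which shrinks a block in place and was later removed in OCaml 5. A common replacement is to copy the used prefix into a fresh block. The sketch below assumes the truncated value is an array. The helper name `shrink` is hypothetical, and this is not necessarily how the call was removed from lablgtk.

```ocaml
(* Hypothetical helper: return the first [n] elements of [buf] as a fresh
   array, instead of shrinking [buf] in place with Obj.truncate. *)
let shrink (buf : 'a array) (n : int) : 'a array =
  if n < 0 || n > Array.length buf then invalid_arg "shrink";
  Array.sub buf 0 n

let () =
  let buf = Array.make 8 0 in
  buf.(0) <- 10;
  buf.(1) <- 20;
  let used = shrink buf 2 in
  assert (used = [| 10; 20 |]);
  (* The original buffer keeps its full length. *)
  assert (Array.length buf = 8)
```

Copying costs an allocation, but the result is an ordinary block that any runtime can scan.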

garrigue commented 2 years ago

Can you try with the lablgtk2 version? The call to Obj.truncate is removed there. It is easier for us to test.

garrigue commented 2 years ago

I have cherry-picked the changes to the lablgtk3 version in #144. Please test; I would like to release.

talex5 commented 2 years ago

Thanks - some of the examples now work for me. e.g. dune exec -- ./examples/entry.exe works. But others don't, e.g.

$ dune exec -- ./examples/hello.exe
fish: “dune exec -- ./examples/hello.e…” terminated by signal SIGSEGV (Address boundary error)
(rr) bt
#0  caml_darken (v=0, ignored=0x0, state=0x0) at major_gc.c:761
#1  0x000055eb1018cc60 in caml_darken (state=state@entry=0x0, v=<optimized out>, ignored=ignored@entry=0x0) at major_gc.c:759
#2  0x000055eb1018fed0 in write_barrier (new_val=94468103054592, old_val=<optimized out>, field=0, obj=obj@entry=140443383598920)
    at memory.c:140
#3  caml_initialize (fp=fp@entry=0x7fbb85fd8f48, val=val@entry=94468103054592) at memory.c:212
#4  0x000055eb10174fe9 in Val_GObject_new (p=0x55eb11b9a500) at ml_gobject.c:62
#5  0x000055eb101ae26f in <signal handler called> ()
#6  0x000055eb100c654b in camlGobject__unsafe_create_362 () at src/gobject.ml:208

garrigue commented 2 years ago

Thanks for the feedback. Then I think I will merge the PR. Even if it doesn't work on multicore properly, it becomes possible to debug.