aantron / luv

Cross-platform asynchronous I/O and system calls
https://aantron.github.io/luv
MIT License
275 stars 26 forks source link

Memory corruption in `luv_exit_trampoline` #145

Closed talex5 closed 1 year ago

talex5 commented 1 year ago

luv_exit_trampoline calls caml_copy_int64 (which can trigger a GC) while holding an unregistered pointer to an OCaml value in callback. To make it easier to see the bug, you can just do some extra allocation there:

static void luv_exit_trampoline(
    uv_process_t *c_handle, int64_t exit_status, int term_signal)
{
    caml_acquire_runtime_system();
    GET_HANDLE_CALLBACK(LUV_GENERIC_CALLBACK);

    for (int i = 1; i < 10000; i++ ) caml_copy_int64(1);    // Trigger bug faster

    caml_callback2(
        callback, caml_copy_int64(exit_status), Val_int(term_signal));
    caml_release_runtime_system();
}

Originally reported by @smondet against eio_luv in https://github.com/ocaml-multicore/eio/issues/446.

aantron commented 1 year ago

Thank you! This is fixed in https://github.com/aantron/luv/commit/fc56347c4406c29b638c9eb69071a6628130cb78 for the exit trampoline and three other trampolines that had this bug. I also reviewed all the other trampolines and I don't think they were affected. This will be out in 0.5.12.

aantron commented 1 year ago

For my own future needs, this was the program I used to reproduce the bug:

let () =
  Printexc.record_backtrace true;

  Eio_luv.run begin fun env ->
    ignore env;
    let rec repeat = function
    | 0 ->
      ()
    | n ->
      if n mod 100 = 0 then begin
        Printf.eprintf "%i\n%!" n
      end;
      Eio.Switch.run begin fun switch ->
        Eio_luv.Low_level.Process.spawn ~sw:switch "/usr/bin/ls" ["-l"]
        |> Eio_luv.Low_level.Process.await_exit
        |> ignore
      end;
      repeat (n - 1)
    in
    repeat 10000
  end

It may be wrong in the arguments to ls, but this was not significant.