Drup / ocaml-lmdb

Ocaml bindings for lmdb.
https://drup.github.io/ocaml-lmdb/dev/
MIT License
48 stars 3 forks source link

Lmdb.Values.db_val is opaque #7

Closed madroach closed 5 years ago

madroach commented 5 years ago

Lmdb.Values.db_val is opaque and thus no module of type Lmdb.Values.S can be constructed.

Drup commented 5 years ago

Right, so I was not really sure which API to provide. db_val is a pointer underneath, and is pretty much impossible to manipulate safely without knowing a lot about ctypes (you really really don't have the right to alias that one).

You can still implement new modules respecting Values.S by using the Int and String instances!

If you have a better proposal, I'm all ears. Maybe by going through a CArray with lot's of flashing red warning about memory management ?

madroach commented 5 years ago

I never used ctypes, but it should be possible to wrap the MDB_val in a bigarray. Of course with a warning about lmdb's memory-mapped nature (its in the name, isn't it?). Exposing the pointer, preferably as bigarray in Lmdb.Values.db_val is not that bad since this will only be used when one wants to build a custom module which somehow converts from/to ocaml values. Just add a warning that no data may be passed by reference. I think it may be sensible to also support MDB_RESERVE with an interface like this:

type db_val =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout)
  Bigarray.Array1.t

module type Value.S = sig
  type t 

  val default_flags : Lmdb.Values.Flags.t

  val read : db_val -> t
  (* read: buf
   * deserialise data returned from the database.
   * buf is memory mapped into the database file
   * and is valid only until the next update.
   * therefore you must not return the data by reference.
   * you need to copy the data. *)

  val write : t -> (length:int -> db_val) -> ()
  (* write v f
   * serialise data to be stored in the database.
   * Function f should be called with the length of the
   * serialised data and will return a bigarray of according size
   * in which the serialised data can be stored.
   * write should return when the serialised data is ready.
   * The bigarray must not be modified after write returned *)
end

mdb_put will then be called in the callback from write, of course.

Do you think bigarrays will add too much overhead?

Drup commented 5 years ago

Right, that doesn't seem unreasonable. bigarray certainly adds some overhead, but it would need to be measured. Maybe Ctypes.carray is more lightweight (it's quite similar).

Your API definitely doesn't work in the general case (see strings, the lenght is dynamic). Even for MDB_RESERVE, I think it doesn't work because the point is that the writer receives a pointer to the structure, and writes there. You never returns an mdb_val.