Drup / ocaml-lmdb

OCaml bindings for LMDB.
https://drup.github.io/ocaml-lmdb/dev/
MIT License

Make put default to no_overwrite, add separate overwrite function. #38

Closed · madroach closed this 4 years ago

madroach commented 4 years ago

This changes the API in that now

Drup commented 4 years ago

In all common OCaml key-value implementations, put/add acts like overwrite. Why should lmdb be different?

madroach commented 4 years ago

In my opinion, overwriting by default is more prone to subtle bugs. Unintentionally overwriting instead of adding can cause errors far away from the actual modification. Also, raising an error on unnoticed key collisions can reveal incorrect assumptions.

But anyway, if this differs from familiar interfaces, I agree with you. Actually, I tried to mimic Stdlib.Hashtbl by allowing a value to be added to an existing key for duplicate maps.

I just had a look at the LevelDB and KyotoCabinet interfaces. LevelDB does what you say and only provides overwrite functionality. KyotoCabinet has a more differentiated interface, providing an overwriting set, a non-overwriting add, as well as replace, which may fail with Not_found.

Drup commented 4 years ago

OCaml's Hashtbl and Map both add by replacing. In theory, Hashtbl keeps the shadowed duplicates, but it doesn't really let you observe them until you remove the newer binding (or you iterate through), and in particular you can't get the list of multiple values, so the semantics is closer to overwrite in practice.
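For reference, a minimal Stdlib.Hashtbl session illustrating that shadowing behaviour (standard library only, nothing lmdb-specific):

let () =
  let tbl = Hashtbl.create 16 in
  Hashtbl.add tbl "k" 1;
  Hashtbl.add tbl "k" 2;            (* shadows the old binding, does not overwrite it *)
  assert (Hashtbl.find tbl "k" = 2);
  Hashtbl.remove tbl "k";           (* removes the most recent binding... *)
  assert (Hashtbl.find tbl "k" = 1) (* ...and the old binding reappears *)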

I think out of all of those, I prefer the KyotoCabinet one, with a "shadowing" set (just like OCaml's Hashtbl and Map), your current version of add, and replace (which you currently call overwrite).

If we give all that, we should hide the overwrite flag.

madroach commented 4 years ago

I need to write down what you are proposing.

add (put in this PR)

key    | dup                     | no dup
exists | insert additional value | raise Exists
free   | insert new key          | insert new key

put ~flags:no_overwrite

key    | dup            | no dup
exists | raise Exists   | raise Exists
free   | insert new key | insert new key

overwrite as in this PR

key    | dup | no dup
exists | n/a | overwrite
free   | n/a | insert new key

replace, raising Not_found on a non-existing key

key    | dup                                          | no dup
exists | delete all existing values, insert new value | overwrite
free   | raise Not_found                              | raise Not_found
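
To make the four variants concrete, here is a toy model of their semantics on top of Stdlib.Hashtbl; the names add, put_no_overwrite, overwrite and replace and the Exists exception mirror the tables above, and the ~dup flag stands in for a dup_sorted map. This is only an illustration, not the lmdb binding.

exception Exists

(* ~dup:true models a dup_sorted map, ~dup:false a no-dup map *)
let add ~dup tbl key value =
  if (not dup) && Hashtbl.mem tbl key then raise Exists
  else Hashtbl.add tbl key value

let put_no_overwrite tbl key value =
  if Hashtbl.mem tbl key then raise Exists    (* fails even when duplicates are allowed *)
  else Hashtbl.add tbl key value

let overwrite tbl key value =
  Hashtbl.replace tbl key value               (* overwrite the existing value or insert a new key *)

let replace tbl key value =
  if not (Hashtbl.mem tbl key) then raise Not_found;
  while Hashtbl.mem tbl key do Hashtbl.remove tbl key done;  (* delete all existing values *)
  Hashtbl.add tbl key value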

I don't think we need to offer replace. The difference from overwrite is subtle, and in case you really need to fail with Not_found it's trivial to implement:

Cursor.go Rw map begin fun cursor ->
  (* get positions the cursor on key and raises Not_found if it is absent *)
  Cursor.get cursor key |> ignore;
  Cursor.replace cursor value;
end

while replace on a dup map would be implemented like this:

Cursor.go Rw map begin fun cursor ->
  Cursor.get cursor key |> ignore;   (* raises Not_found if key is absent *)
  Cursor.remove cursor ~all:true;    (* drop all existing duplicates *)
  Cursor.put cursor key value;
end

set

A shadowing set is not possible with lmdb, because it doesn't store the order in which values are added to a key. get will always return the smallest value of a key. So the put implementation with sorted duplicates I propose in this PR is the closest we will get to shadowing semantics.
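
A sketch of that behaviour, assuming a dup_sorted (string -> int) map already opened elsewhere as map and the put/get names used in this PR (value ordering follows the serialised representation):

let () =
  Map.put map "answer" 3;
  Map.put map "answer" 1;
  Map.put map "answer" 2;
  assert (Map.get map "answer" = 1)   (* the smallest duplicate, not the last one added *)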

What to include?

All in all, I think the put and overwrite functions I propose in this PR are adequate. You would like to rename put to add?

lmdb is special in that its maps come in dup_sorted / no_dup flavors, so part of the semantics is decided at map creation time. The no_overwrite flag has a very misleading name for the dup_sorted case.
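
For illustration, choosing the flavor at creation time might look roughly like this; the Env.create / Map.create / Conv names and signatures are assumptions based on the binding's documented style and may differ from the version discussed in this PR.

open Lmdb

let env = Env.(create Rw ~flags:Flags.no_subdir ~max_maps:2) "/tmp/testdb"

(* a plain map: at most one value per key *)
let plain = Map.(create Nodup ~key:Conv.string ~value:Conv.string ~name:"plain" env)

(* a dup_sorted map: multiple sorted values per key; the choice is fixed here,
   at creation time, not per put call *)
let multi = Map.(create Dup ~key:Conv.string ~value:Conv.string ~name:"multi" env)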

madroach commented 4 years ago

set seems to be a better name for overwrite, since it may also insert a new key.

Drup commented 4 years ago

Ok, I mostly agree with you, I vote for this:

add (used to be called put in this PR)

key    | dup                     | no dup
exists | insert additional value | raise Exists
free   | insert new key          | insert new key

set (used to be called overwrite in this PR)

key    | dup                                          | no dup
exists | delete all existing values, insert new value | overwrite
free   | insert new key                               | insert new key

It simply makes set/overwrite available on dup databases.
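
In use, and reusing the hypothetical plain (no-dup) and multi (dup_sorted) maps from the sketch above, the agreed semantics would look roughly like this (Map.add, Map.set, Map.get and the Exists exception are assumed names):

let () =
  (* no-dup map *)
  Map.add plain "k" "v1";                          (* key was free: inserted *)
  (try Map.add plain "k" "v2" with Exists -> ());  (* key exists: add raises Exists *)
  Map.set plain "k" "v2";                          (* set overwrites silently *)

  (* dup_sorted map *)
  Map.add multi "k" "v1";
  Map.add multi "k" "v2";                          (* both values kept, sorted *)
  Map.set multi "k" "v3";                          (* drops v1 and v2, keeps only v3 *)
  assert (Map.get multi "k" = "v3")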

madroach commented 4 years ago

I implemented your proposal.

Because tests were failing again, I restructured them and tried to reduce the mmap size. Maybe those unpredictable failures were due to some memory limit reached in certain conditions?

merge?