Drup / ocaml-lmdb

OCaml bindings for LMDB.
https://drup.github.io/ocaml-lmdb/dev/
MIT License

Make put default to no_overwrite, add separate overwrite function. #38

Closed · madroach closed this 4 years ago

madroach commented 4 years ago

This changes the API in that now

Drup commented 4 years ago

In all common OCaml key-value implementations, put/add acts like overwrite. Why should lmdb be different?

madroach commented 4 years ago

In my opinion, overwriting by default is more prone to subtle bugs. Unintentionally overwriting instead of adding can cause errors far away from the actual modification. Also, raising an error on unnoticed key collisions can reveal incorrect assumptions.

But anyway, if this differs from familiar interfaces, I agree with you. Actually, I tried to mimic Stdlib.Hashtbl by allowing a value to be added to an existing key for duplicate maps.

I just had a look at the LevelDB and KyotoCabinet interfaces. LevelDB does what you say and only provides overwrite functionality. KyotoCabinet has a more differentiated interface, providing an overwriting set, a non-overwriting add, as well as replace, which may fail with Not_found.

Drup commented 4 years ago

OCaml's Hashtbl and Map both add by replacing. In theory, Hashtbl keeps the shadowed duplicates, but it doesn't really let you observe them until you remove the newer binding (or you iterate through), and in particular you can't get the list of multiple values, so the semantics is closer to overwrite in practice.
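For reference, a minimal Stdlib.Hashtbl session illustrating that shadowing behaviour (standard library only, nothing lmdb-specific):

let () =
  let tbl = Hashtbl.create 16 in
  Hashtbl.add tbl "k" 1;
  Hashtbl.add tbl "k" 2;            (* shadows the old binding, does not overwrite it *)
  assert (Hashtbl.find tbl "k" = 2);
  Hashtbl.remove tbl "k";           (* removes the most recent binding... *)
  assert (Hashtbl.find tbl "k" = 1) (* ...and the old binding reappears *)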

I think out of all of those, I prefer the KyotoCabinet one, with a "shadowing" set (just like OCaml's Hashtbl and Map), your current version of add, and replace (which you currently call overwrite).

If we give all that, we should hide the overwrite flag.

madroach commented 4 years ago

I need to write down what you are proposing.

add (put in this PR)

key    | dup                     | no dup
exists | insert additional value | raise Exists
free   | insert new key          | insert new key

put ~flags:no_overwrite

key    | dup            | no dup
exists | raise Exists   | raise Exists
free   | insert new key | insert new key

overwrite as in this PR

key    | dup | no dup
exists | n/a | overwrite
free   | n/a | insert new key

replace, raising Not_found on a non-existing key

key    | dup                                          | no dup
exists | delete all existing values, insert new value | overwrite
free   | raise Not_found                              | raise Not_found
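
To make the four variants concrete, here is a toy model of their semantics on top of Stdlib.Hashtbl; the names add, put_no_overwrite, overwrite and replace and the Exists exception mirror the tables above, and the ~dup flag stands in for a dup_sorted map. This is only an illustration, not the lmdb binding.

exception Exists

(* ~dup:true models a dup_sorted map, ~dup:false a no-dup map *)
let add ~dup tbl key value =
  if (not dup) && Hashtbl.mem tbl key then raise Exists
  else Hashtbl.add tbl key value

let put_no_overwrite tbl key value =
  if Hashtbl.mem tbl key then raise Exists    (* fails even when duplicates are allowed *)
  else Hashtbl.add tbl key value

let overwrite tbl key value =
  Hashtbl.replace tbl key value               (* overwrite the existing value or insert a new key *)

let replace tbl key value =
  if not (Hashtbl.mem tbl key) then raise Not_found;
  while Hashtbl.mem tbl key do Hashtbl.remove tbl key done;  (* delete all existing values *)
  Hashtbl.add tbl key value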

I don't think we need to offer replace. The difference from overwrite is subtle, and in case you really need to fail with Not_found it's trivial to implement:

Cursor.go Rw map begin fun cursor ->
  (* get positions the cursor on key and raises Not_found if it is absent *)
  Cursor.get cursor key |> ignore;
  Cursor.replace cursor value;
end

while replace on a dup map would be implemented like this:

Cursor.go Rw map begin fun cursor ->
  Cursor.get cursor key |> ignore;   (* raises Not_found if key is absent *)
  Cursor.remove cursor ~all:true;    (* drop all existing duplicates *)
  Cursor.put cursor key value;
end

set

A shadowing set is not possible with lmdb, because it doesn't store the order in which values are added to a key. get will always return the smallest value of a key. So the put implementation with sorted duplicates I propose in this PR is the closest we will get to shadowing semantics.
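
A sketch of that behaviour, assuming a dup_sorted (string -> int) map already opened elsewhere as map and the put/get names used in this PR (value ordering follows the serialised representation):

let () =
  Map.put map "answer" 3;
  Map.put map "answer" 1;
  Map.put map "answer" 2;
  assert (Map.get map "answer" = 1)   (* the smallest duplicate, not the last one added *)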

What to include?

All in all, I think the put and overwrite functions I propose in this PR are adequate. You would like to rename put to add?

lmdb is special in that its maps come in dup_sorted / no_dup flavors, so part of the semantics is decided at map creation time. The no_overwrite flag has a very misleading name for the dup_sorted case.
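
For illustration, choosing the flavor at creation time might look roughly like this; the Env.create / Map.create / Conv names and signatures are assumptions based on the binding's documented style and may differ from the version discussed in this PR.

open Lmdb

let env = Env.(create Rw ~flags:Flags.no_subdir ~max_maps:2) "/tmp/testdb"

(* a plain map: at most one value per key *)
let plain = Map.(create Nodup ~key:Conv.string ~value:Conv.string ~name:"plain" env)

(* a dup_sorted map: multiple sorted values per key; the choice is fixed here,
   at creation time, not per put call *)
let multi = Map.(create Dup ~key:Conv.string ~value:Conv.string ~name:"multi" env)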

madroach commented 4 years ago

set seems to be a better name for overwrite, since it may also insert a new key.

Drup commented 4 years ago

Ok, I mostly agree with you, I vote for this:

add (used to be called put in this PR)

key    | dup                     | no dup
exists | insert additional value | raise Exists
free   | insert new key          | insert new key

set (used to be called overwrite in this PR)

key    | dup                                          | no dup
exists | delete all existing values, insert new value | overwrite
free   | insert new key                               | insert new key

It simply makes set/overwrite available on dup databases.
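
In use, and reusing the hypothetical plain (no-dup) and multi (dup_sorted) maps from the sketch above, the agreed semantics would look roughly like this (Map.add, Map.set, Map.get and the Exists exception are assumed names):

let () =
  (* no-dup map *)
  Map.add plain "k" "v1";                          (* key was free: inserted *)
  (try Map.add plain "k" "v2" with Exists -> ());  (* key exists: add raises Exists *)
  Map.set plain "k" "v2";                          (* set overwrites silently *)

  (* dup_sorted map *)
  Map.add multi "k" "v1";
  Map.add multi "k" "v2";                          (* both values kept, sorted *)
  Map.set multi "k" "v3";                          (* drops v1 and v2, keeps only v3 *)
  assert (Map.get multi "k" = "v3")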

madroach commented 4 years ago

I implemented your proposal.

Because tests were failing again, I restructured them and tried to reduce the mmap size. Maybe those unpredictable failures were due to some memory limit reached in certain conditions?

merge?