Closed madroach closed 4 years ago
In all common OCaml key-value implementations, put
/add
acts like overwrite. Why should lmdb be different ?
In my opinion overwriting by default is more prone to subtle bugs. Unintentionally overwriting instead of adding can cause errors far away from the actual modification. Also raising an error on unnoticed key-collisions can reveal false assertions.
But anyway. If this is differing from familiar interfaces I agree with you. But actually I tried to mimic Stdlib.hashtbl, by allowing to add a value to an existing key for duplicate maps.
I just had a look at leveldb and kyotocabinet interfaces. LevelDB does what you say and only provides overwrite functionality. KyotoCabinet has a more differentiated interface providing overwriting set
, non-overwriting add
as well as replace
, which may fail with Not_found
.
OCaml's hashtbl and map both add by replacing. In theory, Hashtbl contains duplicate, but it doesn't really allow to observe it, until you remove the old duplicates (or you iterate through), and in particular you can't get the list of multiple values, so the semantics is closer to overwrite in practice.
I think out of all of those, I prefer the KyotoCabinet one, with "shadowing" set
(just like OCaml's hashtbl and map), your current version of add
, and replace
(which you currently call overwrite
).
If we give all that, we should hide the overwrite
flag.
I need to write down what you are proposing.
add
(put
in this PR)key | dup | no dup |
---|---|---|
exists | insert additional value | raise Exists |
free | insert new key | insert new key |
put ~flags:no_overwrite
key | dup | no dup |
---|---|---|
exists | raise Exists | raise Exists |
|
free | insert new key | insert new key |
overwrite
as in this PRkey | dup | no dup |
---|---|---|
exists | n/a | overwrite |
free | n/a | insert new key |
replace
raising Not_found
on non-existing keykey | dup | no dup |
---|---|---|
exists | delete all existing values, insert new value | overwrite |
free | raise Not_found | raise `Not_found |
I don't think we need to offer replace
. The difference to overwrite
is
subtle and in case you really need to fail with Not_found
its trivial to implement:
Cursor.go Rw map begin fun cursor ->
Cursor.get cursor |> ignore;
Cursor.replace cursor value;
end
while dup replace
would be implemented like this:
Cursor.go Rw map begin fun cursor ->
Cursor.get cursor |> ignore;
Cursor.remove cursor ~all:true;
Cursor.put cursor key value;
end
set
Shadowingset
is not possible with lmdb, because it doesn't store the order
in which values are added to a key. get
will always return the smallest
value of a key. So the put
implementation with sorted duplicates I propose
in this PR is the closest we will get to shadowing semantics.
After all I think the put
and overwrite
functions I propose in this PR are adequate. You would like to rename put
→ add
?
lmdb is special in that it's maps come in dup_sorted / no_dup flavors. So part of the semantics are decided at map creation time. the no_overwrite
flag has a very misleading name for the dup_sorted case.
set
seems to be a better name for overwrite
, since it may also insert a new key.
Ok, I mostly agree with you, I vote for this:
add
(used to be called put
in this PR)key | dup | no dup |
---|---|---|
exists | insert additional value | raise Exists |
free | insert new key | insert new key |
set
(used to call overwrite
in this PR)key | dup | no dup |
---|---|---|
exists | delete all existing values, insert new value | overwrite |
free | insert new key | insert new key |
It simply makes set
/overwrite
available on dup databases.
I implemented your proposal.
Because tests were failing again, I restructured them and tried to reduce the mmap size. Maybe those unpredictable failures were due to some memory limit reached in certain conditions?
merge?
This changes the api in that now
Exists
on non-duplicate maps.overwrite
functions are added. They will only work on non-duplicate maps. Overwriting makes no sense on duplicate maps.