bmatsuo opened this issue 9 years ago
Comment imported from my #1 regarding design considerations
- Portability issues? Can databases be ported across os/arch? Portability across gc+gcc, gc+clang, gccgo?
- Potential GC issues. A Go reference (C.uint, C.size_t) to the integer value must be held until a Put operation completes if the goal is to avoid allocating memory with C, copying the value, and freeing it after. This puts limitations on how the feature is implemented.
- Potential API bloat. There are several combinations of input/output parameter types for keys and values. The cleanest solution would probably be one that exposes the `mdbVal` type, but you get into potential GC issues.
One solution may be to expose an interface and one(?) variant of `*.Put` that uses interface values.
    type Value interface {
        MemAddr() unsafe.Pointer
        MemSize() uintptr
    }

    func Bytes(b []byte) Value
    func Uint(u uint) Value
    func Uintptr(u uintptr) Value

    func (txn *Txn) PutValue(DBI, Value, Value, uint)
    func (cur *Cursor) PutValue(Value, Value, uint)
This can seemingly allow for strings to be written without being converted to `[]byte`, which is not required but is an interesting bonus. (edit: it does not work as easily for strings; taking the address of a string's bytes is non-trivial)
Questions:
    type ValueMulti interface {
        Value

        // MemStride divides MemSize evenly and with the Multiple op indicates
        // that the value is a page of stride-sized items.
        MemStride() uintptr
    }

    func BytesMulti(b []byte, stride int) ValueMulti
    func (cur *Cursor) PutValueMulti(DBI, Value, ValueMulti)
Edit: In addition to Put variants a Get variant must also be included.
Edit: In addition there could potentially be UintMulti and UintptrMulti functions for batching integer values, but they would require copying into `[]C.uint` and `[]C.size_t` respectively, so any benefits would be less pronounced than in C.
My branch, bmatsuo/value-interfaces, experiments with the interface proposed above. However, the results are not encouraging. Writing byte slices containing big-endian encoded uint64 values seems to be as good as or better than writing size_t values. Writing unsigned int values was consistently faster, but the difference in performance is close to negligible (< 200ns per Put operation, ~17%). When you consider that Put operation time is small compared to txn commit time, the benefits are not clear.
The following table shows the speed of different Put operations, PutValue with Uint (u), PutValue with Uintptr (z), and Put with []byte (b). The uppercase benchmarks U, Z, and B are "optimized" versions of their lowercase counterparts which do not allocate in excess.
    BenchmarkTxn_Put_u_-4    1000000    1144 ns/op
    BenchmarkTxn_Put_z_-4    1000000    1205 ns/op
    BenchmarkTxn_Put_b_-4    1000000    1311 ns/op
    BenchmarkTxn_Put_U_-4    1000000    1164 ns/op
    BenchmarkTxn_Put_Z_-4    1000000    1188 ns/op
    BenchmarkTxn_Put_B_-4    1000000    1182 ns/op
Hi Bryan,
It seems to me that you are mostly considering write speed for this issue. I have a use case that I think could benefit from INTEGERKEY on the read side.
Specifically I want to store update logs for transactions in LMDB itself. Seems like a DBI with INTEGERKEY and APPEND when doing the put would give me fast writes (only one comparison with APPEND anyways) and faster scan when reading.
Re portability of LMDB: did you ever look into that?
@lmb Thanks for the input. Honestly I hadn't really considered the benefits for a read-heavy workload.
Do you have a benchmark program that I could look at and maybe use to help validate the performance? A C program would be OK if that is what you have.
Can you tell me about the read access pattern you had in mind? I suppose the performance benefits of the optimization would be ideal with heavy random access. Large range scans may benefit too, but it seems like there are fewer total key comparisons per read in that case unless I'm mistaken.
As far as portability goes, I haven't thought too much about it. I don't know the internals very well. It would be possible to write code that will compile on all supported architectures without os/arch dependent code. So portability in that sense would not be a problem. But in terms of shipping code/data to different machines, I'm not sure. If the libc implementation is the same then I think it should be OK. =\
edit: I don't think shipping the data between different os/arch combinations is safe. I think you could only ship data between two linux/amd64 machines, or two linux/arm machines.
I don't have code unfortunately, still thinking about the whole issue. Found this from hyc: https://gist.github.com/hyc/9810a14488c71e58038c (note that INTEGERKEY has only 31 of N keys found, not sure what's going on).
Seems like there is a nice speed up for the 10m keys case (~23 comparisons per insertion / lookup).
I was thinking that maybe it's possible to ship between amd64 and arm64, haven't gotten around to testing that. If it works it's a hack and relies on struct alignment... (Maybe this should be a separate issue?)
Put me on the list for wanting integer keys, specifically for a read heavy workload that often includes ranges.
Portability of the database file is not a concern for me.
Thanks for the input @mranney. Indeed portability is not a show stopper for this issue.
I have gotten the branch with this experimentation working again, bmatsuo/value-interfaces. The tests and benchmarks are passing. Please go ahead and try it out if there is a particular dataset or access pattern you would like to benchmark.
I will work on putting together some read-heavy benchmarks of my own.
The relevant bits are the new Value type and related functions for that type (including UintValue() and UintptrValue()) and the new Txn and Cursor PutValue() functions.
Edit: I also just added the Txn.GetValue() and Cursor.GetValue() functions which allow you to perform reads using integers.
@lmb I'm a little confused about what I'm seeing benchmarked here (edit: in that gist). Do you have any more details about the source of these tests/numbers, like a mailing list thread? That (31 of 1000000 found) bit is unsettling. But without more background I'm not sure what to take away from it, really.
I realized that the experimental raft backend I have worked on uses a database with uint64 keys. On a 32-bit machine LMDB cannot store a 64-bit integer key/value (so a naive implementation isn't portable, in fact).
But I figured I would take a shot switching it to lmdb.IntegerKey to see how performance changed.
https://github.com/bmatsuo/raft-mdb/commit/6bddf6c5eed5867670987ba718da5115137c148f
In my measurements the numbers really didn't change, which is disappointing. Note that in order to get things to work you need to merge the value-interface branch with a tentative fix I have, from #57.
benchmark old ns/op new ns/op delta
BenchmarkMDBStore_FirstIndex-4 1907 1899 -0.42%
BenchmarkMDBStore_LastIndex-4 1901 1899 -0.11%
BenchmarkMDBStore_GetLog-4 3966 3933 -0.83%
BenchmarkMDBStore_StoreLog-4 8563 8574 +0.13%
BenchmarkMDBStore_StoreLogs-4 18673 18583 -0.48%
BenchmarkMDBStore_DeleteRange-4 13692 13746 +0.39%
BenchmarkMDBStore_Set-4 3769 3777 +0.21%
BenchmarkMDBStore_Get-4 1318 1316 -0.15%
BenchmarkMDBStore_SetUint64-4 4500 4499 -0.02%
BenchmarkMDBStore_GetUint64-4 1334 1338 +0.30%
I have added some benchmarks for Txn.GetValue and they look somewhat promising (ignoring the above for now). You see essentially the same gains in wall time duration as the Txn.PutValue benchmarks. But reads don't have as much overhead for initialization and termination, so the overall benefit should appear greater. It is just that (now remembering the above table) the benefits are not visible.
So I am still on the fence about how much real benefit a Go program can see from this change. I will try to write a few more benchmarks. But I am worried that everything gets washed out by overhead in the library and in cgo (which is increased now with the builtin argument checking).
But, I have seen the pattern employed by my raft-mdb fork elsewhere for db interface implementations. It is a potentially read heavy application but each read is executed in a different read transaction (hyperbolic but true in seemingly many cases). I think this pattern may be responsible for washing out any performance change.
I've become aware of a development that is related to all of this. https://github.com/golang/go/issues/6907
It looks like it will be possible to get a pointer to the raw bytes backing a string in go1.7. I remarked on the inability to do this earlier in the thread here.
The ability to push strings into LMDB would be pretty big (particularly for keys, PutReserve essentially solved the problem for string values already).
Along with optimizations that @lmb has been kicking ass helping out with, I think the interface approach here is looking more attractive.
The issue I mentioned before, golang/go#6907, didn't make it into go1.7. By the look of things I have doubts that it will make the go1.8 release. However, go1.8beta1 was just released and significant [improvements](https://beta.golang.org/doc/go1.8#performance) have been made to cgo. The reduced overhead may make this feature finally worth implementing/maintaining.
My initial benchmarks from the branch bmatsuo/value-interface are promising. The benefit to reads seems the most significant. The speedup of uint values over bytes is most clear. I will continue to monitor the situation as the go1.8 release cycle continues.
The following compares the bmatsuo/value-interface benchmarks on go1.7.4 vs go1.8beta1.
benchmark old ns/op new ns/op delta
BenchmarkEnv_ReaderList-4 53635 46841 -12.67%
BenchmarkTxn_GetValue_U_raw-4 464 370 -20.26%
BenchmarkTxn_GetValue_Z_raw-4 469 377 -19.62%
BenchmarkTxn_GetValue_B_raw-4 468 382 -18.38%
BenchmarkTxn_GetValue_U_-4 494 414 -16.19%
BenchmarkTxn_GetValue_Z_-4 510 430 -15.69%
BenchmarkTxn_GetValue_B_-4 514 438 -14.79%
BenchmarkTxn_PutValue_u_-4 508 431 -15.16%
BenchmarkTxn_PutValue_z_-4 513 436 -15.01%
BenchmarkTxn_PutValue_b_-4 548 464 -15.33%
BenchmarkTxn_PutValue_U_-4 506 414 -18.18%
BenchmarkTxn_PutValue_Z_-4 515 424 -17.67%
BenchmarkTxn_PutValue_B_-4 516 420 -18.60%
BenchmarkTxn_Put-4 1538 1463 -4.88%
BenchmarkTxn_PutValue-4 1687 1585 -6.05%
BenchmarkTxn_PutReserve-4 1563 1504 -3.77%
BenchmarkTxn_PutReserve_writemap-4 1297 1206 -7.02%
BenchmarkTxn_Put_writemap-4 1301 1179 -9.38%
BenchmarkTxn_Get_ro-4 1260 1114 -11.59%
BenchmarkTxn_Get_raw_ro-4 750 666 -11.20%
BenchmarkScan_ro-4 4838932 4083100 -15.62%
BenchmarkScan_raw_ro-4 1401681 1046018 -25.37%
BenchmarkCursor-4 680 475 -30.15%
BenchmarkCursor_Renew-4 182 99.3 -45.44%
BenchmarkErrno_Error-4 1132 840 -25.80%
BenchmarkTxn_Sub_commit-4 209971 189372 -9.81%
BenchmarkTxn_Sub_abort-4 205281 192064 -6.44%
BenchmarkTxn_abort-4 14309 14204 -0.73%
BenchmarkTxn_commit-4 14853 14012 -5.66%
BenchmarkTxn_ro-4 159661 149648 -6.27%
BenchmarkTxn_unmanaged_abort-4 14537 14241 -2.04%
BenchmarkTxn_unmanaged_commit-4 14694 14130 -3.84%
BenchmarkTxn_unmanaged_ro-4 152787 162579 +6.41%
BenchmarkTxn_renew-4 526 324 -38.40%
BenchmarkTxn_Put_append-4 393 301 -23.41%
BenchmarkTxn_Put_append_noflag-4 539 449 -16.70%
I am very far along in my implementation of this enhancement. My current implementation shows the consistent performance improvements you would expect from this optimization. I probably won't get the code merged until after go1.8 has been released. But that will probably be good because of the cgo optimization it will contain.
The GitHub branch bmatsuo/values-interfaces is up to date if anyone wanted to see it.
@mranney @lmb I am planning on merging support for unsigned integer values this weekend (approx 2017-02-19). If you wanted to take a look at #99 and give any feedback I'd appreciate it (obviously, I don't mind if you don't want to or don't have time). I just wanted to give fair warning since this has been in the works for a while.
Sorry for the late reply, do you have a high level description what the new API would look like? It's quite fiddly to go through all the commits.
Forgive me, @lmb. The proposed API has indeed changed significantly since its first inception, and the branch carries that lineage with it. The change is also rather significant in size, which adds to its unwieldiness. I wish there was an easy way to generate a diff at the godoc level. That would make illustrating the changes somewhat easier.
There is a high-level example that basically gives an overview of the new API features. Perhaps that would be one of the most useful things to look at.
https://github.com/bmatsuo/lmdb-go/pull/99/files#diff-61105a4fb9b9f0a342528e599475968fR165
To support IntegerKey and IntegerDup I defined two types, CUint and CSizet, which are implemented as byte arrays. The byte array implementation means it's easy to construct slices which can be passed to Put, Get, ...
Because the normal Get and Put methods are still used I had to define functions to extract unsigned integers from their return values. To do this I added a set of functions ValueTT, which are discussed in the main godoc:
https://github.com/bmatsuo/lmdb-go/pull/99/files#diff-afbc8465ec7e9f5f191d2ac1ca79f551R84
I also added two minor types, MultiCUint and MultiCSizet, which allow convenient use of unsigned integers with the Cursor.PutMultiple method.
I would like to attempt support for MDB_INTEGERKEY and MDB_INTEGERDUP (see mdb_dbi_open). Any user attempting to benefit from them would be in tricky "unsafe" territory without more direct support. But the two flags actually open up a wide variety of database schemata (and query structures). The API extension to support the flags needs to be thought out carefully.