Zero-copy usage

kk00ss commented 8 years ago

Hello, how do I allocate memory (a DirectBuffer, for example) inside LMDB's mmap file? Preferably safely :-) As I understand it, using BufferCursor implies copying data into the map. I assume this requires setting the LMDB mmap as writable. Is that going to make it unreliable?

krisskross commented 8 years ago

There is no support for this today, but it would be a nice addition.

Today, BufferCursor uses a thread-local reusable buffer that will be copied by LMDB when written. I can imagine this buffer being the MDB_RESERVE buffer instead, which would save that last copy. And yes, the database would need to be writable.

I think we can make it safe :-)

krisskross commented 8 years ago

First stab. I'll take a closer look at BufferCursor also.

krisskross commented 8 years ago

I'm a bit cautious about making MDB_RESERVE the default for all write operations on BufferCursor. Dynamic buffer expansion is handy when the size of the value is unknown, i.e. you won't waste space in the database.

One way forward could be two different types of BufferCursor. Or we could force users to either reserve or allocate before each write. We should also keep in mind that multiple reserves can be performed within the same transaction.

I would like some feedback on how we should proceed.

kk00ss commented 8 years ago

My use case is blocks of a known equal size - so there is no problem with allocations.

krisskross commented 8 years ago

@slaunay Do you have an opinion on this?

slaunay commented 8 years ago

I think that MDB_RESERVE is an advanced low-level feature and, like you mentioned, it requires knowing the exact size of the value beforehand, so I would not use it as the default way to write with BufferCursor. I also believe that it cannot be used to "overwrite" an existing entry, so the integration would not be that transparent.

I am actually wondering if people using that feature would not already manage their own DirectBuffer and if the added complexity to the BufferCursor is worth the effort.

That being said, I'm curious to see the improvement of running zero copy write for values in my application.

krisskross commented 8 years ago

Yeah, I took a stab at making something intuitive but it felt confusing to mix BufferCursor with reserve in the end. So I agree that the complexity might not be worth it.

Unless of course someone else has a good idea of how to implement it.

kk00ss commented 8 years ago

Hello, where can we see that first stab? Or any example of zero-copy writes to LMDB? I'm not sure how to work with MDB_RESERVE, but I have some WAGs:

First, the more practical one: MDB_RESERVE could be used for overflow pages, and values would only contain pointers to them (in the proper LMDB format). This would only work for values larger than 2KB-16 (or something like that), and it would minimize copying in the cases where that is an important concern.

krisskross commented 8 years ago

Have a look at this commit which returns a DirectBuffer for the reserved space in the Database.

I'm not sure I follow what you're proposing? Are we still talking about BufferCursor, or MDB_RESERVE in general? Can you please elaborate?

kk00ss commented 8 years ago

I'm sorry for confusing you. I had a misconception of LMDB, it's better thought out than I could imagine. All hail Howard Chu.

I see a confusing part of the LMDB documentation: "MDB_CURRENT - replace the item at the current cursor position. The key parameter must still be provided, and must match it. If using sorted duplicates (MDB_DUPSORT) the data item must still sort into the same place. This is intended to be used when the new data is the same size as the old. Otherwise it will simply perform a delete of the old record followed by an insert."

I've asked Howard Chu and am waiting for his answer, but currently I assume that LMDB is a copy-on-write B-Tree and that update-in-place cannot be enabled by any option except MDB_WRITEMAP, so there should be no problem with updates via MDB_RESERVE. Update in place might easily cause data loss, and avoiding that is one of the points of LMDB as far as I understand.

My current understanding of LMDB is that every write transaction creates a new ROOT_NODE that replaces the old node after all the changes of this transaction have been written into the MMAP. So if it fails to do so, we will only see the old ROOT_NODE, and from that point in time the pages written by the failed transaction are not dirty but free.
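The root-swap idea described above can be illustrated with a deliberately simplified sketch. It is not LMDB's implementation: LMDB copies only the B-Tree pages on the modified path, while this sketch copies a whole map. All names here (`CopyOnWriteRootSketch`, `WriteTxn`) are hypothetical, and a single writer at a time is assumed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of "publish a new root only on commit".
final class CopyOnWriteRootSketch {
    // The currently published root; readers only ever see this snapshot.
    private final AtomicReference<Map<String, String>> root =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    // A write transaction works on a private copy (single writer assumed).
    final class WriteTxn {
        private final Map<String, String> working = new HashMap<>(root.get());

        void put(String key, String value) {
            working.put(key, value);
        }

        // Commit publishes the new root in one step.
        void commit() {
            root.set(Collections.unmodifiableMap(working));
        }

        // Abort publishes nothing; the private copy is simply dropped.
        void abort() {
        }
    }

    WriteTxn beginWrite() {
        return new WriteTxn();
    }

    String get(String key) {
        return root.get().get(key);
    }

    public static void main(String[] args) {
        CopyOnWriteRootSketch db = new CopyOnWriteRootSketch();

        WriteTxn failed = db.beginWrite();
        failed.put("k", "lost");
        failed.abort();
        System.out.println("after abort: " + db.get("k"));

        WriteTxn ok = db.beginWrite();
        ok.put("k", "kept");
        ok.commit();
        System.out.println("after commit: " + db.get("k"));
    }
}
```

Because readers never observe the working copy, a transaction that fails before `commit()` leaves the old root untouched, which is the property the comment above relies on.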

So I think we can make every put zero copy, and we don't need BufferCursor for it (but as far as I understand ordered writes/updates are better, because they require less B-Tree traversal). That would actually be a two-phase put: get space, then serialize your data into that buffer. We can create a new interface ZeroCopyable which would require the values a user wants to store in LMDB to have two methods: Int getSerializedSize() and void serialize(buf: DirectBuffer)

I've got myself a local copy of the lmdbjni sources. I've tried loading it with the profiles win64 and full, and for some reason the tests cannot be initialized: "Please build lmdbjni first with a platform specific profile". Which of course I've already done.

krisskross commented 8 years ago

I'm pretty sure Howard knows what he's doing, so I'm not going to comment :-)

> So I think we can make every put zero copy, we don't need BufferCursor for it (But as far as I understand ordered writes/updates are better - because they require less B-Tree traversal).

Great, that sounds in line with our earlier conclusion.

> That would actually be a two-phase put: get space, then serialize your data into that buffer. We can create a new interface ZeroCopyable which would require the values a user wants to store in LMDB to have two methods: Int getSerializedSize() and void serialize(buf: DirectBuffer)

Do we need another interface? The reserve() method takes a DirectBuffer key and a size, returning the reserved space as a DirectBuffer to the user.

> I've got myself a local copy of the lmdbjni sources. I've tried loading it with the profiles win64 and full, and for some reason the tests cannot be initialized: "Please build lmdbjni first with a platform specific profile". Which of course I've already done.

Building for Windows is a bit tricky. There are instructions on the wiki. Let me know if following the instructions doesn't work.

kk00ss commented 8 years ago

Response from Howard: nothing is updated in place. That reserve method is just fine :+1:

When writing data using BufferCursor (and Cursor.put internally), lmdbjni first writes the data to some NativeBuffer (which is not even a DirectBuffer for some reason??). Can we use the DirectBuffer obtained via MDB_RESERVE everywhere instead of NativeBuffer? It would mean one less copy of the same data. I see no disadvantages of using MDB_RESERVE for the buffering operations performed on BufferCursor. Probably I'm missing something.

One other idea: it would be great if DirectBuffer could be passed around instead of ByteBuffer (if it was derived from it, or there was another version of it), but I see that the API is very different.

About ZeroCopyable: I thought it would be more convenient to use a single call to put(ByteBuffer key, ZeroCopyable value) that will internally 1) get the size estimate from the object, 2) call reserve and then 3) serialize the data directly into the DirectBuffer. If you don't see any added value in this proposal I'll play with this myself.

interface ZeroCopyable {
    int getSizeEstimate();
    void writeTo(DirectBuffer buf);
}
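To make the proposed two-phase put concrete, here is a minimal, self-contained sketch. It is an illustration only and does not use lmdbjni: `java.nio.ByteBuffer` stands in for DirectBuffer, and the `reserve` helper is a hypothetical stand-in that simply allocates off-heap memory where a real implementation would ask LMDB for reserved space. `TwoPhasePutSketch` and `Block` are made-up names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the proposal; none of these types exist in lmdbjni.
final class TwoPhasePutSketch {

    // The proposed interface, with ByteBuffer standing in for DirectBuffer.
    interface ZeroCopyable {
        int getSizeEstimate();          // number of bytes writeTo will produce
        void writeTo(ByteBuffer buf);   // serialize straight into the target
    }

    // Example value: a block whose size is known up front.
    static final class Block implements ZeroCopyable {
        private final byte[] payload;

        Block(byte[] payload) {
            this.payload = payload;
        }

        @Override
        public int getSizeEstimate() {
            return payload.length;
        }

        @Override
        public void writeTo(ByteBuffer buf) {
            buf.put(payload);
        }
    }

    // Stand-in for a reserve call (assumption): just allocates off-heap memory.
    static ByteBuffer reserve(int size) {
        return ByteBuffer.allocateDirect(size);
    }

    // Phase 1: get space. Phase 2: serialize directly into that space.
    static ByteBuffer put(ZeroCopyable value) {
        ByteBuffer reserved = reserve(value.getSizeEstimate());
        value.writeTo(reserved);
        if (reserved.hasRemaining()) {
            // Reserved space that is never written would be wasted.
            throw new IllegalStateException("value wrote fewer bytes than reserved");
        }
        reserved.flip();
        return reserved;
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        ByteBuffer stored = put(new Block(data));
        System.out.println("stored " + stored.remaining() + " bytes");
    }
}
```

The sketch treats the size estimate as exact: writing more than was reserved fails with a `BufferOverflowException`, and writing less is rejected, which mirrors the concern raised earlier in the thread that the value size must be known beforehand.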

krisskross commented 8 years ago

> When writing data using BufferCursor (and Cursor.put internally) lmdbjni first writes data to some NativeBuffer (which is not even DirectBuffer for some reason ??)

There's already a Cursor.put() method which takes a DirectBuffer. This is also how BufferCursor puts data into the database. Yes, LMDB will copy that data during commit, but notice that BufferCursor reuses valueByteBuffer for different keys and only resizes/copies it when the buffer runs out of space. So I expect multiple puts to be almost as efficient as MDB_RESERVE in the same transaction.
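The buffer-reuse pattern described here can be sketched in isolation. This is not BufferCursor's actual code; `ReusableValueBuffer` is a hypothetical class and the doubling growth policy is an assumption chosen for illustration. It only shows why repeated writes are cheap: the buffer is reallocated and copied only when a value no longer fits.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a reusable value buffer that grows on demand.
final class ReusableValueBuffer {
    private ByteBuffer buffer;
    private int growCount = 0;

    ReusableValueBuffer(int initialCapacity) {
        buffer = ByteBuffer.allocateDirect(initialCapacity);
    }

    // Start a new value; the existing capacity is kept for reuse.
    void reset() {
        buffer.clear();
    }

    // Append bytes to the current value, growing only if they do not fit.
    void write(byte[] bytes) {
        ensureRemaining(bytes.length);
        buffer.put(bytes);
    }

    private void ensureRemaining(int needed) {
        if (buffer.remaining() >= needed) {
            return; // common case: no allocation, no copy
        }
        // Assumed policy: at least double, and always enough for this write.
        int newCapacity = Math.max(buffer.capacity() * 2, buffer.position() + needed);
        ByteBuffer bigger = ByteBuffer.allocateDirect(newCapacity);
        buffer.flip();       // expose the bytes written so far
        bigger.put(buffer);  // the only copy, paid on growth
        buffer = bigger;
        growCount++;
    }

    int length() {
        return buffer.position();
    }

    int capacity() {
        return buffer.capacity();
    }

    int growCount() {
        return growCount;
    }

    public static void main(String[] args) {
        ReusableValueBuffer value = new ReusableValueBuffer(8);
        value.write(new byte[6]);  // fits in the initial 8 bytes
        value.write(new byte[6]);  // does not fit: grows to 16
        value.reset();
        value.write(new byte[10]); // fits in the grown buffer, no new allocation
        System.out.println("capacity=" + value.capacity() + " grows=" + value.growCount());
        // prints "capacity=16 grows=1"
    }
}
```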

> I see no disadvantages of using MDB_RESERVE for buffering operation performed on BufferCursor.

The disadvantage is that you must know the size of the value beforehand or risk wasting space in the database.

> One other idea is that it would be great if DirectBuffer could be passed around instead of ByteBuffer (if it was derived from it, or there was another version of it) - but I see that API is very different.

ByteBuffer is only used to allocate memory and is not exposed directly in the BufferCursor API, even though you can access it from BufferCursor.valDirectBuffer().byteBuffer().

> I thought that it will be more convenient to use single call to put(ByteBuffer key, ZeroCopyable value) that will internally 1) get the size estimate from an object,2) call reserve and then 3) serialize data directly into DirectBuffer. If you don't see any added value in this proposal I'll play with this myself.

I'll have another look at BufferCursor.

deephacks / lmdbjni

Zero-copy usage #51