deephacks / lmdbjni

LMDB for Java
Apache License 2.0
204 stars 28 forks source link

Use ByteBuffer in zero copy APIs (instead of DirectBuffer) #42

Open phraktle opened 8 years ago

phraktle commented 8 years ago

Hi,

I have started playing a bit with the zero-copy APIs and my initial impression is that it should use straight ByteBuffers, rather than the custom wrapper DirectBuffer. A library like this should not be opinionated on which convenience wrapper one uses for buffers.

(There are actually several such options and one should be free to make a choice – or use none of these: https://github.com/OpenHFT/Chronicle-Bytes#comparison-of-access-to-native-memory)

Regards, Viktor

phraktle commented 8 years ago

Btw, this would also eliminate the need to use Unsafe (which I see has been causing problems for some), since you can return ByteBuffers wrappers for a native address: http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#NewDirectByteBuffer

krisskross commented 8 years ago

I agree that first resort should always be the JDK.

But ByteBuffer is well known to have both api bugs and internal performance bugs that are not prioritized by Oracle. ByteBuffers is simply not the default anymore and that's why people create new implementations. Until Oracle start taking these issues seriously i'm very much against using ByteBuffers! I agree the situation is not ideal. But this is the state of Java today and not specific to lmdbjni.

I'm not sure how much users would gain from getting freedom to choose their own buffer implementation? What problems specifically is it that you see? If we have bugs we should fix them.

As you probably know there is a lot of ongoing debate on Oracle deprecating Unsafe in Java 9. Some of this will be replaced by VarHandles (among others) http://openjdk.java.net/jeps/193. And I hear rumors that new "high-performance" JSRs might be created to solve the Unsafe situation.

Everything is in a state of flux at the moment and it's still a bit early to know how a standard Java zero copy solution would look like in the future.

krisskross commented 8 years ago

Tip of the iceberg.

phraktle commented 8 years ago

Agree with the problems of ByteBuffer, however, the convenience APIs mentioned above are also just wrappers around these in general (and/or use Unsafe, which is also not the most portable/future-proof approach). So providing the base LMDB API using ByteBuffers would introduce no new issues and would still allow one to use any further buffer APIs.

krisskross commented 8 years ago

Can you please elaborate exactly what your proposal is? If understand correctly you want to deprecate and replace Entry with ByteBuffer on most methods on Database in order to abstract away direct vs heap access to LMDB? There may be problems with this but maybe I don't understand your proposal?

phraktle commented 8 years ago

My proposal is simply to use ByteBuffer in the LMDB API, where you currently use DirectBuffer (and move DirectBuffer outside the LMDB API)

krisskross commented 8 years ago

What are the advantages from doing that?

krisskross commented 8 years ago

If ByteBuffer isn't your buffer of choice then you would be penalized with two buffer conversions.

phraktle commented 8 years ago

I would ask the same for DirectBuffer ;) Many of our existing serialization etc is implemented against ByteBuffers. ByteBuffers work with all other Java APIs, e.g. NIO, and the aforementioned buffer libraries. And you can actually wrap a ByteBuffer with any of these without copying (or extra overhead on operations), but you can't always get a ByteBuffer from the current DirectBuffer in lmdbjni (when it just accesses a memory address directly with Unsafe, you can't get a ByteBuffer w/o copying).

So, I would rather not (re)write code against an lmdbjni-specifc buffer API, when there's one in the JDK that's widely supported (I agree about your comments about how it's not a great design – but that's besides the point).

krisskross commented 8 years ago

Yes. I'm not saying that we shouldn't do it. It's an interesting use case. My main concern is that performance will get worse and use cases for DirectBuffer harder to implement. Maybe a prototype is appropriate that proves otherwise?

krisskross commented 8 years ago

Is it enough for DirectBuffer.byteBuffer() to wrap the ByteBuffer around the memory address? This is required anyway. You would only pay for the construction of a DirectBuffer object.

phraktle commented 8 years ago

If you can guarantee the DirectBuffer to always have a wrapped ByteBuffer, then why not just have the lmdbjni API use ByteBuffers everywhere?

krisskross commented 8 years ago

Doing this lazily impose no penalty when only using DirectBuffer. The penalty for using ByteBuffer include hacking into internals of ByteBuffer using reflection.

kk00ss commented 8 years ago

Hello @phraktle , how do we wrap mmap memory from LMDB with ByteBuffer ? The only way ByteBuffer works with off heap memory is ByteBuffer.allocateDirect. Or we probably can create new mmap using MappedByteBuffer - which will require having two MMaps for one file. But it's api requires us to load whole buffer into memory, or otherwise exception will happen An attempt to access an inaccessible region of a mapped byte buffer will not change the buffer's content and will cause an unspecified exception to be thrown either at the time of the access or at some later time. (I especially like this "some later time" :+1: ) The beauty of working with MMaped data is that pages required will be loaded automatically when needed. I'm new to JVM world, so maybe I'm missing something. Looking forward to read your opinion.

phraktle commented 8 years ago

@kk00ss by using NewDirectByteBuffer in the native code, which will wrap an LMDB region.

kk00ss commented 8 years ago

Then I support @phraktle request, there are various APIs that know how to work with ByteBuffer and it's often hard to change to work with custom buffer implementation. I think this change is rather trivial, unfortunately I cannot configure environment to build lmdbjni locally (windows 10 and sdk for it). And since @krisskross wrote that build for win64 was performed on virtual machine. It would be awesome to have shared windows image with all required stuff. Not sure about licencing for that virtual machine however.

krisskross commented 8 years ago

I agree that ByteBuffer is a more portable solution and worth comparing numbers with DirectBuffer.

I'm not sure how you would hook into hawtjni to make that JNI call though? Does hawtjni support it or do we need a fully custom JNI implementation?

phraktle commented 8 years ago

On first glance, I see no evidence that HawtJNI supports this. As a matter of fact, HawtJNI looks like an unmaintained project (no commits in a year, no reactions on tickets/pull requests).

krisskross commented 8 years ago

Yeah that's my impression as well. Maybe it's possible to take the generated code and do it manually. But I think the best bet is to wait for Project Panama and JEP 191: Foreign Function Interface which is probably both safer and higher performance.

krisskross commented 8 years ago

FYI: We are almost finished with an implementation using JNR-FFI and benchmarks are looking very good. Give it a spin. Any comments/opinions are greatly appreciated.

https://github.com/lmdbjava/lmdbjava

krisskross commented 8 years ago

Have a look at the benchmarks.

https://github.com/lmdbjava/benchmarks