Open phraktle opened 8 years ago
Btw, this would also eliminate the need to use Unsafe (which I see has been causing problems for some), since you can return ByteBuffers wrappers for a native address: http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#NewDirectByteBuffer
I agree that first resort should always be the JDK.
But ByteBuffer is well known to have both api bugs and internal performance bugs that are not prioritized by Oracle. ByteBuffers is simply not the default anymore and that's why people create new implementations. Until Oracle start taking these issues seriously i'm very much against using ByteBuffers! I agree the situation is not ideal. But this is the state of Java today and not specific to lmdbjni.
I'm not sure how much users would gain from getting freedom to choose their own buffer implementation? What problems specifically is it that you see? If we have bugs we should fix them.
As you probably know there is a lot of ongoing debate on Oracle deprecating Unsafe in Java 9. Some of this will be replaced by VarHandles (among others) http://openjdk.java.net/jeps/193. And I hear rumors that new "high-performance" JSRs might be created to solve the Unsafe situation.
Everything is in a state of flux at the moment and it's still a bit early to know how a standard Java zero copy solution would look like in the future.
Agree with the problems of ByteBuffer, however, the convenience APIs mentioned above are also just wrappers around these in general (and/or use Unsafe, which is also not the most portable/future-proof approach). So providing the base LMDB API using ByteBuffers would introduce no new issues and would still allow one to use any further buffer APIs.
Can you please elaborate exactly what your proposal is? If understand correctly you want to deprecate and replace Entry
with ByteBuffer
on most methods on Database
in order to abstract away direct vs heap access to LMDB? There may be problems with this but maybe I don't understand your proposal?
My proposal is simply to use ByteBuffer in the LMDB API, where you currently use DirectBuffer (and move DirectBuffer outside the LMDB API)
What are the advantages from doing that?
If ByteBuffer isn't your buffer of choice then you would be penalized with two buffer conversions.
I would ask the same for DirectBuffer ;) Many of our existing serialization etc is implemented against ByteBuffers. ByteBuffers work with all other Java APIs, e.g. NIO, and the aforementioned buffer libraries. And you can actually wrap a ByteBuffer with any of these without copying (or extra overhead on operations), but you can't always get a ByteBuffer from the current DirectBuffer in lmdbjni (when it just accesses a memory address directly with Unsafe, you can't get a ByteBuffer w/o copying).
So, I would rather not (re)write code against an lmdbjni-specifc buffer API, when there's one in the JDK that's widely supported (I agree about your comments about how it's not a great design – but that's besides the point).
Yes. I'm not saying that we shouldn't do it. It's an interesting use case. My main concern is that performance will get worse and use cases for DirectBuffer harder to implement. Maybe a prototype is appropriate that proves otherwise?
Is it enough for DirectBuffer.byteBuffer()
to wrap the ByteBuffer around the memory address? This is required anyway. You would only pay for the construction of a DirectBuffer object.
If you can guarantee the DirectBuffer to always have a wrapped ByteBuffer, then why not just have the lmdbjni API use ByteBuffers everywhere?
Doing this lazily impose no penalty when only using DirectBuffer. The penalty for using ByteBuffer include hacking into internals of ByteBuffer using reflection.
Hello @phraktle , how do we wrap mmap memory from LMDB with ByteBuffer ? The only way ByteBuffer works with off heap memory is ByteBuffer.allocateDirect. Or we probably can create new mmap using MappedByteBuffer - which will require having two MMaps for one file. But it's api requires us to load whole buffer into memory, or otherwise exception will happen An attempt to access an inaccessible region of a mapped byte buffer will not change the buffer's content and will cause an unspecified exception to be thrown either at the time of the access or at some later time.
(I especially like this "some later time" :+1: ) The beauty of working with MMaped data is that pages required will be loaded automatically when needed. I'm new to JVM world, so maybe I'm missing something. Looking forward to read your opinion.
@kk00ss by using NewDirectByteBuffer in the native code, which will wrap an LMDB region.
Then I support @phraktle request, there are various APIs that know how to work with ByteBuffer and it's often hard to change to work with custom buffer implementation. I think this change is rather trivial, unfortunately I cannot configure environment to build lmdbjni locally (windows 10 and sdk for it). And since @krisskross wrote that build for win64 was performed on virtual machine. It would be awesome to have shared windows image with all required stuff. Not sure about licencing for that virtual machine however.
I agree that ByteBuffer is a more portable solution and worth comparing numbers with DirectBuffer.
I'm not sure how you would hook into hawtjni to make that JNI call though? Does hawtjni support it or do we need a fully custom JNI implementation?
On first glance, I see no evidence that HawtJNI supports this. As a matter of fact, HawtJNI looks like an unmaintained project (no commits in a year, no reactions on tickets/pull requests).
Yeah that's my impression as well. Maybe it's possible to take the generated code and do it manually. But I think the best bet is to wait for Project Panama and JEP 191: Foreign Function Interface which is probably both safer and higher performance.
FYI: We are almost finished with an implementation using JNR-FFI and benchmarks are looking very good. Give it a spin. Any comments/opinions are greatly appreciated.
Have a look at the benchmarks.
Hi,
I have started playing a bit with the zero-copy APIs and my initial impression is that it should use straight ByteBuffers, rather than the custom wrapper DirectBuffer. A library like this should not be opinionated on which convenience wrapper one uses for buffers.
(There are actually several such options and one should be free to make a choice – or use none of these: https://github.com/OpenHFT/Chronicle-Bytes#comparison-of-access-to-native-memory)
Regards, Viktor