Closed benalexau closed 8 years ago
Travis has passed the Java 8 test, but failed the Java 7 tests due the third-party SBE tool being compiled with Java 8.
As Java 7 was end of life in April 2015 I would respectfully suggest it is reasonable to discontinue the JDK 7 CI tests.
I think replacing DirectBuffer
with Agrona buffers is a great idea. Better to have good integration with a library like Agrona than nothing at all. There was discussions earlier about using standard ByteBuffer
but this idea haven't been pursued AFAIK.
I do have some concerns on dropping Java 7 support, mostly for sake of Android. However, Android N seems to have limited support for Java 8 and there seems to be some support for Android 6.0 (API level 23) as well. I don't have any Android experience so I don't know how this works in practice.
On the other hand, zero copy with DirectBuffer
shouldn't be used on Android since we haven't figured out how to do it without manual JNI hacking yet.
If anyone use lmdbjni for Android I would like know if this is concern or not. Also if there are any users on Java 7 still.
Some minor comments on the code.
Buffers
class seems unnecessary? There are not a lot of places where MutableDirectBuffer is instantiated in production code. Maybe add a utility method creating buffers only for tests? The Buffers.slice
method could be added to BufferCursor
instead?
Further to my earlier comment on CI failing due to a Java 8 dependency in the SBE Tool, upon further investigation the Java 8 requirement also extends to Agrona itself. I had hoped that it was isolated to the SBE Tool, therefore enabling a tactical workaround such as manually generating the flyweights. However Java 8 is also a requirement of Agrona:
$ javap -verbose -cp ~/.m2/repository/org/agrona/Agrona/0.4.13/Agrona-0.4.13.jar org.agrona.DirectBuffer | grep version
minor version: 0
major version: 52
$ javap -verbose -cp ~/.m2/repository/uk/co/real-logic/sbe-all/1.4.0-RC4/sbe-all-1.4.0-RC4.jar uk.co.real_logic.sbe.SbeTool | grep version
minor version: 0
major version: 52
In relation to the Buffers
class, I would submit that it's worth keeping. Today Agrona's DirectBuffer
and MutableDirectBuffer
interfaces are almost universally satisfied via its UnsafeBuffer
implementation. In the future this may no longer be the case, particularly as Java 9 emerges and/or it becomes necessary to return a different MutableDirectBuffer
based on JVM version and/or platform (eg Android).
As the Buffers
class is a package protected final
class, it is unavailable for end users and a subsequent refactor or removal of the class will not break the public contract.
The main benefit of Buffers
is it currently encapsulates every UnsafeBuffer
reference in the PR:
$ find . -name \*.java -exec grep -H UnsafeBuffer {} \;
./lmdbjni/src/main/java/org/fusesource/lmdbjni/Buffers.java:import org.agrona.concurrent.UnsafeBuffer;
./lmdbjni/src/main/java/org/fusesource/lmdbjni/Buffers.java: return new UnsafeBuffer(new byte[0]);
./lmdbjni/src/main/java/org/fusesource/lmdbjni/Buffers.java: return new UnsafeBuffer(ByteBuffer.allocateDirect(capacity));
./lmdbjni/src/main/java/org/fusesource/lmdbjni/Buffers.java: return new UnsafeBuffer(source.addressOffset(), length);
It is called from some 48 call sites:
$ find . -name \*.java -exec grep -H Buffers.buffer {} \; | wc -l
48
As such we now have a single location if conditional logic is required to select an alternate implementation in the future. Of more minor benefit, we also have reduced the length of lines that would have needed to manually instantiate UnsafeBuffer
themselves. We have also guaranteed consistency of buffer types, specifically that a to-be-used-for-wrapping buffer is initially backed by an inexpensive byte[0]
, and key buffers are of the Env.MAX_KEY_SIZE
test-verified length. I believe these collectively enhance the readability of the code.
Neverthess you are the maintainer, so I can switch back to creating UnsafeBuffer
s at the call sites if you wish. I just wanted to advocate why I did it in the first place, as there are dozens of call sites.
Regarding the POM indentation, I have fixed this up and squashed it into the original commit for further review.
Sorry for the late reply. I have been on vacation for a week. I agree that the Buffers
class doesn't matter much since its package protected. I'm ready to merge the PR since there hasn't been any objections so far. This also means moving lmdbjni to Java 8. We might get objections later but I feel we can handle with separate branching.
There was one thing. Can we shade this dependency or is it a bad thing? I know this technique will be less relevant as Java 9 emerge but I'm curious about the general opinion.
Agrona has had 20 releases in 16 months (see here and here). Any application that uses Agrona (eg Aeron, SBE) will probably want to use the latest Agrona. If LMDBJNI shades Agrona, this will complicate the classpath ordering of such applications.
The two benefits I can see for shading are:
Personally I prefer to keep things simple and idiomatic by not shading the dependency.
I maintain that using a ByteBuffer is the way to go (see #42), instead of relying on unsafe operations. It would keep the library light-weight, more portable and future-proof. Would also work with existing serialization code using ByteBuffer APIs, and would not stop anyone from using their favorite buffer wrappers, such as Agrona and many others out there. The opposite is not true. The only difficulty seems to be that HawtJNI does not generate native code that returns ByteBuffers.
@phraktle I agree in principle (I'd still like to benchmark the result to ensure there is no regression though), but do you know if someone is working on making HawtJNI support this?
Opened a ticket here: https://github.com/fusesource/hawtjni/issues/25. In the meantime, perhaps a reasonable hack would be to create DirectByteBuffers from the pointer on the Java level (rather than in JNI). This does require reflection to make address
and capacity
accessible but arguably this is less problematic than using Unsafe.
I think this change make sense irrespective if we change to ByteBuffer some time in the future. But maybe it's time I look into this issue. @phraktle: have you looked into how difficult it would be to implement that in HawtJNI?
It's a bit unfortunate that the library grows with 224K though, at the same time, users sensitive to footprint would indeed probably solve that one way or another. I agree with @benalexau comment on shading (so let's not do that).
I'll do a benchmark for the ByteBuffer reflection trick this coming week and see how it performs against DirectBuffer.
Are we opposed to abandoning HawtJNI in favour of raw JNI or another JNI helper library?
A few months ago I did a quick (and encouraging) experiment accessing LMDB via JNAerator-emitted BridJ bindings. I only used this combination as I had recent experience with them and an internal C++ library.
I feel hand-crafted JNI or a more minimalist C-focused JNI binding generator (eg Gluegen) might be worth exploration, particularly given LMDB is such a small and stable API.
I did some manual JNI hacking some time ago but that was a nightmare. I haven't looked closer at other JNI code generating tools. But since lmdbjni is quite mature I think our best bet is to solve it in HawtJNI.
I haven't looked into how difficult it would be to do this in HawtJNI. Agree that manual JNI is too painful. Alternatives to HawtJNI could work also. Hopefully there will be a better FFI for Java: http://openjdk.java.net/jeps/191
In the meantime JNR-FFI looks pretty solid.
JNR-FFI has a nice API but the performance seems so-so.
http://stackoverflow.com/questions/22288909/jni-vs-jna-performance
That link is about JNA, not JNR-FFI (https://github.com/jnr/jnr-ffi)
Sorry, I confused the two. Couldn't find any benchmarks so I posted a question on their issue tracker.
JavaCPP looks quite nice also, well maintained, lots of examples/documentation and promise "zero overhead compared to manually coded JNI functions".
JavaCPP also appears to use NewDirectByteBuffer
(see Generator.java). @phraktle does it look like this would work to you?
I'm not familiar with JavaCPP, but as long as it can generate the JNI code for the LMDB wrapper easily and supports returning ByteBuffers, it should do the trick :) The reason why I mentioned JNR is that it seems widely used and might be the basis of JEP191 / Project Panama.
I have been playing around with JavaCPP for a short while. Looks promising.
I still think it would be nice to merge this PR given it would provide a foundation for benchmarking the latest improvements to DirectBuffer
against a ByteBuffer
-based approach. In my view the single biggest competitive differentiation of LMDBJNI is it’s the lowest latency off-heap sorted map we can access from Java. After all, there are plenty of pure Java off-heap and on-heap sorted and unsorted maps for those who aren’t as latency sensitive. I believe LMDBJNI should continue to prioritise its latency differentiation for those users who depend on that key attribute.
To this end, LMDBJNI currently uses MutableDirectBuffer
to wrap a different memory location as the cursor pointer moves. This does not require any additional object allocation or reflection (see source code). If I understand the NewDirectByteBuffer
approach, JNI will allocate a new ByteBuffer
instance on each cursor move. A user of the new ByteBuffer
instance will then either use it directly (with the associated bounds checking of a ByteBuffer
) or separately present it to a helper library that detects the ByteBuffer
concrete class, reflectively accesses its private memory address, and uses Unsafe
to get/set bytes without bounds checking overhead. To the extent this is correct, each cursor move would mean an extra ByteBuffer
object allocation plus either bounds checking or reflection to acquire a memory address.
Such latency might border on immaterial versus the JNI call overhead and practical LMDB use cases, but I believe it needs to be benchmarked with a large enough workload to verify the GC pressure, reflection cost and bounds checking overhead. If the latency is of borderline difference I obviously prefer the cleaner, standards-oriented ByteBuffer
approach. I just don’t want to pay a latency cost for the common use case of a single transaction making numerous cursor moves and making use of the fetched data.
Separately I am also working on an LMDB DAO that persists SBE types via a Java annotation processor. SBE Pull Request 350 added some code generation features to support this work, so I am keen to finish this off and make it available for others. The annotation processor generated code is simpler and slightly fewer operations if I don't need to obtain the existing LMDBJNI DirectBuffer
address and then wrap it into an Agrona [Mutable]DirectBuffer
for the SBE use.
Is @krisskross trying out porting to JavaCPP? Can I help out? Is there a NewDirectByteBuffer
approach that would negate the latency concern?
Yes, I agree that the primary concern for LMDBJNI is performance. I'm also a bit worried about that extra allocation. The manual tests I did before indicated that it might hurt performance. I will not spend time rewriting LMDBJNI if the performance is worse than what we have today.
Right now I'm just playing around with JavaCPP to get a feel for the work needed and it seems relatively straight forward. My C/C++ knowledge is a little shakey so any help in this area would sure be helpful.
Cool initiative there with SBE! I was also doing something similar a while back in a project called Graphene. Ill have a closer look at the PR.
I will merge the PR, in the meantime. Sorry for the detour.
If it turns out that the overhead of the ByteBuffer wrapper allocation is indeed an issue, the reflective trick would still allow you to move the pointer without allocating a new one.
Yes, I'll explore that option aswell.
BTW, thanks for the PR @benalexau!
@krisskross you're welcome. How did you want to tackle the JavaCPP piece? A branch here or in your own fork or something else? Regardless of whether we eventually end up with ByteBuffer
or DirectBuffer
(again I prefer ByteBuffer
if the latency for large workloads ends up materially equivalent), it seems desirable to modernise around a more popular and maintained JNI helper (JavaCPP has 1,532 stars vs HawtJNI's 50) and remove Unsafe
from LMDBJNI if we can.
Stars is indeed how we measure stuff nowadays, right? ;-) No, but seriously HawtJNI is mostly abandoned so I agree that we should move away from it. Looks like JavaCPP makes it easy to access memory through both address and ByteBuffer.
Right now I just play around in a private repository. I can create a public one if you're interested to have a look?
Stars aren't everything, but in this case there's 30 times as many in one project vs the other, not to mention a commit log with similar polarisation. Given most of us aren't terribly interested in JNI and the Java 9 Unsafe
dance is going to wipe out many convenient tricks, there's some safety in numbers....
Happy to take a look at your private repo or a public one. I just don't want to duplicate work you have already done. It'd also be good to get informal feedback along the way, rather than a huge PR at the end.
Cool, I'll create a repo later tonight and let you know when it's up.
Got a reply from Charles Nutter regarding performance of JNR-FFI.
We have done some testing for presentations, but not a formal suite of benchmarks to compare with raw JNI or other libraries like JNA. In our testing, however, the overall cost of making a simple native call is about what it is for hand-written JNI...significantly faster than JNA. As you start to use more advanced features, they add overhead too...but in most cases we (really, @wmeissner, the founder of the JNR projects) try to keep that overhead about what you'd have to accept to do the same features with JNI.
This PR adopts Agrona's
DirectBuffer
, rather than an early fork ofDirectBuffer
. Adopting Agrona yields several benefits:DirectBuffer
(read-only) andMutableDirectBuffer
(read/write) interfaces, which have the normal benefits of separating interfaces from implementations, plus exposing the most minimal API for each use caseThis PR includes:
DirectBuffer
Env
class, including tests of that defaultBuffers
utility class to centraliseMutableDirectBuffer
instantiation