deephacks / lmdbjni

LMDB for Java
Apache License 2.0

Add more SeekOps #17

Closed dieselpoint closed 9 years ago

dieselpoint commented 9 years ago

org.fusesource.lmdbjni.SeekOp includes only two SeekOps, KEY and RANGE.

Lmdb supports a ton more in JNI.java. In particular, MDB_FIRST would be helpful so we can seek to the first record. But there's no reason not to add the rest of them.
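For illustration, here is a minimal sketch of what a fuller set of cursor operations could look like on the Java side. The enum name `CursorOp` and the helper methods are hypothetical and are not part of lmdbjni. The constant names mirror the `MDB_cursor_op` names from LMDB, and in a real binding the numeric values would presumably come from JNI.java rather than from a string.

```java
// Hypothetical enum, NOT the actual org.fusesource.lmdbjni.SeekOp.
// Each constant is named after an LMDB MDB_cursor_op value.
enum CursorOp {
    FIRST,          // position at the first key/data item
    FIRST_DUP,
    GET_BOTH,       // position at a given key/data pair
    GET_BOTH_RANGE,
    GET_CURRENT,
    GET_MULTIPLE,
    LAST,           // position at the last key/data item
    LAST_DUP,
    NEXT,
    NEXT_DUP,
    NEXT_MULTIPLE,
    NEXT_NODUP,
    PREV,
    PREV_DUP,
    PREV_NODUP,
    SET,            // position at a given key
    SET_KEY,        // position at a given key, also returning the key
    SET_RANGE;      // position at the first key >= the given key

    // Name of the corresponding native constant, e.g. "MDB_FIRST".
    String nativeName() {
        return "MDB_" + name();
    }

    // True when the caller must supply a key for the operation.
    // Positional operations such as FIRST or NEXT take no input key.
    boolean requiresKey() {
        switch (this) {
            case GET_BOTH:
            case GET_BOTH_RANGE:
            case SET:
            case SET_KEY:
            case SET_RANGE:
                return true;
            default:
                return false;
        }
    }
}
```

The `requiresKey()` split shows why `MDB_FIRST` does not fit neatly next to KEY and RANGE: those two need a key argument, while seeking to the first record does not.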

I would do this myself and do a pull request, but I still can't do a build in Eclipse.

krisskross commented 9 years ago

Sounds like a good improvement.

I tried to open the project in Eclipse and the project doesn't look very happy so I fixed these just now.

The only thing I don't understand is why the hawtjni plugin's 'build' goal is not recognized. Not sure what this is about. Is there anything else that messes up the Eclipse build?

krisskross commented 9 years ago

Eclipse provides quick fixes that make the plugin errors go away, but it tries to add m2e-specific stuff to the poms, which is a bad idea. Maybe try configuring default mappings in Eclipse without using poms?

https://www.eclipse.org/m2e/documentation/m2e-execution-not-covered.html

dieselpoint commented 9 years ago

I added the m2e specific stuff to the poms and they worked. It's ugly, but not fatal.

ccleve commented 9 years ago

It seems that both .seek() and .get() hit the same mdb_cursor_get() function. Perhaps using GetOp is basically equivalent?

ccleve commented 9 years ago

Also, on the poms, there were a bunch of version errors. In some places the version was 0.3.1 and in others 0.1.3. I made them all 0.3.1 and everything compiled.

krisskross commented 9 years ago

It's important to stay close to the LMDB API and there is a bit of a mix-up right now. MDB_cursor_op is not consistently exposed in Cursor since there is both seek and get, but I think all options are there, no?

Do you have any suggestions on how to improve it?

ccleve commented 9 years ago

My best suggestion is to duplicate the LevelDB API. Look at org.iq80.leveldb.DBIterator. It supports .seekToFirst(), etc.
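To make the suggestion concrete, here is a rough sketch of a LevelDB-style iterator. The class `SeekableIterator` is hypothetical, and a `TreeMap` stands in for an LMDB cursor so the example runs without native code. The comments note which `MDB_cursor_op` each method would presumably map to in a real wrapper.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical LevelDB-style iterator. A TreeMap plays the role of the
// sorted LMDB database; this is not lmdbjni code.
class SeekableIterator {
    private final TreeMap<String, String> store;
    // Entry that the next call to next() will return, or null at the end.
    private Map.Entry<String, String> upcoming;

    SeekableIterator(TreeMap<String, String> store) {
        this.store = store;
        seekToFirst();
    }

    // Would correspond to a cursor get with MDB_FIRST.
    void seekToFirst() {
        upcoming = store.firstEntry();
    }

    // Would correspond to a cursor get with MDB_LAST.
    void seekToLast() {
        upcoming = store.lastEntry();
    }

    // Would correspond to MDB_SET_RANGE: first entry with key >= target.
    void seek(String target) {
        upcoming = store.ceilingEntry(target);
    }

    boolean hasNext() {
        return upcoming != null;
    }

    // Returns the current entry and advances, like MDB_NEXT.
    Map.Entry<String, String> next() {
        if (upcoming == null) {
            throw new NoSuchElementException();
        }
        Map.Entry<String, String> current = upcoming;
        upcoming = store.higherEntry(current.getKey());
        return current;
    }
}
```

A caller would use `seekToFirst()` followed by a `hasNext()`/`next()` loop for a full scan, or `seek(key)` to start from a given position.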

krisskross commented 9 years ago

Yes, that's indeed more user friendly, but not quite complete in terms of functionality.

I was actually thinking of modernizing the whole API (as a separate artifact), but haven't found time to do it yet. Also not sure if this would be useful?

ccleve commented 9 years ago

If you've got the time to do it, by all means...

I think there are some advantages to having a BerkeleyDB / LevelDB / RocksDB compatibility layer. Makes it much easier for people to adopt.

I also think that having some kind of zero-copy API would be helpful. If Lmdb or some other Java-based db can surface some MappedByteBuffers then the performance will be much better. It's hard to do because the ByteBuffer interface is so awful, but I've tried to do it by creating my own wrappers. I've had limited success. It looks like the Lmdbjni code tries to do the same thing with its DirectBuffer.
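As a small illustration of the difference between a copying read and a zero-copy read, here is a sketch using only plain `java.nio.ByteBuffer`. The class `ZeroCopyDemo` and its methods are made up for this example, and a direct buffer stands in for memory-mapped database pages. It does not use lmdbjni's DirectBuffer.

```java
import java.nio.ByteBuffer;

// Hypothetical helper contrasting a copying read with a view-based read.
class ZeroCopyDemo {
    // Copying read: the bytes are duplicated into a new heap array.
    static byte[] copyOut(ByteBuffer source, int offset, int length) {
        byte[] copy = new byte[length];
        // duplicate() shares the memory but has its own position and limit,
        // so the caller's buffer state is left untouched.
        ByteBuffer cursor = source.duplicate();
        cursor.position(offset);
        cursor.get(copy, 0, length);
        return copy;
    }

    // Zero-copy read: returns a view over the same memory; no bytes move.
    static ByteBuffer viewOf(ByteBuffer source, int offset, int length) {
        ByteBuffer window = source.duplicate();
        window.limit(offset + length);
        window.position(offset);
        // slice() yields a buffer whose index 0 is the window's position.
        return window.slice();
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocateDirect(8);
        memory.put(0, (byte) 1);
        byte[] copy = copyOut(memory, 0, 4);
        ByteBuffer view = viewOf(memory, 0, 4);
        memory.put(0, (byte) 42); // later change to the underlying memory
        System.out.println("copy sees " + copy[0] + ", view sees " + view.get(0));
    }
}
```

The view reflects later changes to the underlying memory while the copy does not, which is both the performance benefit and the main hazard of zero-copy access: a view must not outlive the memory it points at.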

krisskross commented 9 years ago

I agree. Maybe I'll spend some time on it soon.

The zero copy should be working quite well. Have you tried it? According to my tests it's almost 4 times faster than buffer copy for cursor scans, and a lot faster than the JNI versions of rocksdb and leveldb.

https://github.com/deephacks/lmdbjni/blob/master/lmdbjni/src/test/java/org/fusesource/lmdbjni/PerfTest2.java

krisskross commented 9 years ago

Here is a summary of a JMH test I just did. As you can see, the zero copy is A LOT faster than anything else.

Benchmark                    Mode  Cnt         Score         Error  Units
Iteration.leveldb           thrpt   10   7624941.049 ±  995999.362  ops/s
Iteration.lmdb_buffer_copy  thrpt   10   3066605.928 ±  610793.399  ops/s
Iteration.lmdb_zero_copy    thrpt   10  15029604.092 ± 1309367.614  ops/s
Iteration.rocksdb           thrpt   10   1505814.770 ±  420279.355  ops/s

Here is the full report if you're interested: http://pastebin.com/gPFVcakL
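To put numbers on "a lot faster", the throughput ratios can be computed directly from the scores in the summary above. The class name `BenchmarkRatios` is made up for this sketch. Going by the mean scores, zero copy comes out at roughly 4.9x buffer copy, about 2x leveldb and about 10x rocksdb, though the error margins in the table are wide enough that these should be read as rough figures.

```java
// Hypothetical helper that derives speedup ratios from the JMH scores.
class BenchmarkRatios {
    // Mean throughput in ops/s, copied from the JMH summary.
    static final double LEVELDB = 7624941.049;
    static final double LMDB_BUFFER_COPY = 3066605.928;
    static final double LMDB_ZERO_COPY = 15029604.092;
    static final double ROCKSDB = 1505814.770;

    // How many times faster `score` is than `baseline`.
    static double speedup(double score, double baseline) {
        return score / baseline;
    }

    public static void main(String[] args) {
        System.out.printf("zero copy vs buffer copy: %.1fx%n",
                speedup(LMDB_ZERO_COPY, LMDB_BUFFER_COPY));
        System.out.printf("zero copy vs leveldb:     %.1fx%n",
                speedup(LMDB_ZERO_COPY, LEVELDB));
        System.out.printf("zero copy vs rocksdb:     %.1fx%n",
                speedup(LMDB_ZERO_COPY, ROCKSDB));
    }
}
```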

krisskross commented 9 years ago

BTW you may want to have a look at https://github.com/deephacks/graphene which is built on lmdbjni and provides a higher-level API similar to BerkeleyDB.

It is Java 8 lambda friendly, uses the javac compiler for generating entity and builder classes with fast custom serialization, and has a simple query language. It's still very experimental but works for basic stuff.

Let me know what you think.

ccleve commented 9 years ago

Looks interesting. I'll poke around.

Here's one that I wrote: https://github.com/dieselpoint/norm

krisskross commented 9 years ago

Let me know what you think of these additions.

krisskross commented 9 years ago

I have worked a bit on the BufferCursor API in order to make it safer and easier to use. You no longer need to juggle DirectBuffers, which is nice, and it's way faster than the regular buffer copy API.