deephacks / lmdbjni

LMDB for Java
Apache License 2.0

Update BufferCursor to support Android or document why not #21

Open harningt opened 9 years ago

harningt commented 9 years ago

It sounds to me like BufferCursor would be great for Android, but I cannot seem to find a documented reason why this would not work... just a statement not to do it.

If it is impossible due to current Android limitations, documenting what they are would make it easier to watch for when Android may support the necessary feature. If it is non-performant due to runtime issues (either Dalvik, ART, or both), documentation would be good here as well.

krisskross commented 9 years ago

That "not supported" statement is a bit harsh. During the first Android release I ran into a problem with some methods missing on Unsafe that are needed to get memory offsets, and I did not have time to fix it. I think it's possible, it's just that my Android skills are not where they should be.

I'll have a second look this week.

harningt commented 9 years ago

Ah yes - BufferCursor sits atop DirectBuffer which uses Unsafe. Doh! Perhaps an alternate 'Unsafe' could be used via JNI... though looking at DirectBuffer - either a new implementation class would be necessary using minimal calls, or the alternate Unsafe would be quite large.

krisskross commented 9 years ago

Yeah, but we're not without hope. There is very little information on this elsewhere but Android does support direct ByteBuffer so there is probably a way of managing the memory address and wrap that into something that BufferCursor can use.

I just need to set up a test environment so I can try some of these ideas.

krisskross commented 9 years ago

MemoryBlock seems to be the key. Yeah, I suspect we can make BufferCursor work on Android, unless there are runtime/platform restrictions I'm not aware of.

harningt commented 9 years ago

Awesome, it also looks like it has been in its current location since Honeycomb. Looking through the tree, 2.3.6 had similar features using org.apache.harmony.luni.platform.PlatformAddress.

krisskross commented 9 years ago

That's good to know.

krisskross commented 9 years ago

I got things running on Android and managed to get hold of the correct memory address of a MemoryBlock. I just have to figure out the memory layout so that LMDB reads from the correct address.

I searched around and couldn't really find much useful information on this. Do you know where I might find it? I'll have a look in the Android source in the meantime.

harningt commented 9 years ago

Looks like for extracting values from LMDB or writing into a buffer prepared by LMDB (via MDB_RESERVE), you'd be using MemoryBlock.wrapFromJni. For constructing new blocks for which you want the pointer, you'd use MemoryBlock.allocate (which also exposes the byte[] for easy direct writing through the object's 'MemoryBlock::array' function).

wrapFromJni objects don't expose a raw byte[] to work with, so you'd have to use the poke/peek operations there, whereas MemoryBlock.allocate (I suppose primarily useful for preparing keys to begin queries, or values to write without MDB_RESERVE) offers up a byte[] primitive alongside poke/peek.

The key methods for both of these are MemoryBlock::toLong, which returns the address, MemoryBlock::getSize for the total size, and MemoryBlock::free to mark the block freed. For blocks acquired via wrapFromJni, free just nulls out the address, so anything acquired would need to be appropriately released on the LMDB side... though if I recall right, most things come directly from the mmap and aren't malloced, so it would still be useful to call it to mark the block 'dead' and unusable, preventing bad pokes etc.

Sorry for the rambling and any missing logic - did some research and poking as I was constructing this "mind-dump".

Looking at your DirectBuffer implementation, it looks like by using MemoryBlock you'd lose the ability to wrap byte[] easily and do more direct modifications. libcore.io.Memory looks like it is even closer to 'UNSAFE', but I'd suspect it is an even more unstable API than MemoryBlock.

Thinking this through, MemoryBlock is really just the guts underneath ByteBuffer for Android, too bad there isn't a good API for DirectByteBuffer to access unsafe memory... though I think there is a JNI method, but lifetime management is probably the big issue there -> what to do on 'free'

krisskross commented 9 years ago

Nice analysis, it is in line with my conclusion as well. I think this is pretty close to working but some little detail is wrong. This is what I tried with direct ByteBuffers.

ByteBuffer keyBuf = ByteBuffer.allocateDirect(8); // zero filled
ByteBuffer valBuf = ByteBuffer.allocateDirect(8); // zero filled
// getAddress == MemoryBlock::toLong
long keyAddr = getAddress(ByteBuffer.allocateDirect(8));
long valAddr = getAddress(ByteBuffer.allocateDirect(8));
// pokeLong == Memory.pokeLong
pokeLong(keyAddr, 8, true);
pokeLong(keyAddr + 8, getAddress(keyBuf), true);
pokeLong(valAddr, 8, true);
pokeLong(valAddr + 8, getAddress(valBuf), true);
int flags = 0;
int rc = JNI.mdb_put_address(tx.pointer(), db.pointer(), keyAddr, valAddr, flags);
checkErrorCode(rc);

Unfortunately this generates MDB_BAD_VALSIZE, so I thought that maybe the byte ordering was incorrect, but changing it with Long.reverseBytes doesn't help. Using the MDB_RESERVE flag also generates MDB_BAD_VALSIZE. The behavior of JNI.mdb_cursor_put_address is identical to that of JNI.mdb_put_address.
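As a side note on the byte-order theory: a swapped length field is one way to end up with a wildly out-of-range size. The sketch below is plain JVM Java, not Android's libcore, and `ByteOrderProbe` and `readBack` are made-up names for illustration. It writes a value in one byte order and reads it back in another, which is roughly what native code would see if a struct field were written with the wrong endianness.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of lmdbjni or Android.
final class ByteOrderProbe {
    private ByteOrderProbe() {}

    // Writes `value` using writeOrder, then reinterprets the same 8 bytes using readOrder.
    static long readBack(long value, ByteOrder writeOrder, ByteOrder readOrder) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.order(writeOrder).putLong(0, value);
        return buf.order(readOrder).getLong(0);
    }
}
```

A length of 8 written big-endian and read little-endian comes back as Long.reverseBytes(8L), i.e. 0x0800000000000000. Whether that is what happens in the snippet above is not established here; it only shows why byte order is worth ruling out explicitly.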

Any ideas?

BTW, do you know where I can find a libcore jar to put on the classpath? I can't find these classes in any jar in Android Studio. Right now I'm relying on reflection to call internal classes.

harningt commented 9 years ago

In most cases Android runs 32-bit... so using 8 doesn't seem right as the offset.

Even in the 64-bit case: you are allocating 8 bytes for the keyAddress and later poking 8 bytes after it.

Looking at the APIs, keyBuf should probably be the size of an MDB_val, which, if Java longs are used as addresses, is 16 bytes: 8 for the size and 8 for the data pointer.

Another nice trick you could use to avoid too many extra buffers: construct the following sort of struct (assuming 64-bit... on 32-bit, change all putLongs => putInt):

byte[] inputValue = new byte[3];
ByteBuffer keyBuf = ByteBuffer.allocateDirect(2 * 8 + inputValue.length);
long keyAddr = getAddress(keyBuf);
pokeLong(keyAddr, inputValue.length, true);
pokeLong(keyAddr + 8, keyAddr + 2 * 8, true);
/* Copy over inputValue to keyAddr + 2 * 8 */

Basically you stuff the data in the same struct allocation at a given offset.

Though of course you run into issues with MDB_RESERVE, which in that case you will have to extract the address out of the valAddr and write to that address.
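A minimal sketch of the single-allocation layout described above, assuming a 64-bit ABI where MDB_val is a size_t length followed by a void* data pointer. `InlineMdbVal` and its methods are hypothetical names, not lmdbjni API, and the base address is passed in as a parameter because obtaining it (the getAddress helper in this thread) is platform-specific.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: packs an MDB_val header and its payload into one buffer.
final class InlineMdbVal {
    static final int POINTER_SIZE = 8;               // assumption: 64-bit native ABI
    static final int HEADER_SIZE = 2 * POINTER_SIZE; // mv_size followed by mv_data

    private InlineMdbVal() {}

    // Allocates room for the header plus the payload bytes.
    static ByteBuffer allocateFor(byte[] payload) {
        return ByteBuffer.allocateDirect(HEADER_SIZE + payload.length);
    }

    // baseAddress must be the native address of buf's first byte, obtained elsewhere.
    static ByteBuffer pack(ByteBuffer buf, long baseAddress, byte[] payload) {
        if (buf.capacity() < HEADER_SIZE + payload.length) {
            throw new IllegalArgumentException("buffer too small for header and payload");
        }
        buf.order(ByteOrder.nativeOrder());                    // struct fields use native byte order
        buf.putLong(0, payload.length);                        // mv_size
        buf.putLong(POINTER_SIZE, baseAddress + HEADER_SIZE);  // mv_data points just past the header
        ByteBuffer body = buf.duplicate();                     // independent position, same memory
        body.position(HEADER_SIZE);
        body.put(payload);                                     // payload stored inline after the header
        return buf;
    }
}
```

For a 3-byte payload this yields a 19-byte buffer whose data pointer is baseAddress + 16. With MDB_RESERVE the data pointer is overwritten by LMDB, so it has to be read back from offset 8 rather than assumed.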

Regarding libcore, I imagine that this is completely inaccessible :-(

Maybe pulling Robolectric into the build might get you the desired symbols... they have an android_libcore jar file... but I'm not sure how that would work out with taking that and stuffing it on a real Android device.

harningt commented 9 years ago

I also imagine that this would be an issue with PC Java+LMDB, but it is a much more likely scenario that the user is using 64-bit than with Android.

krisskross commented 9 years ago

Thanks for the feedback. Actually, Howard answered a question on the OpenLDAP mailing list today regarding 32-bit.

I am philosophically opposed to supporting 32 bit architectures in LMDB. http://www.openldap.org/lists/openldap-devel/201405/msg00014.html

However, for the few people who insist, there is an experimental branch available that can support larger DBs on 32 bit CPUs, by unmapping and remapping segments of the DB.

https://gitorious.org/mdb/mdb/source/69d7cb8d44e04f02d8d0c923ae71fbaaa9f42f3a:

Due to the added system calls involved, it is significantly slower. We have plans to merge it into LMDB 1.0 as an optional feature, but for the moment it's only in that branch.

Fundamentally, supporting this feature in LMDB is the wrong thing to do. It adds code/bloat and slows down the overall codebase just in order to support a dying technology. Even in embedded processing 64 bit CPUs are available cheaply now; anyone still pouring money into 32 bit CPUs needs to have their head examined.

So I think 32-bit is a pretty shaky road to take right now? The lmdbjni-android-0.3.2 builds with arm-linux-androideabi-4.9 x86_64 as of now.

Anyway, I found this in ByteBuffer.allocateDirect which indicates that there is memory alignment going on.

public static ByteBuffer allocateDirect(int capacity) {
  if (capacity < 0) {
    throw new IllegalArgumentException("capacity < 0: " + capacity);
  }
  // Ensure alignment by 8.
  MemoryBlock memoryBlock = MemoryBlock.allocate(capacity + 7);
  long address = memoryBlock.toLong();
  long alignedAddress = (address + 7) & ~(long)7;
  return new DirectByteBuffer(memoryBlock, capacity, (int)(alignedAddress - address), false, null);
}
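To make the alignment step concrete, here is a small standalone sketch of the same round-up arithmetic. `Alignment`, `alignUp` and `padding` are illustrative names, not Android or lmdbjni API, and the alignment is assumed to be a power of two.

```java
// Hypothetical helper mirroring the "(address + 7) & ~7" step in allocateDirect.
final class Alignment {
    private Alignment() {}

    // Rounds address up to the next multiple of alignment (alignment must be a power of two).
    static long alignUp(long address, int alignment) {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        long mask = alignment - 1L;
        return (address + mask) & ~mask;
    }

    // Number of bytes skipped at the start of the block to reach the aligned address.
    static int padding(long address, int alignment) {
        return (int) (alignUp(address, alignment) - address);
    }
}
```

If getAddress returns the raw MemoryBlock address rather than the aligned one, the buffer's visible contents would start `padding` bytes later than the address handed to LMDB. Whether that applies to the code in this thread has not been verified.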

I will try your suggestions but I can't work more on this until the weekend so please bear with me. Feel free to try some of this stuff out. I can send you the full code I played with if you're interested.

krisskross commented 9 years ago

The 8 byte offset was a silly mistake, of course it should be 2 * 8 bytes.

I have tried different variations but can't get LMDB to accept mdb_put_address. I also double-checked that the memory contents are correct using peekLong, and they are, AFAICT.

One thing I noticed is that the memory addresses fit in 32 bits, like 1948825548. But pointers should still be treated as 64-bit addresses (so writing longs is correct)?

Android is currently expected to run on 32-bit platforms. In theory it could be built for a 64-bit system, but that is not a goal at this time. For the most part this isn't something that you will need to worry about when interacting with native code, but it becomes significant if you plan to store pointers to native structures in integer fields in an object. To support architectures that use 64-bit pointers, you need to stash your native pointers in a long field rather than an int.

http://developer.android.com/training/articles/perf-jni.html#64_bit

I'm running out of ideas. I have searched around for projects that might use the same approach but failed to find any.

harningt commented 9 years ago

Storage in Java should be done using 'long' values, as it prevents having two storage types in a project. However, the pointer width changes at the native layer, so there would be putLong(address) on 64-bit and putInt((int)address) on 32-bit to match the native field size... unless the platform uses 64-bit pointer storage even if it can only address 32 bits (I haven't seen an architecture like this).
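One way to sketch that split, assuming MDB_val is two pointer-sized fields (length, then data pointer) and that the pointer width is known to the caller. `NativeWord` and `mdbVal` are hypothetical names used only for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: keeps addresses as long in Java, writes them at native width.
final class NativeWord {
    private NativeWord() {}

    // Writes value as a 4- or 8-byte native word at the given offset.
    static void put(ByteBuffer buf, int offset, long value, int pointerSize) {
        if (pointerSize == 8) {
            buf.putLong(offset, value);
        } else if (pointerSize == 4) {
            if ((value >>> 32) != 0) {
                throw new IllegalArgumentException("value does not fit in 32 bits");
            }
            buf.putInt(offset, (int) value);
        } else {
            throw new IllegalArgumentException("unsupported pointer size: " + pointerSize);
        }
    }

    // Builds an MDB_val-shaped header: size at offset 0, data address right after it.
    static ByteBuffer mdbVal(long size, long dataAddress, int pointerSize) {
        ByteBuffer buf = ByteBuffer.allocateDirect(2 * pointerSize).order(ByteOrder.nativeOrder());
        put(buf, 0, size, pointerSize);
        put(buf, pointerSize, dataAddress, pointerSize);
        return buf;
    }
}
```

With a 4-byte pointer the whole header is 8 bytes, with the data pointer at offset 4, so a header written as two longs would not line up with what a 32-bit native library reads.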

I'd be glad to run some tests on the work-in-progress code to see if I can mangle it to work this coming week.

Regarding the 32-bit/64-bit support, it does make sense that there is an issue with 32-bit and large databases. Luckily I have no interest in >= 2 GB databases, for which this becomes a major issue. I suppose it starts becoming an issue earlier too, due to address space limits.

krisskross commented 9 years ago

Awesome. I have some ideas left to try as well. Send me a mail if you want the code I have so far.

krisskross commented 9 years ago

Not sure but this might be related to memalign vs posix_memalign.

https://github.com/LMDB/lmdb/commit/a7639a66a493818dc55f3ed77bebe659b6cdd2fd