A buffer data type - Githubissues

mknejp commented 9 years ago

The current way the "data" type works is only marginally useful for data streaming between languages due to all the copying involved.

I was thinking about introducing a "buffer" data type that represents memory shared between both sides of the fence without allocating and copying stuff around in every call. Both Objective-C and Java have facilities to access "unmanaged" regions of memory.

From C++ to:
- Java: a direct java.nio.ByteBuffer, which can be created in JNI with NewDirectByteBuffer() and does not copy the content.
- Objective-C: NSData dataWithBytesNoCopy:length:freeWhenDone:, same as above (**)

From Java to C++: Pass in a java.nio.ByteBuffer and use GetDirectBufferAddress() and GetDirectBufferCapacity() to transform it into something like std::experimental::array_view<uint8_t>.

From Objective-C to C++: Pass in an NSMutableData and use .length and .mutableBytes to construct the array view.

The whole point of this exercise is to avoid copying the buffer content. It is intended for long-lived buffers that are shared and written to/read from on both sides to exchange bulk data. There must of course be some sort of agreement in the interface protocol about who creates the data, and about ensuring it is not modified in a way that invalidates the memory region.

On a related note, maybe the "data" datatype should also switch to something like std::experimental::array_view<const uint8_t> to avoid the copy at least in one direction where possible.

(**) The only drawback here is that NSData is read-only. If mutable access is necessary, the buffer has to be created on the Objective-C side with NSMutableData.

pwais commented 9 years ago

For C++ -> Java: If the Java method returns an nio.ByteBuffer via NewDirectByteBuffer(), the JVM won't free the underlying C++-allocated buffer when the nio.ByteBuffer gets GC'ed. Djinni would probably need to provide a subclass of ByteBuffer or some sort of custom wrapper that calls free() or a C++ deleter on finalize().

For Java -> C++: Not sure if array_view will get ratified... I would recommend a small custom structure similar to a capnproto / kj array: https://github.com/sandstorm-io/capnproto/blob/master/c%2B%2B/src/kj/array.h#L128 . Something like kj::Array would be small, concise, and largely compatible with existing libc++ utilities.

For C++ -> Objective-C: With respect to the issue of mutability, this looks like a case for a special djinni ObjC class (as in the C++ -> Java case where we need a ByteBuffer that will dispose on GC). If the special ObjC class is simply a pointer, size, and function pointer to a void disposer(), then it should be largely interoperable with Core Foundation.

The underlying problem here is that ownership of the byte buffer must be transferred across the language border; a shared byte buffer is inherently a (pointer, size, disposer) tuple and not just (pointer, size). It might make more sense for djinni to include its own simple ByteBuffer for each language to achieve the necessary ownership transfer. While this additional data structure would increase complexity of djinni's UI, the existing binary type is a solid solution for the majority of use cases where performance demands are relatively flexible.

One last thought for the interim: the user can pass pointers across the language boundary via the i64 type. For Java <-> C++, I know at least JNA has facilities for mapping pointer addresses.

mknejp commented 9 years ago

Whether it's array_view or something else doesn't matter if it can be configured just as the current setting for optional, as long as it can be constructed from pointer/size arguments and takes a single template parameter.

I am not trying to make ownership or lifetime implicit. The user has to decide who owns the buffer. The user has to know that changing the size/capacity of the buffer by anyone may make the memory region invalid for all involved parties and has to be respecified. This is an attempt at providing either

- An argument to a function that is sufficiently large that copying it twice takes significant time and is redundant. The contract of the function should specify that the memory region is only valid for the duration of the call. This use case might be eliminated if we could get rid of the temporary std::vector in the thunks.
- A persistent region of memory that is available for read/write access from both parties. Here lifetime and invalidation have to be managed by the user explicitly. It is the user's responsibility not to access a java.nio.ByteBuffer or NSData if the corresponding std::vector no longer exists or was reallocated.

I am not opposed to a solution with implicit lifetime management if it can be done properly and safely.

I guess a custom ObjC type derived from NSData is acceptable since NSMutableData always copies the content in its initializers.

pwais commented 9 years ago

Another issue not yet discussed for Java <-> C++ is that Java ByteBuffers are big-endian by default, while ObjC/C++ users typically expect little-endian data. This is mostly a user-facing issue and should be rare (and the buffer probably has an endian-aware r/w protocol). It might make sense to force native order, where necessary, as JNA does.
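To make the byte-order point concrete, here is a minimal Java sketch. The class and method names (NativeOrderBuffers, allocateNativeOrder, firstByteOf) are made up for illustration and are not part of djinni or JNA; only the java.nio calls are real API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of djinni: shows the default byte order of a
// ByteBuffer and how to switch it to the platform's native order.
final class NativeOrderBuffers {
    private NativeOrderBuffers() {}

    /** Allocates a direct buffer whose multi-byte accessors use the platform byte order. */
    static ByteBuffer allocateNativeOrder(int capacityBytes) {
        // A newly created ByteBuffer is always BIG_ENDIAN, whatever the hardware is,
        // so the order has to be set explicitly before sharing it with native code.
        return ByteBuffer.allocateDirect(capacityBytes).order(ByteOrder.nativeOrder());
    }

    /** Returns the first byte stored by putInt(0x01020304) under the given order. */
    static byte firstByteOf(ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(order);
        buf.putInt(0x01020304);
        return buf.get(0); // absolute read, independent of the buffer position
    }
}
```

Note that the order only affects multi-byte accessors such as putInt() and getLong(); plain byte-wise reads and writes are unaffected.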

I also just noticed that djinni interface functions can accept interfaces as parameter types. (This feature is demonstrated in the example code but not in the root README example.) While it would indeed be nice to have a djinni ByteBuffer datatype, perhaps one simply needs a byte buffer interface (that might also include methods to address ownership transfer, if any)?

For example:

my_native_byte_buffer = interface +j +o {
  allocate(size_bytes: i64);  # Allocate space for this many bytes
  begin(): i64;               # Return address of first byte
  size(): i64;                # Return size of the buffer
  disown();                   # Release the buffer, but don't delete it;
                              # assume the user now owns the memory at
                              # begin() of size size()
}

my_buffer_writer = interface +j {
  create(): my_native_byte_buffer;
}

my_buffer_reader = interface +c {
  read(buffer: my_native_byte_buffer);
}

I note that unfortunately djinni won't compile the IDL if my_native_byte_buffer is marked as having a +c implementation (an assertion error triggers; not sure why).

In ObjC, getting a pointer address for begin() shouldn't be too hard. In Java, one probably needs to call into JNI (derp!). For direct byte buffers, there's GetDirectBufferAddress, and for byte arrays, there's GetByteArrayElements, but that call might do a copy. FWIW JNA has a simple facility to get direct buffer addresses but not one for non-direct byte buffers (e.g. byte[]). My guess is JNA doesn't handle non-direct byte buffers because 1) the user has to release the jbyte* so that the GC is free to e.g. move the byte[] upon a compaction 2) the GetByteArrayElements() might trigger a copy anyways, so the pointer address doesn't have much value.

While there are problems with this approach, it might be best for most users since it forces them to define how ownership works and to define what creates the pointer (e.g. mmapped file buffer? network buffer? non-direct buffers probably can't be shared even tho they can be ByteBuffer-wrapped). Furthermore, use of large Java direct byte buffers might require special JVM args (e.g. -XX:MaxDirectMemorySize) and special tuning so that the JVM leaves space for native heap.

pwais commented 9 years ago

Thinking on this point a bit more, there are a handful of tricky issues here:

- Java -> C++: GetPrimitiveArrayCritical() requires that no other JNI calls be made until the matching ReleasePrimitiveArrayCritical(), so the user must code intelligently to use this special feature. Otherwise buffers get deep-copied (so no better than the current behavior of binary).
- C++ -> Java: Assuming the ByteBuffer is direct, a separate call into C++ must happen to free the C++-allocated buffer. Either the user must put this call in their djinni interface (very undesirable) or Java must get a ByteBuffer subclass that calls a C++ deleter on finalize() (do-able).
- C++ -> ObjC: NSMutableData responds to initWithBytesNoCopy but will just deep-copy the buffer and delete it immediately. If NSData owns a buffer, it must be allocated using malloc.

djinni mainly offers two features:

- records are pass-by-value, always deep-copied, stateless, and marshaled between languages
- interfaces are pass-by-reference, stateful, and are never marshaled between languages

What we really need is a union of these two features:

- custom translation/marshaling of ownership semantics
- stateful (if we assume the underlying buffer gets ref-counted)

Perhaps a solution here is a feature for (expert) user-defined primitives:

- User can declare a symbol for their primitive for use in IDL files (and thus code generation).
- User specifies a language-native class for each language that supports the primitive. The native class can be user-provided, i.e. does not need to be part of a standard library.
- User must implement {to,from}{languages} marshaling code (similar to the existing {to,from}Java JNI code in support-lib).
- User benefits from all of djinni's existing support code (e.g. exception handling stuff).

Using this feature, we could do as much as provide our own ByteBuffer and as little as give C++ a (potentially) zero-copy view of a nio.ByteBuffer.

Anybody put much thought into user-defined primitives before?

mknejp commented 9 years ago

Anybody put much thought into user-defined primitives before?

It's something I'm working on to enable #52 but it needs some not-so-subtle changes to Djinni to be properly supported. It may also be helpful for #45 at some point.

Regarding the type discussed here: Maybe you are trying to solve too many problems at once. What I envision (for starters) is a way to exchange a persistent area of memory to which both sides have read/write access. Someone has to create it and someone needs to be responsible for destroying it. I think the lifetime of such a heavyweight object should be managed explicitly and depend on the use case.

pwais commented 9 years ago

What I envision (for starters) is a way to exchange a persistent area of memory to which both sides have read/write access.

For Java <-> C++, one would need to use a DirectByteBuffer (since the JVM can copy heap buffers as it sees fit, and once it does a copy you might as well just use djinni's existing solution). The use case I have in mind is to give C++ direct r/w access to heap buffers (or direct buffers), and it appears some sort of (admittedly non-trivial) adapter class is necessary. I agree this latter use case is more complicated, but a lot of libraries have solved these JNI-related problems and it would be nice to distill those solutions into a djinni feature.

+1 to #52 as a solution to this issue!

pwais commented 9 years ago

Now that https://github.com/dropbox/djinni/pull/95 hit master, is this issue closed? Thanks @mknejp !!!

mknejp commented 9 years ago

I suppose, unless such a type should be provided as part of Djinni's "standard library"

j4cbo commented 9 years ago

I do think it should be provided by Djinni (either fully built-in or by way of #95's mechanism, not sure yet), so let's leave this open for now.

pwais commented 9 years ago

My vote would be to have #95 fulfill the ByteBuffer issue; at least my intention is to use that mechanism. java.nio.ByteBuffer isn't necessarily the best solution, since ByteBuffers still have a garbage-collected component. A user could realistically want to leverage sun.misc.Unsafe instead to minimize GC pauses.

pwais commented 9 years ago

@mknejp did you ever poke much farther on this? I'm curious if you ended up implementing anything for Java <=> C++.

I dug into this a bit further with the presumption that a buffer is a (pointer, size, deleter) tuple. The deleter addition specifically handles ownership-related issues for arena-allocated memory, mmap-ed files, and other buffers that need special cleanup. I believe this model covers all possible use cases.

Based upon the discussion below, I think a single zero-copy buffer record type would not fulfill all needs; I think most use cases actually call for an interface. Nevertheless, there appear to be some proper practices that could work their way into Djinni support code (if not as a part of an IDL primitive).

GC-Unmanaged Access

Note that if the user simply wants to share r/w access to a buffer and does not intend to move or share ownership, then a buffer can simply be a (pointer, size) tuple. Unmanaged pointers are almost completely portable; if the JVM is 32-bit and the host is 64-bit, one needs to worry about sign extension. NB: java.nio.ByteBuffer capacities are int-sized, but sun.misc.Unsafe allows allocating blocks of memory larger than Integer.MAX_VALUE; thus each of (pointer, size) should be 64-bit. Therefore, the simplest way to achieve a buffer type would be to define a record with members i64 address and i64 size. If the user means to invoke an interface upon the buffer frequently (e.g. in a loop), it would be more performant (but slightly uglier) to omit the record and pass (i64 address, i64 size) as parameters.
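As a rough illustration of the non-owning (pointer, size) pair, the following Java sketch mirrors what such a two-field record might look like on the Java side. BufferRef and all of its methods are hypothetical names chosen for this example; this is not code djinni generates.

```java
// Hypothetical Java-side mirror of a record with i64 address and i64 size.
// It carries no ownership: whoever created the memory must keep it alive.
final class BufferRef {
    final long address; // raw native address, only meaningful to native code
    final long size;    // length of the region in bytes

    BufferRef(long address, long size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        this.address = address;
        this.size = size;
    }

    /** Widens a 32-bit pointer without sign extension (the 32-bit JVM caveat). */
    static long widenPointer32(int pointer32) {
        // A plain (long) cast would turn 0x80000000 into 0xFFFFFFFF80000000.
        return Integer.toUnsignedLong(pointer32);
    }

    /** True if the region is small enough for a single java.nio.ByteBuffer. */
    boolean fitsInByteBuffer() {
        return size <= Integer.MAX_VALUE; // ByteBuffer capacities are int-sized
    }

    /** Returns a non-owning view of a sub-range, with bounds checking. */
    BufferRef slice(long offset, long length) {
        if (offset < 0 || length < 0 || offset > size || length > size - offset) {
            throw new IndexOutOfBoundsException("slice outside of buffer");
        }
        return new BufferRef(address + offset, length);
    }
}
```

Keeping both fields as long is what allows regions larger than Integer.MAX_VALUE bytes to be described, even though such a region could not be wrapped in one ByteBuffer.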

GC-Managed Access

For exposing managed memory (e.g. byte[]s) to native code, JNI's GetPrimitiveArrayCritical() might work but can block GC (as it does in Hotspot). It seems that this API is really meant for immediately copying data to/from a device, e.g. as Android does in these results. Due to these restrictions, a (potentially) zero-copy managed buffer might warrant a completely unique object (record or interface) in user code.

Moving and Sharing ownership to GC-Unmanaged Memory

In the case that the user does want to move and/or share ownership, a deleter is necessary and significantly complicates the problem. Without loss of generality, we can assume a deleter is either a Java Runnable or a C++ callable (std::function<void()>) that holds the buffer address (and/or other data/references) as state.
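A minimal sketch of the (pointer, size, deleter) tuple on the Java side, assuming the deleter is a Runnable as described above. OwnedBuffer is a hypothetical name, and the address is treated as an opaque long that only native code would dereference.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical owning handle: (address, size, deleter). Not part of djinni.
final class OwnedBuffer implements AutoCloseable {
    private final long address;
    private final long size;
    private final Runnable deleter; // frees the memory, e.g. by calling into native code
    private final AtomicBoolean released = new AtomicBoolean(false);

    OwnedBuffer(long address, long size, Runnable deleter) {
        this.address = address;
        this.size = size;
        this.deleter = Objects.requireNonNull(deleter, "deleter");
    }

    /** Returns the address, or throws if the buffer was already released. */
    long address() {
        if (released.get()) {
            throw new IllegalStateException("buffer already released");
        }
        return address;
    }

    long size() {
        return size;
    }

    /** Runs the deleter exactly once, even if close() is called repeatedly. */
    @Override
    public void close() {
        if (released.compareAndSet(false, true)) {
            deleter.run();
        }
    }
}
```

Because the handle is AutoCloseable it can be used in a try-with-resources block, which makes the release point explicit instead of relying on finalize().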

DirectByteBuffers offer some direction as to how to have the JVM handle foreign deleters properly. Some JVMs leverage a somewhat peculiar Cleaner utility to wrap a native free()-calling thunk; the GC interops with cleaners directly. However, the Android JVM is a bit different and uses MemoryBlock#finalize() to free native memory (i.e. there is no thunk). It's important to note that DirectByteBuffer does not have a field for a deleter on all JVMs.

Thus for shared buffers, Djinni would need to internally maintain a list of deleters (or perhaps just the buffers themselves) and free memory only once buffers become unreferenced from both sides. (I believe there's similar existing code to deal with interface instances). For buffers that only move one way, the issue is still complicated (see below). It looks like a JNI call upon buffer termination is a necessity; moving buffers in a loop would have high overhead.
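The "free only once unreferenced from both sides" idea can be sketched as a small reference count around the deleter. This is an assumption about how such bookkeeping could look, not a description of Djinni's existing proxy code; SharedBufferHandle and its methods are hypothetical.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ref-counted handle: the deleter runs when the last reference
// (Java side or native side) is released. Not part of djinni.
final class SharedBufferHandle {
    private final AtomicInteger refs = new AtomicInteger(1); // the creator holds one reference
    private final Runnable deleter;

    SharedBufferHandle(Runnable deleter) {
        this.deleter = Objects.requireNonNull(deleter, "deleter");
    }

    /** Adds a reference, e.g. when the buffer crosses the language boundary. */
    void retain() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                throw new IllegalStateException("buffer already freed");
            }
            if (refs.compareAndSet(n, n + 1)) {
                return;
            }
        }
    }

    /** Drops a reference; the deleter runs when the count reaches zero. */
    void release() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                throw new IllegalStateException("released more often than retained");
            }
            if (refs.compareAndSet(n, n - 1)) {
                if (n == 1) {
                    deleter.run(); // last owner is gone, free the memory
                }
                return;
            }
        }
    }

    int refCount() {
        return refs.get();
    }
}
```

In a real binding the native side's release() would arrive through a JNI call, which is the per-buffer overhead mentioned above.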

Java => C++

For DirectByteBuffers, Djinni needs to maintain a reference to the DirectByteBuffer instance. Once the C++ proxy dies, Djinni can drop the DirectByteBuffer reference and the GC will reclaim the memory normally.

For user-managed off-heap memory (e.g. memory allocated via sun.misc.Unsafe), Djinni needs to invoke a user Runnable once the C++ proxy dies. The user Runnable might invoke Unsafe.freeMemory() or might have other behavior (e.g. if the buffer resides within a larger arena). NB: Unsafe.freeMemory() invokes a JVM-dependent os::free() method, so user C++ code cannot safely just call free() on Unsafe-allocated memory.

C++ => Java

Djinni would need to call a user std::function<void()> cleanup method once the proxy object finalize() is called.

pwais commented 9 years ago

FWIW, am hacking on the C++ <=> Java part of this in https://github.com/pwais/djinni/tree/pwais_perf2 (NOT in preview state yet) with the goal of supporting at least {byte[], (direct) ByteBuffer, Unsafe} <=> C++ using user-defined types. I think it would make sense for it to live in /extension-libs if included in djinni at all. The goal is not to add a core djinni buffer type (as I don't see a great solution in that approach) but to relieve the user of having to worry about JNI through a few custom array types.

jcampbell05 commented 8 years ago

Any progress?

pwais commented 8 years ago

Really want to finish the C++ <-> Java work I had started earlier (linked above), but have been inundated with other things. In that change, JHeapArrayHandle-inl.hpp has all the bits for byte[] <-> void *, ByteArrayHandle-inl.hpp for ByteBuffer <-> void *, and JUnsafeArrayHandle-inl.hpp for sun.misc.Unsafe <-> void *. Some of the code is mid-refactor, so please pardon all the dust if you go digging, but I believe I had all the JNI calls correct.

If you just want direct ByteBuffer <-> (void *, size_t), here's what I'd recommend:

- ByteBuffer -> (void *, size_t): Just use the GetDirectBufferAddress() JNI API.
- (void *, size_t) -> ByteBuffer: Note that ByteBuffer.allocateDirect() is (AFAICT) the only portable way to create a Java-owned direct byte buffer; NewDirectByteBuffer() creates a native-owned facade. I recommend using JNI to invoke ByteBuffer.allocateDirect() (see JDirectArray::allocateDirectBB() in the branch). Have the native code write into the ByteBuffer if at all possible. Otherwise, you'll need to pass a deleter from native to Java somehow.
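For the Java half of that recommendation, a small sketch, with hypothetical helper names (DirectBuffers, allocateJavaOwned, requireDirect). It only shows the Java-side allocation and a precondition check; the JNI code that would call GetDirectBufferAddress() is not shown.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers, not part of djinni or the linked branch.
final class DirectBuffers {
    private DirectBuffers() {}

    /** Allocates a Java-owned direct buffer; the JVM reclaims its memory after GC. */
    static ByteBuffer allocateJavaOwned(int capacityBytes) {
        return ByteBuffer.allocateDirect(capacityBytes);
    }

    /**
     * Rejects heap buffers before they reach native code. GetDirectBufferAddress()
     * returns NULL for a non-direct buffer, so failing early on the Java side
     * gives a clearer error than a null pointer in C++.
     */
    static ByteBuffer requireDirect(ByteBuffer buf) {
        if (!buf.isDirect()) {
            throw new IllegalArgumentException("expected a direct ByteBuffer");
        }
        return buf;
    }
}
```

A buffer obtained from ByteBuffer.wrap(byte[]) is a heap buffer and would be rejected by requireDirect(), which matches the earlier observation that non-direct buffers probably can't be shared without a copy.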

(FWIW, I saw that Android and OpenJDK implement direct ByteBuffers a bit differently, hence my note about portability. Since Google appears to be adopting OpenJDK for Android, this may be less of an issue going forward).

dropbox / djinni

A buffer data type #54
