Open mknejp opened 9 years ago
For C++ -> Java:
If the Java method returns an nio.ByteBuffer via NewDirectByteBuffer(), the JVM won't free the underlying C++-allocated buffer when the nio.ByteBuffer gets GC'ed. Djinni would probably need to provide a subclass of ByteBuffer or some sort of custom wrapper that calls free() or a C++ deleter on finalize().
For Java -> C++:
Not sure if array_view will get ratified... I would recommend a small custom structure similar to a capnproto / kj array: https://github.com/sandstorm-io/capnproto/blob/master/c%2B%2B/src/kj/array.h#L128 . Something like kj::Array would be small, concise, and largely compatible with existing libc++ utilities.
For C++ -> Objective-C:
With respect to the issue of mutability, this looks like a case for a special djinni ObjC class (as in the C++ -> Java case where we need a ByteBuffer that will dispose on GC). If the special ObjC class is simply a pointer, size, and function pointer to a void disposer(), then it should be largely interoperable with Core Foundation.
The underlying problem here is that ownership of the byte buffer must be transferred across the language border; a shared byte buffer is inherently a (pointer, size, disposer) tuple and not just (pointer, size). It might make more sense for djinni to include its own simple ByteBuffer for each language to achieve the necessary ownership transfer. While this additional data structure would increase complexity of djinni's UI, the existing binary type is a solid solution for the majority of use cases where performance demands are relatively flexible.
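As a rough illustration of the (pointer, size, disposer) idea on the C++ side, here is a minimal sketch. The names OwnedBuffer, free_disposer, and make_malloc_buffer are hypothetical and not part of djinni; the disposer is a plain function pointer, on the assumption that this keeps the layout easy to mirror in C-compatible APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <utility>

// Hypothetical owning buffer: a (pointer, size, disposer) tuple.
// Not a djinni type; a sketch of the ownership model only.
struct OwnedBuffer {
    std::uint8_t* data = nullptr;
    std::size_t size = 0;
    void (*disposer)(void*) = nullptr;  // called once when ownership ends

    OwnedBuffer() = default;
    OwnedBuffer(std::uint8_t* d, std::size_t n, void (*del)(void*))
        : data(d), size(n), disposer(del) {}

    // Ownership can be moved across a boundary but never copied.
    OwnedBuffer(const OwnedBuffer&) = delete;
    OwnedBuffer& operator=(const OwnedBuffer&) = delete;
    OwnedBuffer(OwnedBuffer&& other) noexcept { swap(other); }
    OwnedBuffer& operator=(OwnedBuffer&& other) noexcept {
        if (this != &other) {
            reset();
            swap(other);
        }
        return *this;
    }
    ~OwnedBuffer() { reset(); }

    // Runs the disposer (if any) and returns to the empty state.
    void reset() {
        if (data && disposer) disposer(data);
        data = nullptr;
        size = 0;
        disposer = nullptr;
    }

    void swap(OwnedBuffer& other) noexcept {
        std::swap(data, other.data);
        std::swap(size, other.size);
        std::swap(disposer, other.disposer);
    }
};

// Disposer for malloc-backed memory.
inline void free_disposer(void* p) { std::free(p); }

// Example factory: a malloc-backed buffer that frees itself on destruction.
inline OwnedBuffer make_malloc_buffer(std::size_t n) {
    return OwnedBuffer(static_cast<std::uint8_t*>(std::malloc(n)), n, &free_disposer);
}
```

A buffer from an arena or an mmap-ed file would use the same struct with a different disposer, which is the point of carrying the third tuple member.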
One last thought for the interim: the user can pass pointers across the language boundary via the i64 type. For Java <-> C++, I know at least JNA has facilities for mapping pointer addresses.
Whether it's array_view or something else doesn't matter if it can be configured just as the current setting for optional, as long as it can be constructed from pointer/size arguments and takes a single template parameter.
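To make the requirement concrete, here is a minimal sketch of a type that would satisfy it: non-owning, a single template parameter, constructible from pointer/size. ArrayView and sum_bytes are hypothetical names used only for illustration; any view type with the same shape would do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for array_view: non-owning, one template parameter,
// constructible from (pointer, size). It never allocates or frees.
template <typename T>
class ArrayView {
public:
    ArrayView(T* data, std::size_t size) : data_(data), size_(size) {}

    T* data() const { return data_; }
    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) const { return data_[i]; }
    T* begin() const { return data_; }
    T* end() const { return data_ + size_; }

private:
    T* data_;
    std::size_t size_;
};

// Reads through the view without copying the underlying bytes.
inline std::uint32_t sum_bytes(ArrayView<const std::uint8_t> view) {
    std::uint32_t total = 0;
    for (std::uint8_t b : view) total += b;
    return total;
}
```

In a generated thunk, the pointer and size would come from whatever the other language exposes; the view itself stays agnostic about who owns the memory.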
I am not trying to make ownership or lifetime implicit. The user has to decide who owns the buffer. The user has to know that changing the size/capacity of the buffer by anyone may make the memory region invalid for all involved parties and has to be respecified. This is an attempt at providing either [...] std::vector in the thunks. [...] java.nio.ByteBuffer or NSData if the corresponding std::vector no longer exists or was reallocated.
I am not opposed to a solution with implicit lifetime management if it can be done properly and safely.
I guess a custom ObjC type derived from NSData is acceptable since NSMutableData always copies the content in its initializers.
Another issue not yet discussed for Java <-> C++ is that Java ByteBuffers are by default Big Endian and ObjC/C++ users typically expect Little Endian data. This is mostly a user-facing issue and should be rare (and the buffer probably has an endian-aware r/w protocol). It might make sense to force native order, where necessary, as JNA does.
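For illustration, here is a small sketch of what an endian-aware read looks like on the C++ side when the Java side wrote multi-byte values with the ByteBuffer default (big-endian). The helper names read_u32_be and read_u32_le are hypothetical. The alternative is to call order(ByteOrder.nativeOrder()) on the Java ByteBuffer so both sides agree on the host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads a 32-bit unsigned value stored most-significant byte first
// (the default order for a Java ByteBuffer), regardless of host endianness.
inline std::uint32_t read_u32_be(const std::uint8_t* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Reads a 32-bit unsigned value stored least-significant byte first.
inline std::uint32_t read_u32_le(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Assembling the value byte by byte avoids any dependence on the host's byte order and on alignment of the shared buffer.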
I also just noticed that djinni interface functions can accept interfaces as parameter types. (This feature is demonstrated in the example code but not in the root README example.) While it would indeed be nice to have a djinni ByteBuffer datatype, perhaps one simply needs a byte buffer interface (that might also include methods to address ownership transfer, if any)?
For example:
my_native_byte_buffer = interface +j +o {
    allocate(size_bytes: i64);  # Allocate space for this many bytes
    begin(): i64;               # Return address of first byte
    size(): i64;                # Return size of the buffer
    disown();                   # Release the buffer, but don't delete it;
                                # assume the user now owns the memory at
                                # begin() of size size()
}
my_buffer_writer = interface +j {
    create(): my_native_byte_buffer;
}
my_buffer_reader = interface +c {
    read(buffer: my_native_byte_buffer);
}
I note that unfortunately djinni won't compile the IDL if my_native_byte_buffer is marked as having a +c implementation (an assertion error triggers; not sure why).
In ObjC, getting a pointer address for begin() shouldn't be too hard. In Java, one probably needs to call into JNI (derp!). For direct byte buffers, there's GetDirectBufferAddress, and for byte arrays, there's GetByteArrayElements, but that call might do a copy. FWIW JNA has a simple facility to get direct buffer addresses but not one for non-direct byte buffers (e.g. byte[]). My guess is JNA doesn't handle non-direct byte buffers because 1) the user has to release the jbyte* so that the GC is free to e.g. move the byte[] upon a compaction, and 2) GetByteArrayElements() might trigger a copy anyway, so the pointer address doesn't have much value.
While there are problems with this approach, it might be best for most users since it forces them to define how ownership works and to define what creates the pointer (e.g. mmapped file buffer? network buffer? non-direct buffers probably can't be shared even tho they can be ByteBuffer-wrapped). Furthermore, use of large Java direct byte buffers might require special JVM args (e.g. -XX:MaxDirectMemorySize) and special tuning so that the JVM leaves space for native heap.
Thinking on this point a bit more, there are a handful of tricky issues here:
- GetPrimitiveArrayCritical() requires no JNI calls until a ReleasePrimitiveArrayCritical(), so the user must code intelligently to use this special feature. Otherwise buffers get deep-copied (so no better than current behavior of binary).
- If the ByteBuffer is direct, a separate call into C++ must happen to free the C++-allocated buffer. Either the user must put this call in their djinni interface (very undesirable) or Java must get a ByteBuffer subclass that calls a C++ deleter on finalize() (do-able).
- NSMutableData responds to initWithBytesNoCopy but will just deep-copy the buffer and delete it immediately. If NSData owns a buffer, it must be allocated using malloc.
djinni mainly offers two features:
- records are pass-by-value, always deep-copied, stateless, and marshaled between languages
- interfaces are pass-by-reference, stateful, and are never marshaled between languages
What we really need is a union of these two features:
Perhaps a solution here is a feature for (expert) user-defined primitives:
support-lib). Using this feature, we could do as much as provide our own ByteBuffer and as little as give C++ a (potentially) zero-copy view of a nio.ByteBuffer.
Anybody put much thought into user-defined primitives before?
> Anybody put much thought into user-defined primitives before?
It's something I'm working on to enable #52 but it needs some not-so-subtle changes to Djinni to be properly supported. It may also be helpful for #45 at some point.
Regarding the type discussed here: Maybe you are trying to solve too many problems at once. What I envision (for starters) is a way to exchange a persistent area of memory to which both sides have read/write access. Someone has to create it and someone needs to be responsible for destroying it. I think the lifetime of such a heavyweight object should be managed explicitly and depend on the use case.
> What I envision (for starters) is a way to exchange a persistent area of memory to which both sides have read/write access.
For Java <-> C++, one would need to use a DirectByteBuffer (since the JVM can copy heap buffers as it sees fit, and once it does a copy you might as well just use djinni's existing solution). The use case I have in mind is to give C++ direct r/w access to heap buffers (or direct buffers), and it appears some sort of (admittedly non-trivial) adapter class is necessary. I agree this latter use case is more complicated, but a lot of libraries have solved these JNI-related problems and it would be nice to distill those solutions into a djinni feature.
+1 to #52 as a solution to this issue!
Now that https://github.com/dropbox/djinni/pull/95 hit master, is this issue closed? Thanks @mknejp !!!
I suppose, unless such a type should be provided as part of Djinni's "standard library"
I do think it should be provided by Djinni (either fully built-in or by way of #95's mechanism, not sure yet), so let's leave this open for now.
My vote would be to have #95 fulfill the ByteBuffer issue; at least my intention is to use that mechanism. java.nio.ByteBuffer isn't necessarily the best solution: ByteBuffers still have a garbage collected component. A user could realistically want to leverage sun.misc.Unsafe instead to minimize GC pauses.
@mknejp did you ever poke much farther on this? I'm curious if you ended up implementing anything for Java <=> C++.
I dug into this a bit further with the presumption that a buffer is a (pointer, size, deleter) tuple. The deleter addition specifically handles ownership-related issues for arena-allocated memory, mmap-ed files, and other buffers that need special cleanup. I believe this model covers all possible use cases.
Based upon the discussion below, I think a single zero-copy buffer record type would not fulfill all needs; I think most use cases actually call for an interface. Nevertheless, there appear to be some proper practices that could work their way into Djinni support code (if not as a part of an IDL primitive).
Note that if the user simply wants to share r/w access to a buffer and does not intend to move or share ownership, then a buffer can simply be a (pointer, size) tuple. Unmanaged pointers are almost completely portable; if the JVM is 32-bit and the host is 64-bit, one needs to worry about sign extension. NB: java.nio.ByteBuffer capacities are int-sized, but sun.misc.Unsafe allows allocating blocks of memory larger than Integer.MAX_VALUE; thus each of (pointer, size) should be 64-bit. Therefore, the simplest way to achieve a buffer type would be to define a record with members i64 address and i64 size. If the user means to invoke an interface upon the buffer frequently (e.g. in a loop), it would be more performant (but slightly uglier) to omit the record and pass (i64 address, i64 size) as parameters.
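A minimal sketch of how the C++ side might pack and unpack such an (address, size) pair follows. BufferRef, to_ref, and to_pointer are hypothetical names; BufferRef only mirrors the shape of a record with two i64 members and is not generated djinni code. Going through std::uintptr_t zero-extends a 32-bit address instead of sign-extending it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of a record with members `i64 address` and `i64 size`.
// It carries no ownership: whoever created the memory still has to free it.
struct BufferRef {
    std::int64_t address;
    std::int64_t size;
};

// Packs a raw pointer and length into 64-bit integers.
// The cast through uintptr_t zero-extends on 32-bit hosts.
inline BufferRef to_ref(void* p, std::size_t n) {
    BufferRef r;
    r.address = static_cast<std::int64_t>(reinterpret_cast<std::uintptr_t>(p));
    r.size = static_cast<std::int64_t>(n);
    return r;
}

// Recovers the byte pointer from the 64-bit address.
inline std::uint8_t* to_pointer(const BufferRef& r) {
    return reinterpret_cast<std::uint8_t*>(static_cast<std::uintptr_t>(r.address));
}
```

Passing the two integers as plain parameters instead of a record uses the same conversions and only skips the struct.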
For exposing managed memory (e.g. byte[]s) to native code, JNI's GetPrimitiveArrayCritical() might work but can block GC (as it does in Hotspot). It seems that this API is really meant for immediately copying data to/from a device, e.g. as Android does in these results. Due to these restrictions, a (potentially) zero-copy managed buffer might warrant a completely unique object (record or interface) in user code.
In the case that the user does want to move and/or share ownership, a deleter is necessary and significantly complicates the problem. Without loss of generality, we can assume a deleter is either a Java Runnable or a C++ callable (std::function<void()>) that holds the buffer address (and/or other data/references) as state.
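On the C++ side, a stateful deleter with shared ownership can be sketched as below. SharedBuffer and make_shared_buffer are hypothetical names, and using std::shared_ptr as the reference count is an assumption made for illustration, not something djinni does for buffers today. The deleter runs exactly once, when the last copy of the handle goes away.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>

// Hypothetical shared buffer: (pointer, size) plus a keepalive handle.
// Copying the struct shares ownership; the deleter fires when the last
// copy of `keepalive` is destroyed or reset.
struct SharedBuffer {
    std::uint8_t* data;
    std::size_t size;
    std::shared_ptr<void> keepalive;
};

// Wraps an arbitrary cleanup callable. The callable carries its own state
// (buffer address, arena handle, file descriptor, ...), so it takes no arguments.
inline SharedBuffer make_shared_buffer(std::uint8_t* data, std::size_t size,
                                       std::function<void()> deleter) {
    SharedBuffer buf;
    buf.data = data;
    buf.size = size;
    // The shared_ptr's pointer argument is ignored by the wrapper lambda;
    // only the captured callable decides how to clean up.
    buf.keepalive = std::shared_ptr<void>(
        static_cast<void*>(data), [deleter](void*) { deleter(); });
    return buf;
}
```

A Java-side proxy would hold one copy and native code another; whichever side lets go last triggers the cleanup.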
DirectByteBuffers offer some direction as to how to have the JVM handle foreign deleters properly. Some JVMs leverage a somewhat peculiar Cleaner utility to wrap a native free()-calling thunk; the GC interops with cleaners directly. However, the Android JVM is a bit different and uses MemoryBlock#finalize() to free native memory (i.e. there is no thunk). It's important to note that DirectByteBuffer does not have a field for a deleter on all JVMs.
Thus for shared buffers, Djinni would need to internally maintain a list of deleters (or perhaps just the buffers themselves) and free memory only once buffers become unreferenced from both sides. (I believe there's similar existing code to deal with interface instances). For buffers that only move one way, the issue is still complicated (see below). It looks like a JNI call upon buffer termination is a necessity; moving buffers in a loop would have high overhead.
For DirectByteBuffers, Djinni needs to maintain a reference to the DirectByteBuffer instance. Once the C++ proxy dies, Djinni can drop the DirectByteBuffer reference and the GC will reclaim the memory normally.
For user-managed off-heap memory (e.g. memory allocated via sun.misc.Unsafe), Djinni needs to invoke a user Runnable once the C++ proxy dies. The user Runnable might invoke Unsafe.freeMemory() or might have other behavior (e.g. if the buffer resides within a larger arena). NB: Unsafe.freeMemory() invokes a JVM-dependent os::free() method, so user C++ code cannot safely just call free() on Unsafe-allocated memory.
Djinni would need to call a user std::function<void()> cleanup method once the proxy object finalize() is called.
FWIW, am hacking on the C++ <=> Java part of this in https://github.com/pwais/djinni/tree/pwais_perf2 (NOT in preview state yet) with the goal of supporting at least {byte[], (direct) ByteBuffer, Unsafe} <=> C++ using user-defined types. I think it would make sense to exist in /extension-libs if included in djinni at all. The goal is not to add a core djinni buffer type (as I don't see a great solution in that approach) but to relieve the user of having to worry about JNI through a few custom array types.
Any progress?
Really want to finish the C++ <-> Java work I had started earlier (linked above), but have been inundated with other things. In that change, JHeapArrayHandle-inl.hpp has all the bits for byte[] <-> void *, ByteArrayHandle-inl.hpp for ByteBuffer <-> void *, and JUnsafeArrayHandle-inl.hpp for sun.misc.Unsafe <-> void *. Some of the code is mid-refactor, so please pardon all the dust if you go digging, but I believe I had all the JNI calls correct.
If you just want direct ByteBuffer <-> (void *, size_t), here's what I'd recommend:
- ByteBuffer -> (void *, size_t): Just use the GetDirectBufferAddress() JNI API
- (void *, size_t) -> ByteBuffer: Note that ByteBuffer.allocateDirect() is (AFAICT) the only portable way to create a Java-owned direct byte buffer; NewDirectByteBuffer() creates a native-owned facade. I recommend using JNI to invoke ByteBuffer.allocateDirect() (see JDirectArray::allocateDirectBB() in the branch). Have the native code write into the ByteBuffer if at all possible. Otherwise, you'll need to pass a deleter from native to Java somehow. (FWIW, I saw that Android and OpenJDK implement direct ByteBuffers a bit differently, hence my note about portability. Since Google appears to be adopting OpenJDK for Android, this may be less of an issue going forward).
The current way the "data" type works is only marginally useful for data streaming between languages due to all the copying involved.
I was thinking about introducing a "buffer" data type that represents memory shared between both sides of the fence without allocating and copying stuff around in every call. Both Objective-C and Java have facilities to access "unmanaged" regions of memory.
- java.nio.ByteBuffer which can be created in JNI with NewDirectByteBuffer() and does not copy the content.
- NSData dataWithBytesNoCopy:length:freeWhenDone: same as above (**)
- java.nio.ByteBuffer and use GetDirectBufferAddress() and GetDirectBufferCapacity() to transform into something like std::experimental::array_view<uint8_t>.
- NSMutableData and use .length and .mutableBytes to construct the array view.
The whole point of this exercise is to avoid copying the buffer content and is intended for long-lived buffers that are shared and written to/read from on both sides to exchange bulk data. There of course must be some sort of agreement in the interface protocol about who creates the data and make sure it is not modified in a way that invalidates the memory region.
On a related note, maybe the "data" datatype should also switch to something like std::experimental::array_view<const uint8_t> to avoid the copy at least in one direction where possible.
(**) The only drawback here is that NSData is read-only. If mutable access is necessary the buffer has to be created on the Objective-C side with NSMutableData.