@leventov
WRT Memory allocate() making no sense: I know, it is a temporary placeholder for a region, which I will address next. There are several other silly constructs in the skeleton that need to be ironed out.
WRT DataInput / DataOutput. I didn't follow that discussion very closely. So perhaps you could restate the objective and the benefit of overloading that on top of what we are trying to do now. Nonetheless, I am very skeptical that Memory could be made stateless, without compromising other objectives like achieving performance of Unsafe with a safer and friendlier interface, and effectively replacing ByteBuffer with a much richer API.
@leerho
WRT DataInput / DataOutput. I didn't follow that discussion very closely. So perhaps you could restate the objective and the benefit of overloading that on top of what we are trying to do now. Nonetheless, I am very skeptical that Memory could be made stateless, without compromising other objectives like achieving performance of Unsafe with a safer and friendlier interface, and effectively replacing ByteBuffer with a much richer API.
I meant to make Memory stateless (hence remove BaseMemory), but not the package-private implementation classes; essentially, push the fields from BaseMemory down to the private classes. I don't think it should affect performance at all.
The idea is that a third party could provide a Memory implementation based, e.g., on a distributed FS.
@leventov
BaseMemory is a "thought abstraction" at the moment and may migrate down the hierarchy anyway. But we can do that optimization later. Right now, I want to get the basic skeleton machinery of R/W Regions, ByteBuffers and MMF working and start some characterization tests to see where we stand on performance. If it is as simple as you say, then I'm not going to worry about it right now.
However, something that we should consider at this early stage is whether we will need thread concurrency. And instead of wrapping everything with synchronized after we are all done, which is rather inefficient, we should think about using low-level read/write locks, or perhaps optimistic locking.
@leerho
However, something that we should consider at this early stage is whether we will need thread concurrency. And instead of wrapping everything with Synchronized after we are all done, which is rather inefficient, we should think about using low-level read/write locks, or perhaps optimistic locking.
Because of this, and more generally, I think it's time to try to apply this API, in a manner similar to #3915.
@leventov
Hmm. The only way I see to do that is to make M and WM abstract and create the concrete impls with the Unsafe calls a layer below. The BM functionality must be at the same level (or higher) than where the Unsafe calls occur. Not sure how that impacts performance.
@leventov
Could use your thoughts on a conundrum involving Cleaner. Cleaner is directly tied to an object, while finalize() is indirect in that the JVM keeps track of those objects that have a finalize method.
Using Unsafe requires state: a native address (in various forms) and an optional object reference.
1) Create a parent class that refers to a native region in memory using Unsafe with these state values and register it with the JVM Cleaner.
2) Create a child "region" class that needs to refer to that same region of memory as a "view". It has to copy or refer to those same state variables (perhaps modified by an offset) in order to use Unsafe.
3) The parent's cleaner.clean() is called, the parent's references to native memory are all cleaned up, and the memory is returned to the OS.
How do we clean the child? If the user still holds a reference to the child and tries to do any operation with Unsafe against the state variables it knows about, it will result in a seg-fault. Even if the nativeBaseAddress is set to zero!
Option A) If the parent has a Cleaner, then the children have to register their own cleaner as well. But it could create a race condition if the Parent cleaner is called preemptively by the user.
Option B) Similar to A, but we don't allow the user to ever call Cleaner preemptively. Once the GC starts all the user threads are disabled so we couldn't have the race condition.
If B is the way to go I have concern that we may have only a few parent objects, but have many thousands of child views of that parent. Will that overwhelm the Cleaner's linked list?
@leventov
Option C) I should've thought of this. The child has to hold a reference to the Parent, AND we can't allow preemptive cleans. Only when all the children and the parent are dereferenced will the GC clean it all up.
@leerho my suggestion is:
- One Cleaner per allocation. When views, subregions, positioned, etc. are created, they have a reference to the "parent", but they shouldn't delegate their getXxx/putXxx methods to the parent; they copy the address to their own fields, for faster access. They don't create their own Cleaners.
- close() is allowed.
- The Cleaner and the parent both reference an AtomicBoolean "cleaned", which is set to true from inside the Cleaner.
- In checkIndex(), this flag is checked, and a "use after free" exception is thrown if the flag is already set to true.
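Roughly, something like the following sketch (hypothetical names Deallocator/NativeMemory/getLong, with the Unsafe calls commented out; not the actual classes in the repo):

import java.util.concurrent.atomic.AtomicBoolean;

//The Cleaner's deallocation action; shares the "valid" flag with the parent and all views.
final class Deallocator implements Runnable {
  private final long nativeBaseAddress;
  private final AtomicBoolean valid;

  Deallocator(final long nativeBaseAddress, final AtomicBoolean valid) {
    this.nativeBaseAddress = nativeBaseAddress;
    this.valid = valid;
  }

  @Override
  public void run() { //invoked by the Cleaner, or by an explicit close()
    if (valid.compareAndSet(true, false)) {
      //unsafe.freeMemory(nativeBaseAddress); //free the native allocation exactly once
    }
  }
}

class NativeMemory {
  final long cumBaseOffset;   //copied into every view for fast access
  final AtomicBoolean valid;  //shared by the parent, all views, and the Cleaner

  NativeMemory(final long cumBaseOffset, final AtomicBoolean valid) {
    this.cumBaseOffset = cumBaseOffset;
    this.valid = valid;
  }

  //A view copies the adjusted address into its own field but does NOT register its own Cleaner.
  NativeMemory region(final long offsetBytes) {
    return new NativeMemory(cumBaseOffset + offsetBytes, valid);
  }

  long getLong(final long offsetBytes) {
    assert valid.get() : "use after free";
    //return unsafe.getLong(cumBaseOffset + offsetBytes);
    return 0L; //placeholder so the sketch stands alone without Unsafe
  }
}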
@leventov thanks, that is where I was headed. I wasn't so sure about the manual clean, but if all of this is on a single thread it makes sense. We would have to think harder if this activity were across multiple threads.
@leerho @leventov IMO the cleaned flag isn't a great idea; it's better not to have an explicit close. Only NM registers a cleaner, MR does nothing; NM's clean will only get called when there is no one holding a reference. Having a cleaned flag will mean that we have to keep doing assert checks in checkIndex() or equivalent check somewhere else.
@niketh
Having a cleaned flag will mean that we have to keep doing assert checks in checkIndex() or equivalent check somewhere else
Yes. What's the problem with that? From a performance perspective it's free; the JIT will remove that code entirely.
better not to have explicit close.
Nobody obligates you to use explicit close(), but sometimes it is handy and obviously safe:
try (Memory mem = Memory.map(file)) {
  //read something from the file ...
}
Because ByteBuffer doesn't implement AutoCloseable, we had to create a wrapper: MappedByteBufferHandler. Let's not repeat this mistake with Memory.
@leventov
I have implemented a skeleton with two different schemas:
Notes:
- AllocateMap has not been incorporated yet.
- WritableMemory2: with minor changes I can convert it to Version B using WritableMemory.

Version A:
- Memory (read only) is abstract, with no class vars
- WritableMemory is concrete, but NO duplication of code is required

Version A
public abstract class Memory (no class vars)
- public static RO methods: wrapping arrays, ByteBuffers, Maps
- public abstract RO methods: get<primitive>, get<primitive>Array, region
class MemoryImpl extends Memory
- Concrete RO impls of Memory
public class WritableMemory extends MemoryImpl
- public static W methods: wrapping arrays, ByteBuffer, Map, Direct alloc, autoByteArray
- public W methods: put<primitive>, put<primitive>Array, getMemoryRequest(), freeMemory()
class AllocateDirect extends WritableMemory implements AutoCloseable
- Allocates direct memory, uses Cleaner
class AllocateMap extends WritableMemory implements AutoCloseable
- Memory-maps a file, uses Cleaner
Version B:
- Memory is abstract with no class vars
- WritableMemory is abstract with no class vars
- However, the WritableMemoryImpl concrete class requires duplication of all the MemoryImpl code

Version B
public abstract class Memory (no class vars)
- static RO methods: wrapping arrays, ByteBuffers, Maps
- abstract RO methods: get<primitive>, get<primitive>Array, region
class MemoryImpl extends Memory
- Concrete RO impls of Memory
public abstract class WritableMemory extends Memory (no class vars)
- static W methods: wrapping arrays, ByteBuffers, Maps, Direct alloc, autoByteArrays
- abstract W methods: put<primitive>, put<primitive>Array, getMemoryRequest(), freeMemory()
class WritableMemoryImpl extends WritableMemory
- Concrete RO impls of Memory (DUPLICATE CODE)
- Concrete W impls of WritableMemory
class AllocateDirect extends WritableMemoryImpl implements AutoCloseable
- Allocates direct memory, uses Cleaner
class AllocateMap extends WritableMemoryImpl implements AutoCloseable
- Memory-maps a file, uses Cleaner
I understand you wanted no state at the top of the hierarchy, but it is still not clear to me why. Any instantiation will have state.
I prefer Version A, because no code duplication is required and there are fewer classes. If we allowed class vars at the root, we could eliminate another class.
Comments?
@leventov
I have added the AtomicBoolean into the Deallocator for the Cleaner and added the asserts for the current get/put methods for Version A. Also added some tests where the parent is freed and the child region detects, via assert, that the region is no longer valid. Seems to work.
I think I will move to a Version C, similar to what you suggested earlier, with only one writable implementation, and use asserts to detect writable and JDK7 vs 8, etc. Maintaining this R/W separation is a PITA.
@leerho Version A + a boolean flag to check if the parent has been freed looks good. What is approach C?
It is in the parallel package memory3.
@leventov @niketh
See package-info
Version C is the furthest along and I have implemented the mechanisms for
Not yet Done:
I have more than enough so that I can start some characterization tests and see how it performs against the current Memory.
Please review and comment, as the next stages will be a lot of work and I'd like to get feedback now.
@leerho sorry for the late review. I like Version B most. You don't have to duplicate code, because you don't need MemoryImpl: you can return WritableMemoryImpl from the static factory methods which return Memory (read-only) as well. And the read-only flag also won't be needed in this case, because the Java compiler will ensure there are no illegal accesses, if we follow two simple rules: don't explicitly cast any Memory instance to WritableMemory, and don't use Memory as a generic parameter.
I believe it's important to keep read-only and writable interfaces separate, I explained above why: https://github.com/druid-io/druid/issues/3892#issuecomment-279173943
@leerho Memory was asked to be stateless and implementation-less to allow implementations based on completely different principles than Unsafe, e.g. something backed by network-attached storage: https://github.com/druid-io/druid/issues/3892#issuecomment-279304755
However, it is not necessary now, because there is no reaction from @weijietong and probably he doesn't need #3716 anymore. Anyway, it could be fixed relatively easily later, if needed, because it doesn't incur API changes.
@leventov
Sorry, I had completely misinterpreted a previous statement of yours:
regarding x2 less classes, I meant that the separation between read-only and writable implementations is not needed. There are only writable implementations. The only thing that e. g. map() does is delegation to mapWritable(), and return the read-only interface. It is safe on the type-system level with very little precautions. Your approach with runtime separation provides extra safety, if you want to keep this safety - yes, you either still have x2 classes, or e. g. have a boolean flag "writable" and check it guarded by assertion, e. g. in checkOffset() method, along with the offset check.
Now that I read it again carefully and in the context of your latest comments, I understand what happened. I interpreted "implementations" as "classes that have been implemented", assuming the looser interpretation of "implementations". What you meant by "implementations" is "concrete classes" as opposed to "abstract classes". And since in the same paragraph you suggested having a boolean flag, the looser interpretation of "implementations" directly leads to Version C!
I will go back to Version B, but with ONE concrete WritableMemory class and no Memory concrete class :)
In your statement:
if we ... don't explicitly cast any Memory instance to WritableMemory ...
Who are "we"? You and I certainly won't do that. But what is there to prevent a user from doing that? :)
And to make sure I understand what you mean, could you give me a code example of what you mean by:
and don't use Memory as generic parameter.
Thank you for your clarification on the need for the stateless root. I may as well keep the root stateless.
I have completed some initial characterization tests on the performance of this new hierarchy-based Version C. (I don't expect the different versions to vary much, since they should all collapse down to a single object after compilation and JIT.) It is a little better than the current Memory implementation based on an interface, but still not as good as pure static calls, which will still be possible if a WritableMemory object is passed and the user goes to the trouble.
- Current Memory Performance
- Version C Memory Performance
- Memory Performance using Static Methods Wrapping Unsafe
Who are "we"? You and I certainly won't do that. But what is there to prevent a user from doing that? :)
DataSketches contributors and Druid contributors. It could be enforced even by adding a custom "Regexp" rule to the checkstyle config that looks for (WritableMemory) and Memory>.
@leventov
I think you are referring to Java generics rather than the linguistic definition of "generic" (meaning "nonexclusive", "wide", or "universal"), i.e., don't do:
class foo<Memory> ... { //do WritableMemory stuff }
This is essentially another form of bad casting.
@leventov
I'm concerned that the rule "don't explicitly cast any Memory instance to WritableMemory" will be very difficult to enforce. Not all users of DataSketches and/or Druid will necessarily be contributors, or participate in our forums, or use our checkstyle configurations.
So I'd like to keep the boolean readOnly as a backup check, in the hopes that at some point they at least run their code with assertions enabled. There is no cost at run-time.
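For illustration, the backup check inside a put method could look roughly like this (field names are placeholders for whatever the impl ends up using):

public void putLong(final long offsetBytes, final long value) {
  //both asserts vanish when assertions are disabled; the JIT removes the dead checks entirely
  assert !readOnly : "attempted write to a read-only Memory";
  assert (offsetBytes >= 0) && ((offsetBytes + 8) <= capacity) : "bounds violation";
  //unsafe.putLong(unsafeObj, cumBaseOffset + offsetBytes, value);
}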
@leerho
So I'd like to keep the boolean readOnly as a backup check, in the hopes that at some point they at least run their code with assertions enabled. There is no cost at run-time.
Ok.
One thing that hasn't been discussed and designed yet is how we support big-endian writes and reads, for legacy formats.
Options:
- putXxxBigEndian(offset, v)/getXxxBigEndian(offset) or putXxx(offset, v, ByteOrder order) methods in the main interface. (This approach is taken in Agrona's buffers.)
- mapFile(f, ByteOrder order), or a conversion method Memory.asBigEndian().
- ByteBuffer, to be able to read and write legacy formats.

@leventov
Supporting BE/LE is messy. We need to consider dual-mode cases: e.g., reading TCP/IP packets where the protocol fields are BE, and the payload could be LE. Not that I want to write packet handlers, but I could believe that careless use of ByteBuffer (without always specifying endianness) could lead to horrible mixed endianness data!
I also don't want to encourage the propagation of BE data when it is clearly unnecessary and a significant performance hit.
Explicit: putXBigEndian(offset, v) ... would double the already large number of methods, and becomes a slippery slope WRT how we implement the other variants (positional). I'm not really keen on this one.
Per call: putX(offset, v, ByteOrder order) incurs a cost with every put/get whether BE or LE. So I'm definitely opposed to this unless it is a totally separate implementation.
Totally separate implementation: All put/get methods would be of the form putX(offset, v, ByteOrder order). This has the advantage of covering the horrible mixed endianness cases. The specific factory mapFile(f, offset, capacity) would not have to specify endianness, as this is an option on every put/get that the mapping class extends. However, wrapping ByteBuffer, with its internal endianness state, properly (and maintaining sanity) may take a little more thought. E.g., do we keep the endianness state of the BB in our state and make it visible? How easy would it be for a developer to accidentally end up with double byte-swaps?
A separate implementation is a lot of work and, IMHO, provides little incentive for developers to fix the fundamental underlying problem. Keeping these separate implementations in sync with changes to the basic implementation could be a maintenance headache.
No support: I kinda like this one as it would really encourage developers to go back and regenerate their data using LE (if at all possible) or at least somehow mark the data as "deprecated" to be eventually replaced with the LE form. One option would require developers to explicitly call byte swappers for every put/get call. We could provide efficient byte swapper methods (only 3), but that would be the extent of it.
Separate Library: Another possibility is a totally separate package, possibly in the Druid domain, that implements these variants (which, at this point, could be based on true interfaces) and underneath calls or extends the Memory package, which will always be native-endian and non-positional.
Whatever we recommend here needs to be in context with however we decide to support other variants (e.g., positional put/gets, thread-safe?, ...). Is the positional approach also a totally separate implementation? Or should we ask developers to write their own positional logic that wraps the calls they need? (We could provide example code on how to do that.)
With all the potential separate implementation variants (basic, endianness, positional, thread-safe, etc.), we are talking about a huge amount of code, way beyond what I originally envisioned as a small, fast library that simply extends Unsafe. And, BTW, NONE of these other implementations will have the speed of the basic implementation (which is where I start losing interest).
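One note on the "No support" option above: the three swapper methods could simply wrap the JDK's reverseBytes calls, which HotSpot treats as intrinsics, e.g.:

public static short swapBytes(final short v) { return Short.reverseBytes(v); }
public static int   swapBytes(final int v)   { return Integer.reverseBytes(v); }
public static long  swapBytes(final long v)  { return Long.reverseBytes(v); }
//char, float and double can delegate, e.g.
//Float.intBitsToFloat(Integer.reverseBytes(Float.floatToRawIntBits(v)))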
@leventov
BTW, so far, these other variants will not be able to leverage the stateless base classes Memory and WritableMemory that we have in the current design. This calls into question the value of these stateless classes and makes them more like bloat. I am seriously considering removing one or both of them.
@leerho
putXBigEndian(offset, v)
Agree, it's bad. It also looks noisier and makes it easy to make a mistake, forgetting the "BigEndian" suffix somewhere.
putX(offset, v, ByteOrder order)
The same, and also noisy; it's even easier to make (and harder to spot) mistakes by forgetting the third argument.
Totally separate implementation: All put/get methods would be of the form putX(offset, v, ByteOrder order).
Didn't understand how it's different from the previous.
This has the advantage of covering the horrible mixed endianness cases.
There shouldn't be such cases, at least I can speak for Druid. Either we need to support legacy file/network/cache-key formats which are big-endian only, or use native endian for new formats, and for non-persistent computational buffers, which live only within the JVM.
A separate implementations is a lot of work, and, IMHO, provides little incentive for developers to fix the fundamental underlying problem. Keeping these separate implementations in sync with changes to the basic implementation could be a maintenance headache.
Let's keep in mind very clearly that we develop a library not for abstract "developers"; we develop it for ourselves. Big-endian poisoned the Druid codebase for two reasons:
- order(ByteOrder.nativeOrder()) is very verbose.
- asReadOnlyBuffer(), slice(), and duplicate() return a big-endian buffer regardless of the order of the parent. Shame.

By merely making Memory's static factory methods return native-ordered Memory by default, and ensuring order inheritance on transformations, we fix the problem for all new code/new formats. On the other hand, Druid has to support old big-endian formats forever, so it's pointless to make it "painful" to use big-endian in order to "provide incentive" for ourselves to "move" to native order. It's just making our own lives harder for nothing.
My suggestion
We can make a "NonNativeWritableMemoryImpl" class extending WritableMemoryImpl
, and overriding 12 methods (6 getXxx and 6 putXxx) with delegation to super
and swapping bytes. It's a burden, but IMO it's not that bad to maintain those 12 methods + 12 similar methods for Positioned. Also keep in mind that Memory is supposed to be changed very rarely, after initially developed, unlike Druid/Datasketches code.
The advantage or this approach is that the implementation code could be fully shared for legacy formats (big-endian) and new formats (native-ordered), if nothing is changed except the byte order. If the matter of providing Memory
with different byte order as the input for format reading/ output for format writing.
Factories: Memory.map(file) has a counterpart which accepts ByteOrder as an additional parameter: Memory.map(file, ByteOrder order). The factory code internally decides what to return: WritableMemoryImpl or "NonNativeWritableMemoryImpl", depending on the native platform byte order and the given byte order.
@leerho
Thread-safe Memory is not needed.
PositionedMemory could be made pretty fast IMO if, instead of wrapping Memory and keeping a separate "position" field, it's made very similar to WritableMemoryImpl, with the difference that cumulativeOffset is not a final field and is updated on each operation.
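Sketched out, that non-final-offset idea might look like this (names are placeholders; the Unsafe read is commented out):

class PositionalMemoryImpl {
  private long cumulativeOffset;  //NOT final: advances on every positional read/write
  private final long upperBound;  //cumulativeOffset value corresponding to the limit

  PositionalMemoryImpl(final long startOffset, final long upperBound) {
    this.cumulativeOffset = startOffset;
    this.upperBound = upperBound;
  }

  long getLong() {
    assert (cumulativeOffset + 8) <= upperBound : "position past limit";
    //final long v = unsafe.getLong(unsafeObj, cumulativeOffset);
    final long v = 0L; //placeholder for the Unsafe read
    cumulativeOffset += 8;
    return v;
  }
}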
BTW, so far, these other variants will not be able to leverage the stateless base classes Memory and WritableMemory that we have in the current design. This calls into question the value of these stateless classes and makes them more like bloat. I am seriously considering removing one or both of them.
I believe in statelessness and in the separation of the stateless hierarchy (Memory) and Positioned. Let's make the initial API with them separate, and if during or after the migration of the Druid and DataSketches code it is proven that the separation is not needed, they could be merged. This extra refactoring pass, based on the actual migration experience, will be needed anyway.
@leventov
12 methods (6 getXxx and 6 putXxx)
Let's count just the ones impacted by byte order: 24 primitive puts/gets + 6 atomic = 30:
getChar, getCharArray, getShort, getShortArray, getInt, getIntArray, getLong, getLongArray,
getFloat, getFloatArray, getDouble, getDoubleArray, putChar, putCharArray, putShort,
putShortArray, putInt, putIntArray, putLong, putLongArray, putFloat, putFloatArray, putDouble,
putDoubleArray, getAndAddInt, getAndAddLong, compareAndSwapInt, compareAndSwapLong,
getAndSetInt, getAndSetLong
Thread-safe Memory is not needed.
I'm not convinced. Let's look at Druid's SynchronizedUnion:
This came about in a real-time query application where data was continuously being ingested into sketches in Druid and the same sketches were being accessed by multiple query threads. In order to solve the concurrency problem quickly, the entire Union object was wrapped with a brute-force synchronized approach on all the methods, which is unnecessarily slow. What we actually could use in the Druid case is single-writer, multiple-reader read/write locks, or even better, optimistic locking. I have some folks investigating this now. In order to do this with off-heap data structures, the Java locking mechanisms will not work. We will at least require the use of the unsafe atomic methods mentioned above and likely modification of the put/get calls as well. I don't have the full answer on this yet.
@leerho ok, I was wrong. 30 + 30 = 60 similar methods is a big burden, but I still think it's feasible to write them once, with almost no attention required later. Also, many methods may delegate to each other at the WritableMemoryImpl level: float methods delegate to int methods, double to long, char to short, or vice versa. Such methods don't need to be overridden in "NonNativeWritableMemoryImpl".
This came about in a real-time query application where data was continuously being ingested into sketches in Druid and the same sketches were being accessed by multiple query threads. In order to solve the concurrency problem quickly, the entire Union object was wrapped with a brute-force synchronized approach on all the methods, which is unnecessarily slow. What we actually could use in the Druid case is a single-writer, multiple reader read/write locks, or even better, optimistic locking. I have some folks investigating this now. In order to do this with off-heap data structures the Java locking mechanisms will not work. We will at least require the use of the unsafe atomic methods mentioned above and likely modification of the put/get calls as well. I don't have the full answer on this yet.
Either int or long memory "fields" + compareAndSwap operations on them + may need to add getIntVolatile, putIntVolatile, putIntOrdered, getLongVolatile, putLongVolatile, putLongOrdered
could be used for implementing lightweight spin locks. Or some synchronization mechanisms used, external to Memory
objects. "ThreadSafeMemory" is not a solution.
@leventov
I think we are close to agreement. I haven't added the volatile and ordered methods yet; I'll wait until I see what our whole approach for the concurrent sketches will actually require.
@leventov
I have a design issue perhaps you could comment on:
A ByteBuffer has an internal ReadOnly / Writable state (as does a file). Because we now have only one implementation, WritableMemoryImpl, the internal helper class AccessByteBuffer must return a WritableMemoryImpl. Similarly, the AccessDirectMap also returns a WritableMemoryImpl.
Close():
I do not provide close() in Memory; it is only available in WritableMemory. I don't want downstream read-only users to inadvertently close the original BB or Map object.
This means that I can't provide in Memory a public static Memory wrap(ByteBuffer) or a public static Memory map(file ...), because ultimately these objects need to be closed so that the atomic valid flag is set to false and propagates to any child regions.
So the only way to create a RO Memory instance of a BB (or map) is via asReadOnly() from a WritableMemory instance. This also means that I cannot throw an error during construction if the user attempts to create a WritableMemory of a RO BB or Map.
And the only way during runtime to detect a write against a WritableMemory instance of a RO BB is when asserts are enabled. Can we live with this?
Maps are safer in that if the file is RO, the OS will throw an error on an attempted write.
Any better ideas?
@leerho
Memory should also be closeable, because it's needed for a common pattern:
try (Memory mem = Memory.map(file)) {
  //read something from the file
}
I didn't understand it clearly but probably the rest of the problem will go away also.
Downstream read-only users can inadvertently close Memory: I agree it's not great, but currently I think it's a required trade-off (maybe my mind will change when we do the actual Druid code migration and see how it works). As I noted here: https://github.com/druid-io/druid/issues/3892#issuecomment-280742485, Memory.close() should be rarely used. Maybe we can apply a policy that any Memory object is closed only in the same class where it's created; if a class accepts Memory as a parameter (downstream user), it must not close it.
@leventov
The pattern I am worried about is:
WritableMemory wmem = WritableMemory.writableWrap(ByteBuffer writableBB);
Memory mem = wmem.asReadOnly();
// pass mem as an argument to 1000 different downstream clients that need read access ...
// ...if any one of them accidentally calls close() you are hosed!
wmem.close(); //The owner of the original object should be the only one closing it.
I don't understand what your "tradeoff" concern is. Since we only have one way to instantiate a BB, we could make it less verbose: WritableMemory wmem = WritableMemory.wrap(BB). Or perhaps we could make the name more concise: WriteMemory or WMemory. So we would have:
WMemory wmem = WMemory.wrap(ByteBuffer writableBB);
Memory mem = wmem.asReadOnly();
I would recommend that we start with not allowing Memory.close(). Memory.close may be rare in Druid, but I am worried about other environments including Yahoo!
Ok, let's try something like MemoryHandle extends AutoCloseable, with a Memory get() method. Memory.map() and allocateDirect return MemoryHandle; other static factory methods return just Memory. And Memory and WritableMemory are not AutoCloseable.
Then the pattern above is
try (MemoryHandle mh = Memory.map(file)) {
Memory mem = mh.get();
  //read from the file
}
Memory mem = wmem.asReadOnly();
You shouldn't need asReadOnly(); just upcast: Memory mem = wmem;. And actually this line is not needed; just pass wmem to methods that accept Memory.
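Concretely, something like the following, with no new object created at the call site (assuming an allocate factory plus the getLong/putLong methods under discussion):

static long sum(final Memory mem) {                //consumer only needs the read-only interface
  return mem.getLong(0) + mem.getLong(8);
}

WritableMemory wmem = WritableMemory.allocate(16);
wmem.putLong(0, 1L);
wmem.putLong(8, 2L);
long total = sum(wmem);                            //implicit upcast; no wrapper is allocated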
WriteMemory, Wmemory
I like natural English WritableMemory most.
@leventov
Does your MemoryHandle apply to BB? Could you give me a skeleton of what MH extends and includes?
I need the asReadOnly to set the readOnly flag, which an upcast will not do. The RO flag is the only backup to detect an improper cast of (WritableMemory) memory.
Does your MemoryHandle apply to BB?
If you mean "should Memory.wrap(BB)
return MemoryHandle" I think no, because somebody who allocated BB should care about closing it.
MemoryHandle should be similar to MappedByteBufferHandler. Since we still want Memory to be able to catch use-after-free, the Cleaner should still be private to Memory (WritableMemoryImpl). The only difference is that Memory.close() is made package-private, and only MemoryHandleImpl can access it. (MemoryHandle itself is an interface.)
I need the asReadOnly to set the readOnly flag, which an upcast will not do. The RO flag is the only backup to detect a improper cast of (WritableMemory) memory.
You shouldn't be able to obtain a WritableMemory which is not actually writable in the first place :)
If asReadOnly() just makes a check and returns this, it's ok.
@leventov
Something like:
public final class MemoryHandler implements AutoCloseable {
  private final WritableMemoryImpl wmem;

  MemoryHandler(final WritableMemoryImpl wmem) {
    this.wmem = wmem;
  }

  public Memory get() { return wmem.asReadOnly(); } //sets the RO flag

  public WritableMemory getWritable() { return wmem; }

  //calls the package-private close() method of wmem
  @Override
  public void close() { wmem.close(); }
}
AccessByteBuffer, AllocateDirectMap and AllocateDirect all extend WritableMemoryImpl and return MH. So do all the static methods that create heap-array-backed Memory or WritableMemory.
Now the only way to close the retained WritableMemoryImpl is through the MH, and there can be only one owner of an MH. This also means the MH is the only class that the compiler detects as requiring a close() or a try-with-resources block.
The rest of the hierarchy is the same.
@leerho
I forgot that we have Memory and WritableMemory; in this case I would prefer to have two separate interfaces, MemoryHandle and WritableMemoryHandle extends MemoryHandle, each with a single get() method, because getWritable() allows Memory.map(file).getWritable(), which shouldn't be allowed.
AccessByteBuffer, AllocateDirectMap and AllocateDirect all extend WritableMemoryImpl
I don't understand why it is needed to have subclasses with only specialized constructors. It will make it difficult to implement NonNativeWritableMemoryImpl. I suggest implementing AccessByteBuffer's, AllocateDirectMap's and AllocateDirect's logic in static factory methods, then passing all the prepared arguments to the WritableMemoryImpl or NonNativeWritableMemoryImpl constructors.
Also I want to return to asReadOnly(): I don't actually understand what exactly it will check. It should be possible to upcast WritableMemory to Memory, because during query processing some buffers first act as writable, and then as read-only.
@leventov
What is making things tricky is that we only have one implementation, WritableMemoryImpl, which allows problematic casts. I am beginning to feel that splitting into two impls would be a lot cleaner and safer, and then we just don't worry about the 30+ duplicated methods. I have already organized the code in the WritableMemoryImpl class so that all the RO methods are together, so a simple copy & paste should be straightforward.
With two impls, asReadOnly creates a new class, and going from a Memory to a WritableMemory would never be possible.
One impl: The readOnly flag is only checked on the write methods with an assert. So if you cast a Memory object to a WritableMemory the readOnly flag will be true and not change and will throw an assertion error (with -ea).
As I explained before, ByteBuffers and Maps have their own internal RO/W state. And because we have only one writable impl, we can end up with an RO map/BB wrapped by the writable impl. This creates the need for the readOnly flag and the assert check of W methods.
If you do Memory mem = (WritableMemory) writableMem, the readOnly flag remains false (writable). The Memory API prevents writes, but this also allows a downstream client to do the reverse cast, since the underlying impl is the same. I don't see any easy solution to this other than having two impls.
I do like the MemoryHandler idea, as it isolates only those objects that actually require AutoCloseable from those that do not.
@leventov
I don't understand why it is needed to have subclasses with only specialized constructors. It will make difficult to implement NonNativeWritableMemoryImpl.
AllocateDirect and AllocateDirectMap both require a Cleaner as a field. The Cleaners have different implementations. These two need to be subclasses so that a user's call of close() at the root of the hierarchy does the work specified in the appropriate subclass.
Also, the WritableMemoryImpl class is already huge, so I was looking for opportunities to split out well defined subsets of code.
AccessByteBuffer has a lot of specialized logic so it made sense to make it a separate class.
Honestly, I haven't thought much about how to extend all this for BigEndian, or Positional either. I am just trying to get the basic functioning done with a workable public API.
@leventov
Using an intermediate interface we can leverage AllocateDirect, AllocateDirectMap and AccessByteBuffer across multiple implementations.
Since positional will need to leverage both NE and BE, it should be a true interface.
What is making things tricky is that we only have one implementation, WritableMemoryImpl, which allows problematic casts. I am beginning to feel that splitting into two impls would be a lot cleaner, safer, and just not worry about the 30+ duplicated methods. I have already organized the code in the WritableMemoryImpl class so that all the RO methods are together, so a simple copy & paste should be straightforward.
Personally I don't care that much about code duplication, but keep in mind the "NonNative" subclasses; they would then also have to be duplicated.
With two impls, asReadOnly creates a new class, and going from a Memory to WritableMemory would never be possible.
This is actually what I want to avoid, see the very first message in this thread, "Goals" section, "Don't make a lot of Object copies when a Memory object has to be "sliced", "converted to read-only", etc.". See https://github.com/druid-io/druid/pull/3716#issuecomment-274658045. Currently Druid creates millions of unnecessary read-only ByteBuffers, "for safety". But I think mere separation of interfaces, Memory and WritableMemory, provides enough safety.
the Memory API prevents writes, but then this also allows a downstream client to do the reverse cast, since the underlying impl is the same. I don't see any easy solution to this other than having two impls.
We are creating a library for ourselves, not for "dumb users who will do the wrong thing if they could". There is a solution: don't do reverse casts! I'm pretty sure nobody will ever try to do a reverse cast, but if they do, it will be caught during code review, and it could even be enforced using checkstyle rules, as suggested above: https://github.com/druid-io/druid/issues/3892#issuecomment-279173943
AllocateDirect and AllocateDirectMap both require Cleaner as a field. The Cleaners have different implementations. These two need to be a subclass so that a user call of close() at the root of the hierarchy does the work specified in the appropriate subclass.
My intention was that there is a Cleaner field in WritableMemoryImpl, and it's null when cleaning is not needed (e.g. heap allocation). And the close() method contains something like
if (cleaner != null) {
cleaner.clean();
}
Also, the WritableMemoryImpl class is already huge, so I was looking for opportunities to split out well defined subsets of code.
The problem is the requirement for a "NonNative" subclass, which will multiply the amount of code if there is more than one "WritableMemoryImpl".
Using an intermediate interface we can leverage AllocateDirect, AllocateDirectMap and AccessByteBuffer across multiple implementations.
Don't understand this
Since positional will need to leverage both NE and BE, it should be a true interface.
The only reason why Memory and WritableMemory are not true interfaces is the ability to have static factory methods. My idea was to create Positional from existing Memory or WritableMemory objects, using a region(start, limit) method. So Positional itself doesn't have static factory methods and could be a true interface.
@leventov
With two impls, asReadOnly creates a new class, and going from a Memory to WritableMemory would never be possible.
This is actually what I want to avoid, see the very first message in this thread, "Goals" section, "Don't make a lot of Object copies when a Memory object has to be "sliced", "converted to read-only", etc.".
I'm trying to understand exactly what you are trying to avoid. When I say "creates a new class", what I mean is a new wrapper pointing to the same resource. Only a few pointer registers get created. With a single impl, asReadOnly can be done with an up-cast. But slices will always require a new wrapper object. Also, viewing a resource as positional will require creating a different wrapper object pointing to the same resource. (It is also possible to create two views of the same resource using different endianness, just using different wrappers.)
What is the use pattern that you think is creating all these "unnecessary objects"?
Are the vast majority of ByteBuffer objects created to protect the parent's position and limit state from being changed by the child and to protect the children from affecting each other? If this is the case then:
WritableMemory wMem = ...;               //original resource
Memory mem = wMem.asReadOnly();          //created only once
//pass mem to many children

void childProcess(Memory mem, long offset, long length) {
  //Each child can either read directly using offsets, or:
  Buffer buf = mem.asBuffer();           //I am suggesting that "Buffer" = "PositionalMemory"
  buf.setPosition(offset);
  buf.setLimit(offset + length);
  //yes, this creates another wrapper, but may be necessary depending on what the child is doing
  ...
}
Even if we have both R and W impls, the asReadOnly wrapper is created only once. This should dramatically reduce the number of distinct wrapper objects. With ByteBuffer, the children have to be protected from each other so a new wrapper is required for each child.
@leerho
I'm trying to understand exactly what you are trying to avoid. When I say "creates a new class", what I mean is a new wrapper pointing to the same resource. Only a few pointer registers get created. With a single impl asReadOnly can be done with an up-cast. But slices will always require a new wrapper object. Also viewing a resource as positional will require creating a different wrapper object pointing to the same resource. (It is also possible to create two views of the same resource using different endianness just using different wrappers).
Currently in Druid, a lot of asReadOnlyBuffer() calls are made on already read-only buffers, because there is no way to distinguish between read-only and writable at the type level, and asReadOnlyBuffer() is called "for safety". Slicing should be partially replaced with passing primitive "position" (and, optionally, "size"/"limit") arguments along with Memory objects. Currently slicing is also partly fostered by the fact that the "positional" and "absolute" interfaces are mixed in ByteBuffer.
I'm afraid we are entering a design-paralysis phase; let's iterate specifically on the points where we disagree. API first. How I currently see it:
abstract class Memory {
static WritableMemory allocate(long size);
static WritableMemoryHandler allocateDirect(long size);
static Memory wrap(ByteBuffer);
static WritableMemory wrapForWrite(ByteBuffer);
static MemoryHandler map(File);
static MemoryHandler map(File, long pos, long len);
static WritableMemoryHandler mapForWrite(File);
static WritableMemoryHandler mapForWrite(File, long pos, long len);
// no fields
long size();
ByteOrder order();
Memory withOrder(ByteOrder order);
Buffer buffer(long position, long limit);
.. getXxx(long pos) methods
}
abstract class WritableMemory extends Memory {
// no fields
WritableMemory withOrder(ByteOrder order);
WritableBuffer buffer(long position, long limit);
Memory asReadOnly();
.. putXxx(position) methods
}
interface Buffer {
.. getXxx() methods
}
interface WritableBuffer {
.. putXxx() methods
}
interface MemoryHandler extends AutoCloseable {
Memory get();
void close();
}
interface WritableMemoryHandler extends MemoryHandler {
WritableMemory get();
}
@leventov
Here is what is implemented so far and compiles (see memory4): Completing Buffer, WritableBuffer, asNonNativeEndian(), asNativeEndian() is pretty mechanical from here.
public abstract class Memory {
public static MemoryHandler wrap(final ByteBuffer byteBuf)
public static MemoryHandler map(final File file)
public static MemoryHandler map(final File file, final long fileOffset, final long capacity)
public abstract Memory region(long offsetBytes, long capacityBytes)
public static Memory wrap(final prim-type[] arr)
public abstract void copy(long srcOffsetBytes, WritableMemory destination, long dstOffsetBytes,
long lengthBytes)
public abstract getXXX(offset) methods
... plus other read misc
public abstract asNonNativeEndian() //not implemented
public abstract asNativeEndian() //not implemented
}
public abstract class WritableMemory {
public static WritableMemoryHandler wrap(final ByteBuffer byteBuf)
public static WritableMemoryHandler map(final File file)
public static WritableMemoryHandler map(final File file, final long fileOffset, final long capacity)
public static WritableMemoryHandler allocateDirect(final long capacityBytes)
public static WritableMemoryHandler allocateDirect(final long capacityBytes, final MemoryRequest memReq)
public abstract WritableMemory region(long offsetBytes, long capacityBytes)
public abstract Memory asReadOnly();
public static WritableMemory allocate(final int capacityBytes)
public static WritableMemory wrap(final prim-type[] arr)
public abstract void copy(long srcOffsetBytes, WritableMemory destination, long dstOffsetBytes,
long lengthBytes);
public abstract getXXX(offset) methods
... plus other read misc
public abstract void putXXX(long offsetBytes, prim-type value)
... plus other write misc
public abstract MemoryRequest getMemoryRequest()
public abstract asNonNativeEndian() //not implemented
public abstract asNativeEndian() //not implemented
}
public interface MemoryHandler extends AutoCloseable {
Memory get()
void close()
void load() //only for memory-mapped-files
boolean isLoaded() //only for memory-mapped-files
}
public interface WritableMemoryHandler extends AutoCloseable {
WritableMemory get()
void close()
void load() //only for memory-mapped-files
boolean isLoaded() //only for memory-mapped-files
void force() //only for memory-mapped-files
}
public abstract class Buffer { //Not implemented
}
public abstract class WritableBuffer { //Not implemented
}
@leerho
- wrap(ByteBuffer) returns a Handler instead of just Memory - might complicate the implementation, while not actually needed.
- wrap(ByteBuffer) should inherit endianness from the buffer.
- asNonNativeEndian() and asNativeEndian() won't be convenient; in Druid we need to convert endianness when we know the specific endianness that we need, usually BIG_ENDIAN. So using this API it would be something like ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN ? mem.asNativeEndian() : mem.asNonNativeEndian(), instead of just mem.withOrder(ByteOrder.BIG_ENDIAN).
- What is MemoryRequest for?
- load(), isLoaded() and force() implementation requires JNI. And those methods are unused in Druid. I think they could be dropped.

Everything else looks good to me.
@leventov
wrap(ByteBuffer) returns Handler instead of just Memory - might complicate implementation, while not actually needed.
I was being conservative here; this is easy to remove after we have discussed it. I was trying to capture the scenario where the owner of a BB passes a writable region to a client and then lets his own BB go out of scope, thinking that he is done with it and no changes can be made. If the owner uses try-with-resources or close(), then the client's copy is immediately marked invalid, even though the JVM won't collect it until the client's handle is released. This is a variant on the use-after-close scenario.
wrap(ByteBuffer) should inherit endianness from the buffer.
Good catch. Thank you.
asNonNativeEndian() and asNativeEndian()
Good point, I can do the detection of whether the chosen endianness is different from the native internally.
What is MemoryRequest for?
This is for off-heap dynamic data structures that can possibly grow in size and where the process that is managing this data structure does not "own" the allocation of memory. This is a simple callback mechanism to allow the algorithm that is managing the data structure to request more memory if needed.
Sketches is an excellent example of this. Some sketches, like Theta Sketches, have an upper bound in size. Others, like the quantiles sketches, don't have an upper bound, but they grow very, very slowly. They all start very small in size; e.g., a Theta Update Sketch with a few entries can be a few hundred bytes, but can grow to a megabyte or so depending on how it is configured.
When you have millions of these sketches, the overall size adds up. Fortunately, our typical data is highly fragmented into many different dimensions, and the resulting stream lengths are highly skewed with power-law distributions. This means that the vast majority of sketches only ever get a few entries, and the number of sketches that actually grow to full size is a tiny fraction. But which ones? If you allocate full space for all the sketches, you would be wasting terabytes of RAM. The MemoryRequest mechanism allows the systems engineer to allocate orders-of-magnitude less space for all the required sketches, and then allocate more space only for those few sketches that do require it, as needed.
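To make the callback shape concrete, here is roughly what MemoryRequest amounts to (the actual method names and signatures in memory4 may differ):

public interface MemoryRequest {
  //Called by the algorithm managing the data structure when it has outgrown its current space.
  WritableMemory request(long newCapacityBytes);

  //Called when the data has been copied out and the old memory can be released by its owner.
  void requestClose(WritableMemory memToClose);
}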
So far, Druid has been managing its off-heap space using direct ByteBuffers. But once you start moving away from ByteBuffers and start allocating off-heap memory directly, you are essentially managing your own heap. I really don't want to go down the slippery slope of creating our own malloc, as that would be a significant undertaking. The MemoryRequest is a simplistic mechanism that, hopefully, will delay our having to design an actual malloc/free system.
load(), isLoaded() and force() implementation requires JNI.
We are accessing these methods and their internal implementation using reflection calls to the underlying methods, which seem to be working. Using these calls can speed up access to mapped files considerably.
@leerho ok. Could you complete memory4 (buffers, endianness, a bunch of get/put methods), so we can start actual refactoring of Druid?
Also I think copy() should be called get(), similar to ByteBuffer, and to resolve the ambiguity: copy to or copy from?
The goal of this issue is to agree on basic design principles and expectations from this refactoring.
Goals
- Avoid the bound checks that ByteBuffer always performs when making any read or write. Bound checks could be hoisted outside the hot loops manually or enabled via assertions/a static final boolean flag, which is effectively free with some precautions.
- long indexes. See #3743.
- Don't make a lot of Object copies when a Memory object has to be "sliced", "converted to read-only", etc. See https://github.com/druid-io/druid/pull/3716#issuecomment-274658045.

Design
- It is moved to the Druid source tree, not used as a dependency. See https://github.com/druid-io/druid/issues/3892#issuecomment-276589924
- A Memory object is immutable. "position and limit", if needed, are passed along with Memory as two primitive longs.
- A Memory object has a cached immutable "view" object (which implements the read methods and throws exceptions from the write methods); this object is always returned when "as read only" is needed.
- close() or free() on Memory is possible, but not strictly required; there is a sun.misc.Cleaner safety net, as in DirectByteBuffer.
- While conversion of ByteBuffer to Memory is in progress (not expected to be done all at once, but subsystem by subsystem, class by class, see https://github.com/druid-io/druid/issues/3892#issuecomment-276548114), and also when we need to interop with external libraries which require ByteBuffer, conversion from Memory to ByteBuffer and vice versa is possible. Likely it requires keeping a DirectByteBuffer-compatible format of the Cleaner inside Memory and invoking DirectByteBuffer constructors via MagicAccessorImpl.
- Memory's bounds are checked optionally via assertions/guarded by a static final boolean flag; "local" position and limit are checked explicitly manually, with helper methods or versions of the read and write methods of Memory which accept a read/write position and "local" limits. https://github.com/druid-io/druid/issues/3892#issuecomment-276185704

Objections, additions, corrections, questions are welcome. @leerho @cheddar @weijietong @akashdw @himanshug @fjy @niketh