Move from ByteBuffer to Memory

leventov commented 7 years ago

The goal of this issue is to agree on basic design principles and expectations from this refactoring.

Goals

Speed up critical places in query processing by skipping bound checks, which ByteBuffer always performs when making any read or write. Bound checks could be hoisted outside the hot loops manually or enabled via assertions/static final boolean flag, that is effectively free with some precautions.
Get rid or 2 GB limitation, use long indexes. See #3743.
Don't make a lot of Object copies when a Memory object has to be "sliced", "converted to read-only", etc. See https://github.com/druid-io/druid/pull/3716#issuecomment-274658045.
+ Eliminate current Endian craziness in Druid by standardizing on native-endian. https://github.com/druid-io/druid/issues/3892#issuecomment-276235357

Design

Based on DataSketches's Memory.
~~It is moved to Druid source tree, not used as a dependency.~~ See https://github.com/druid-io/druid/issues/3892#issuecomment-276589924
Memory object is immutable. "position and limit", if needed, are passed along with Memory as two primitive longs.
Upon creation a mutable Memory object has a cached immutable "view" object (which implements read methods and throws exception from write methods), this object is always returned when "as read only" is needed.
Explicit close() or free() on Memory is possible, but not strictly required, there is a sun.misc.Cleaner safety net, as well as in DirectByteBuffer.
While transition from ByteBuffer to Memory is in progress (~~not expected to be done all at once, but subsystem by subsystem, class by class~~ see https://github.com/druid-io/druid/issues/3892#issuecomment-276548114), and also when we need to interop with external libraries which require ByteBuffer, conversion from Memory to ByteBuffer and vice versa is possible. Likely it requires to
- Use DirectByteBuffer-compatible format of Cleaner inside Memory
- Be able to access private DirectByteBuffer constructors via MagicAccessorImpl.
+ "Global" (immutable) Memory's bounds are checked optionally via assertions/guarded by static final boolean flag, "local" position and limit are check explicitly manually, with helper methods or versions of read and write methods of Memory, which accept read/write position and "local" limits. https://github.com/druid-io/druid/issues/3892#issuecomment-276185704

Objections, additions, corrections, questions are welcome. @leerho @cheddar @weijietong @akashdw @himanshug @fjy @niketh

leerho commented 7 years ago

@leventov

WRT memory allocate() makes no sense. I know, it is a temporary place holder for a region, which I will address next. There are several other silly constructs in the skeleton that need to be ironed out.
WRT DataInput / DataOutput. I didn't follow that discussion very closely. So perhaps you could restate the objective and the benefit of overloading that on top of what we are trying to do now. Nonetheless, I am very skeptical that Memory could be made stateless, without compromising other objectives like achieving performance of Unsafe with a safer and friendlier interface, and effectively replacing ByteBuffer with a much richer API.

leventov commented 7 years ago

@leerho

WRT DataInput / DataOutput. I didn't follow that discussion very closely. So perhaps you could restate the objective and the benefit of overloading that on top of what we are trying to do now. Nonetheless, I am very skeptical that Memory could be made stateless, without compromising other objectives like achieving performance of Unsafe with a safer and friendlier interface, and effectively replacing ByteBuffer with a much richer API.

I meant to make Memory stateless (hence remove BaseMemory), but not the package-private implementation classes, essentially push fields from BaseMemory down to the private classes. I don't think it should affect performance at all.

The idea is that a third-party could provide a Memory implementation based e. g. on distributed FS.

leerho commented 7 years ago

@leventov

BaseMemory is a "thought abstraction" at the moment and may migrate down the hierarchy anyway. But we can do that optimization later. Right now, I want to get the basic skeleton machinery of R/W Regions, ByteBuffers and MMF working and start some characterization tests to see where we stand on performance. If it is as simple as you say, then I'm not going to worry about it right now.

However, something that we should consider at this early stage is whether we will need thread concurrency. And instead of wrapping everything with Synchronized after we are all done, which is rather inefficient, we should think about using low-level read/write locks, or perhaps optimistic locking.

leventov commented 7 years ago

@leerho

However, something that we should consider at this early stage is whether we will need thread concurrency. And instead of wrapping everything with Synchronized after we are all done, which is rather inefficient, we should think about using low-level read/write locks, or perhaps optimistic locking.

Because of this and generally, I think it's time to try to apply this API, in a manner similar to #3915

leerho commented 7 years ago

@leventov

Hmm. The only way I see to do that is to make M and WM abstract and create the concrete impls with the Unsafe calls a layer below. The BM functionality must be at the same level (or higher) then where the Unsafe calls occur. Not sure how that impacts performance.

leerho commented 7 years ago

@leventov

Could use your thoughts with a conundrum using Cleaner. Cleaner is directly tied to an object while finalize() is indirect in that the JVM keeps track of those objects that have a finalize method.

Unsafe is stateful with native address (in various forms) and an optional object reference.

1) Create a parent class that refers to a native region in memory using Unsafe with these state values and register it with the JVM Cleaner.

2) Create a child "region" class that needs to refer to that same region of memory as a "view". It has to copy or refer to those same state variables (perhaps modified by an offset) in order to use Unsafe.

3) The parent's cleaner.clean() is called and the parent's references to native memory are all cleaned up and the memory is freed from the OS.

How do we clean the child? If the users still holds a reference to the child and tries to do any operation with Unsafe against the state variables it knows about, it will result in a seg-fault. Even if the nativeBaseAddress is set to zero!

Option A) If the parent has a Cleaner, then the children have to register their own cleaner as well. But it could create a race condition if the Parent cleaner is called preemptively by the user.

Option B) Similar to A, but we don't allow the user to ever call Cleaner preemptively. Once the GC starts all the user threads are disabled so we couldn't have the race condition.

If B is the way to go I have concern that we may have only a few parent objects, but have many thousands of child views of that parent. Will that overwhelm the Cleaner's linked list?

leerho commented 7 years ago

@leventov

Option C) I should've thought of this. The child has to hold a reference to the Parent, AND we can't allow preemptive cleans. Only when all the children and the parent are dereferenced will the GC clean it all up.

leventov commented 7 years ago

@leerho my suggestion is

Only one Cleaner per allocation. When views, subregions, positioned etc. are created, they have a reference to the "parent", but they shouldn't delegate their getXxx/putXxx methods to the parent, they copy address to their own fields, for faster access. They don't create their own Cleaners.
Manual clean via close() is allowed.
Cleaner and parent both reference AtomicBoolean "cleaned", which is set to true from inside Cleaner.
In their assertion-enabled methods, e. g. in checkIndex(), this flag is checked, and exception "use after free" is thrown, if already set to true.
If we encounter segfault in tests or production we turn on assertions for the memory package, and see where do we have use-after-free.

leerho commented 7 years ago

@leventov thanks, that is where I was headed. I wasn't so sure about the manual clean but if all of this is on a single thread it makes sense. We would have to think harder if this activity was across multiple threads.

niketh commented 7 years ago

@leerho @leventov IMO I think the cleaned flag isn't a great idea, its better not to have explicit close. Only NM registers cleaner, MR does nothing, NM clean will only get called when there is no one holding on the reference. Having a cleaned flag will mean that we have to keep doing assert checks in checkIndex() or equivalent check somewhere else

leventov commented 7 years ago

@niketh

Having a cleaned flag will mean that we have to keep doing assert checks in checkIndex() or equivalent check somewhere else

Yes. What's the problem with that? From performance perspective, it's free, JIT will remove that code entirely.

better not to have explicit close.

Nobody obligates you to use explicit close(), but sometimes it is handy and obviously safe:

try (Memory mem = Memory.map(file)) {
  read something from file...
}

Because ByteBuffer doesn't implement AutoCloseable, we had to create a wrapper: MappedByteBufferHandler. Let's not repeat this mistake with Memory.

leerho commented 7 years ago

@leventov

I have implemented a skeleton with two different schemas:

Notes:

The AllocateMap has not been incorporated yet,
Lots of methods missing
Both RO Regions and W Regions are done.
Very primitive tests
If you look at the code, it is currently configured as Version A, using WritableMemory2. With minor changes I can convert it to Version B using WritableMemory.

Version A:

public Memory (read only) is abstract, with no class vars
public WritableMemory is concrete, but NO duplication of code is required

Version A
  public abstract class Memory (no class vars)
    - public static RO methods: wrapping arrays, ByteBuffers, Maps
    - public abstract RO methods: get<primitive>, get<primitive>Array. region

      class MemoryImpl extends Memory
        - Concrete RO impls of Memory

      public class WritableMemory extends MemoryImpl 
        - public static W methods: wrapping arrays, ByteBuffer, Map, Direct alloc, autoByteArray
        - public W methods: put<primitive>, put<primitive>Array, getMemoryRequest(), freeMemory()

          class AllocateDirect extends WritableMemory implements AutoClosable
            - Allocates direct memory, uses Cleaner

          class AllocateMap extends WritableMemory implements AutoClosable
            - Allocates direct memory, uses Cleaner (for Map)

Version B:

public Memory is abstract with no class vars
public WritableMemory is abstract with no class vars, However, the WritableMemoryImpl concrete class requires duplication of all the MemoryImpl code

Version B
  public abstract class Memory (no class vars)
    - static RO methods: wrapping arrays, ByteBuffers, Maps
    - abstract RO methods: get<primitive>, get<primitive>Array. region

      class MemoryImpl extends Memory ()
        - Concrete RO impls of Memory

      public abstract class WritableMemory extends Memory (no class vars)
        - static W methods: wrapping arrays, ByteBuffers, Maps, Direct alloc, autoByteArrays
        - abstract W methods: put<primitive>, put<primitive>Array, getMemoryRequest(), freeMemory()

        class WritableMemoryImpl extends WritableMemory
          - Concrete RO impls of Memory (DUPLICATE CODE)
          - Concrete W impls of WritableMemory

          class AllocateDirect extends WritableMemoryImpl implements AutoClosable
            - Allocates direct memory, uses Cleaner

          class AllocateMap extends WritableMemoryImpl implements AutoClosable
            - Allocates direct memory, uses Cleaner (for Map)

I understand you wanted no state at the top of the hierarchy, but it is still not clear to me why. any instantiation will have state.

I prefer Version A, because no code duplication is required and fewer classes. If we allowed class vars at the root we could eliminate another class.

Comments?

leerho commented 7 years ago

@leventov

I have added the AtomicBoolean into the Dealocator for Cleaner and added the asserts for the current get/put methods for the Version A. Also added some tests where the Parent is freed, and the child region detects that the region is no longer valid via assert. Seems to work.

I think I will move to a Version C, similar to what you suggested earlier, with only one writable implementation, and use asserts to detect writable and JDK7 vs 8, etc. Maintaining this R/W separation is a PITA.

niketh commented 7 years ago

@leerho VersionA + Boolean flag to check if the parent has been freed looks good. What is approach C ?

leerho commented 7 years ago

It is in the parallel package memory3.

leerho commented 7 years ago

@leventov @niketh

See package-info

Version C is the furthest along and I have implemented the mechanisms for

AtomicBoolean check for valid allocation with asserts
asReadOnly() and violation detection with asserts
regions
JDK 7 & 8 detection & substitution.

Not yet Done:

Maps,
Many more methods to fill in from the current Memory
Need to transfer the JavaDocs,
Convert the UnitTests
Develop strategy for moving customers from current Memory to this one.

I have more than enough so that I can start some characterization tests and see how it performs against the current Memory.

Please review and comment, as the next stages will be a lot of work and I'd like to get feedback now.

leventov commented 7 years ago

@leerho sorry for late review. I like version B most. You don't have to duplicate code, because you don't need MemoryImpl, you can return WritableMemoryImpl from static factory methods which return Memory (read-only) as well. And read-only flag also won't be needed in this case, because Java compiler will ensure there are no illegal accesses, if we follow two simple rules: don't explicitly cast any Memory instance to WritableMemory, and don't use Memory as generic parameter.

I believe it's important to keep read-only and writable interfaces separate, I explained above why: https://github.com/druid-io/druid/issues/3892#issuecomment-279173943

leventov commented 7 years ago

@leerho Memory asked to be stateless and implementation-less to allow implementation based on completely different principles than Unsafe, e. g. something backed by network-attached storage: https://github.com/druid-io/druid/issues/3892#issuecomment-279304755

However it is not necessary now, because there is no reaction from @weijietong and probably he doesn't need #3716 anymore. Anyway, it could be fixed relatively easily later, if needed, because it doesn't incur API changes.

leerho commented 7 years ago

@leventov

Sorry, I had completely misinterpreted a previous statement of yours:

regarding x2 less classes, I meant that the separation between read-only and writable implementations is not needed. There are only writable implementations. The only thing that e. g. map() does is delegation to mapWritable(), and return the read-only interface. It is safe on the type-system level with very little precautions. Your approach with runtime separation provides extra safety, if you want to keep this safety - yes, you either still have x2 classes, or e. g. have a boolean flag "writable" and check it guarded by assertion, e. g. in checkOffset() method, along with the offset check.

Now that I read it again carefully and in the context of your latest comments, I understand what happened. I interpreted "implementations" as "classes that have been implemented", assuming the looser interpretation of "implementations". What you meant by "implementations" is "concrete classes" as opposed to an "abstract classes". And since in the same paragraph, you suggested having a boolean flag, the looser interpretation of "implementations" directly leads to Version C!

I will go back to Version B, but with ONE concrete WritableMemory class and no Memory concrete class :)

In your statement:

if we ... don't explicitly cast any Memory instance to WritableMemory...

Who are "we"? You and I certainly won't do that. But what is there to prevent a user from doing that? :)

And to make sure I understand what you mean, could you give me a code example of what you mean by:

and don't use Memory as generic parameter.

Thank you for your clarification on the need for the stateless root. I may as well keep the root stateless.

I have completed some initial characterization tests on the performance of this new hierarchy based Version C. (I don't expect the different versions to vary much since they should all collapse down to a single object after compilation and JIT). It is a little better than the current Memory implementation based on an interface, but still not as good as pure static calls, which will still be possible if a WritableMemory object is passed and the user goes to the trouble.

Current Memory Performance

rwmemoryheapdirect_21feb17

Version C Memory Performance

rwmemorycheapdirect_21feb17

Memory Performance using Static Methods Wrapping Unsafe

rwmemoryheapdirectviastaticunsafe_21feb17

leventov commented 7 years ago

Who are "we"? You and I certainly won't do that. But what is there to prevent a user from doing that? :)

DataSketches contributors and Druid contributors. It could be enforced even by adding a custom "Regexp" rule to checkstyle config, that looks for (WritableMemory) and Memory>

leerho commented 7 years ago

@leventov

I think you are referring to Java Generics instead of the linguistic definition of "generic" meaning "nonexclusive", "wide", or "universal" i.e., don't do:

class foo<Memory> ... { //do WritableMemory stuff }

This is essentially another form of bad casting.

leerho commented 7 years ago

@leventov

I'm concerned that the rule don't explicitly cast any Memory instance to WritableMemory will be very difficult to enforce. Not all users of DataSketches and/or Druid will necessarily be contributors, or participate in our forums, or use our checkstyle configurations.

So I'd like to keep the boolean readOnly as a backup check, in the hopes that at some point they at least run their code with assertions enabled. There is no cost at run-time.

leventov commented 7 years ago

@leerho

So I'd like to keep the boolean readOnly as a backup check, in the hopes that at some point they at least run their code with assertions enabled. There is no cost at run-time.

Ok.

One thing that wasn't discussed and designed yet, is how do we support big-endian writes and reads, to support legacy formats.

Options:

Explicit putXxxBigEndian(offset, v)/getXxxBigEndian(offset) or putXxx(offset, v, ByteOrder order) methods in the main interface. (This approach is taken in Argona's buffers)
A separate implementation, with own factory methods like mapFile(f, ByteOrder order), or a conversion method Memory.asBigEndian()
No support, need to convert and use ByteBuffer to be able to read and write legacy formats.

leerho commented 7 years ago

@leventov

Supporting BE/LE is messy. We need to consider dual-mode cases: e.g., reading TCP/IP packets where the protocol fields are BE, and the payload could be LE. Not that I want to write packet handlers, but I could believe that careless use of ByteBuffer (without always specifying endianness) could lead to horrible mixed endianness data!

I also don't want to encourage the propagation of BE data when it is clearly unnecessary and a significant performance hit.

Explicit: putXBigEndian(offset, v) ... would double the already large number of methods, and becomes a slippery slope WRT how we implement the other variants (positional). I'm not really keen on this one.
Per call: putX(offset, v, ByteOrder order) incurs a cost with every put/get whether BE or LE. So I'm definitely opposed to this unless it is a totally separate implementation.
Totally separate implementation: All put/get methods would be of the form putX(offset, v, ByteOrder order). This has the advantage of covering the horrible mixed endianness cases. The specific factory mapFile(f, offset, capacity) would not have to specify endianness as this is an option on every put/get that the mapping class extends. However, wrapping ByteBuffer, with its internal endianness state, properly (and maintain sanity) may take a little more thought. E.g., do we keep the endianness state of BB in our state and make it visible? How easy would it be for a developer to accidentally end up with double byte-swaps?

A separate implementations is a lot of work, and, IMHO, provides little incentive for developers to fix the fundamental underlying problem. Keeping these separate implementations in sync with changes to the basic implementation could be a maintenance headache.

No support: I kinda like this one as it would really encourage developers to go back and regenerate their data using LE (if at all possible) or at least somehow mark the data as "deprecated" to be eventually replaced with the LE form. One option would require developers to explicitly call byte swappers for every put/get call. We could provide efficient byte swapper methods (only 3), but that would be the extent of it.
Separate Library Another possibility is a totally separate package, possibly in the Druid domain, that implements these variants (which, at this point, could be based on true interfaces) and underneath calls or extends the Memory package, which will always be Native-Endianness and non-positional.

Whatever we recommend here needs to be in context with however we decide to support other variants (e.g., positional put/gets, thread-safe?, ...). Is the positional approach also a totally separate implementation? Or should we ask developers to write their own positional logic that wraps the calls they need? (We could provide example code on how to do that.)

With all the potential separate implementation variants (basic, endianness, positional, thread-safe, etc), we are talking about a huge amount of code, and way beyond what I originally envisioned as a small, fast library that simply extends Unsafe. And, BTW, NONE of these other implementations will have the speed of the basic implementation (and where I start losing interest).

leerho commented 7 years ago

@leventov

BTW, so far, these other variants will not be able to leverage the stateless base classes Memory and WritableMemory that we have in the current design. This calls into question the value of these stateless classes and makes them more like bloat. I am seriously considering removing one or both of them.

leventov commented 7 years ago

@leerho

`putXBigEndian(offset, v)

Agree, it's bad. Also it looks more noisy and makes easy to make a mistake, forgetting "BigEndian" suffix somewhere.

putX(offset, v, ByteOrder order)

The same, and also noisy and even simpler to make (and harder to spot) mistakes, forgetting the third argument.

Totally separate implementation: All put/get methods would be of the form putX(offset, v, ByteOrder order).

Didn't understand how it's different from the previous.

This has the advantage of covering the horrible mixed endianness cases.

There shouldn't be such cases, at least I can speak for Druid. Either we need to support legacy file/network/cache-key formats which are big-endian only, or use native endian for new formats, and for non-persistent computational buffers, which live only within the JVM.

A separate implementations is a lot of work, and, IMHO, provides little incentive for developers to fix the fundamental underlying problem. Keeping these separate implementations in sync with changes to the basic implementation could be a maintenance headache.

Let's keep in mind very clearly, that we develop a library not for abstract "developers", we develop it for ourselves. Big-endian poisoned the Druid codebase for two reasons:

All ByteBuffer factory methods return big-endian ByteBuffer, and changing it to native order: order(ByteOrder.nativeOrder()) is very verbose.
asReadOnlyBuffer(), slice(), duplicate() return big-endian buffer regardless of the order of the parent, shame.

By merely making Memory's static factory methods to return native-ordered Memory by default, and ensure order inheritance on transformations, we fix the problem for all new code/new formats. On the other hand, Druid has to support old big-endian formats forever, so it's pointless to make it "painful" to use big-endian in order to "provide incentive" for ourselves to "move" to native order. It's just making our own live harder for nothing.

My suggestion We can make a "NonNativeWritableMemoryImpl" class extending WritableMemoryImpl, and overriding 12 methods (6 getXxx and 6 putXxx) with delegation to super and swapping bytes. It's a burden, but IMO it's not that bad to maintain those 12 methods + 12 similar methods for Positioned. Also keep in mind that Memory is supposed to be changed very rarely, after initially developed, unlike Druid/Datasketches code.

The advantage or this approach is that the implementation code could be fully shared for legacy formats (big-endian) and new formats (native-ordered), if nothing is changed except the byte order. If the matter of providing Memory with different byte order as the input for format reading/ output for format writing.

Factories: Memory.map(file) has counterpart, which accepts ByteOrder as an additional parameter: Memory.map(file, ByteOrder order). Factory code internally decides what to return: WritableMemoryImpl or "NonNativeWritableMemoryImpl", depending on the native platform byte order and the given byte order.

leventov commented 7 years ago

@leerho

Thread-safe Memory is not needed.

PositionedMemory could be made pretty fast IMO, if instead of wrapping Memory and keeping a separate "position" field, it's made very similar to WritableMemoryImpl, with the difference that cumulativeOffset is not a final field, and updated on each operation.

BTW, so far, these other variants will not be able to leverage the stateless base classes Memory and WritableMemory that we have in the current design. This calls into question the value of these stateless classes and makes them more like bloat. I am seriously considering removing one or both of them.

I believe in stateless and separation of stateless hierarchy (Memory) and Positioned. Let's made initial API with them separate, and if in the process/after the migration of Druid and Datasketches code it is be proven that separation is not needed, they could be merged. This extra refactoring pass, based on the actual migration experience will be needed anyway.

leerho commented 7 years ago

@leventov

12 methods (6 getXxx and 6 putXxx)

Lets count just the ones impacted by byte-order: 24 primitive puts/gets + 6 atomic = 30:

getChar, getCharArray, getShort, getShortArray, getInt, getIntArray, getLong, getLongArray,
getFloat, getFloatArray, getDouble, getDoubleArray, putChar, putCharArray, putShort,
putShortArray, putInt, putIntArray, putLong, putLongArray, putFloat, putFloatArray, putDouble,
putDoubleArray, getAndAddInt, getAndAddLong, compareAndSwapInt, compareAndSwapLong,
getAndSetInt, getAndSetLong

Thread-safe Memory is not needed.

I'm not convinced. Let's look at Druid's SynchronizedUnion:

This came about in a real-time query application where data was continuously being ingested into sketches in Druid and the same sketches were being accessed by multiple query threads. In order to solve the concurrency problem quickly, the entire Union object was wrapped with a brute-force synchronized approach on all the methods, which is unnecessarily slow. What we actually could use in the Druid case is a single-writer, multiple reader read/write locks, or even better, optimistic locking. I have some folks investigating this now. In order to do this with off-heap data structures the Java locking mechanisms will not work. We will at least require the use of the unsafe atomic methods mentioned above and likely modification of the put/get calls as well. I don't have the full answer on this yet.

leventov commented 7 years ago

@leerho ok, I was wrong. 30 + 30 = 60 similar methods is a big burden, but I still think it's feasible to write them once, with almost no attention required later. Also many methods may delegate to each other on WritableMemoryImpl level: float methods delegate to int methods, double to long, char to short or vice-versa. Such methods don't need to overridden in "NonNativeWritableMemoryImpl".

This came about in a real-time query application where data was continuously being ingested into sketches in Druid and the same sketches were being accessed by multiple query threads. In order to solve the concurrency problem quickly, the entire Union object was wrapped with a brute-force synchronized approach on all the methods, which is unnecessarily slow. What we actually could use in the Druid case is a single-writer, multiple reader read/write locks, or even better, optimistic locking. I have some folks investigating this now. In order to do this with off-heap data structures the Java locking mechanisms will not work. We will at least require the use of the unsafe atomic methods mentioned above and likely modification of the put/get calls as well. I don't have the full answer on this yet.

Either int or long memory "fields" + compareAndSwap operations on them + may need to add getIntVolatile, putIntVolatile, putIntOrdered, getLongVolatile, putLongVolatile, putLongOrdered could be used for implementing lightweight spin locks. Or some synchronization mechanisms used, external to Memory objects. "ThreadSafeMemory" is not a solution.

leerho commented 7 years ago

@leventov

I think we are close to agreement. I haven't added the volatile and ordered methods yet until I see what our whole approach for the concurrent sketches will actually require.

leerho commented 7 years ago

@leventov

I have a a design issue perhaps you could comment on:

A ByteBuffer has an internal ReadOnly / Writable state (as does a file). Because we now have only one implementation, WritableMemoryImpl, the internal helper class AccessByteBuffer must return a WritableMemoryImpl. Similarly, the AccessDirectMap also returns a WritableMemoryImpl.

Close():

I do not provide close() in Memory, it is only available in WritableMemory. I don't want downstream read-only users to inadvertently close the original BB or Map object.

This means that I can't provide in Memory a public static Memory wrap(ByteBuffer) or a public static Memory map(file ...) because ultimately these objects need to be closed so that the atomic valid flag is set to false and propagates to any child regions.

So the only way to create a RO Memory instance of a BB (or map) is via asReadOnly() from a WritableMemory instance. This also means that I cannot throw an error during construction if the user attempts to create a WritableMemory of a RO BB or Map.

And, the only way during runtime to detect a write against a WritableMemory instance of a RO BB is when asserts are enabled. Can we live with this?

Maps are safer in that if the file is RO, the OS will throw an error on an attempted write.

Any better ideas?

leventov commented 7 years ago

@leerho

Memory should also be closeable, because it's needed for a common pattern:

try (Memory mem = Memory.map(file)) {
  .. read something from file
}

I didn't understand it clearly but probably the rest of the problem will go away also.

Downstream read-only users can inadvertently close Memory: I agree it's not great, but currently I think it's a required trade-off (maybe my mind will change when we do actual Druid code migration and see how it works). As I noted here: https://github.com/druid-io/druid/issues/3892#issuecomment-280742485, Memory.close() should be rarely used. Maybe we can apply policy that any Memory object is closed only in the case class as where it's created, if a class accepts Memory as a parameter (downstream user), it must not close it.

leerho commented 7 years ago

@leventov

The pattern I am worried about is:

WritableMemory wmem = WritableMemory.writableWrap(ByteBuffer writableBB);
Memory mem = wmem.asReadOnly();

// pass mem as an argument to 1000 different downstream clients that need read access ...
// ...if any one of them accidentally calls close() you are hosed! 

wmem.close(); //The owner of the original object should be the only one closing it.

I don't understand what your "tradeoff" concern is. Since we only have one way to instantiate a BB we could make it less verbose: WritableMemory wmem = WritableMemory.wrap(BB). Or perhaps we could make the name more concise: WriteMemory or Wmemory. So we would have:

WMemory wmem = WMemory.wrap(ByteBuffer writableBB);
Memory mem = wmem.asReadOnly();

I would recommend that we start with not allowing Memory.close(). Memory.close may be rare in Druid, but I am worried about other environments including Yahoo!

leventov commented 7 years ago

Ok, let's try something like MemoryHandle extends AutoCloseable, with Memory get() method. Memory.map() and allocateDirect return MemoryHandle, other static factory methods return just Memory. And Memory with WritableMemory are not AutoCloseable.

Then the pattern above is

try (MemoryHandle mh = Memory.map(file)) {
  Memory mem = mh.get();
  .. read from file
}

Memory mem = wmem.asReadOnly();

Shouldn't need asReadOnly(), just upcast: Memory mem = wmem;. And actually this line not needed, just pass wmem to methods that accept Memory.

WriteMemory, Wmemory

I like natural English WritableMemory most

leerho commented 7 years ago

@leventov

Does your MemoryHandle apply to BB? Could you give me a skeleton of what MH extends and includes?

I need the asReadOnly to set the readOnly flag, which an upcast will not do. The RO flag is the only backup to detect a improper cast of (WritableMemory) memory.

leventov commented 7 years ago

Does your MemoryHandle apply to BB?

If you mean "should Memory.wrap(BB) return MemoryHandle" I think no, because somebody who allocated BB should care about closing it.

MemoryHandle should be similar to MappedByteBufferHandler. Since Memory still want to be able to catch use-after-free, Cleaner should still be private to Memory (WritableMemoryImpl). The only difference is that Memory.close() made package-private, and only MemoryHandleImpl can access it. (MemoryHandle itself is an interface)

I need the asReadOnly to set the readOnly flag, which an upcast will not do. The RO flag is the only backup to detect a improper cast of (WritableMemory) memory.

You shouldn't be able to obtain WritableMemory which is not actually writable, in the first place :) If asReadOnly() just makes a check and returns this, it's ok

leerho commented 7 years ago

@leventov

Something like:

public final class MemoryHandler implements AutoCloseable {
  private final WritableMemory wmem;

  MemoryHandler( WritableMemoryImpl wmem) {
    this.wmem = wmem;
  }

  public Memory get() { return wmem.asReadOnly(); } //sets the RO flag
  public WritableMemory getWritable() { return wmem; }

  //calls the package private close() method of wmem
  public void close() { wmem.close(); }
}

AccessByteBuffer, AllocateDirectMap and AllocateDirect all extend WritableMemoryImpl and return MH. So do all the static methods that create heap array backed Memory or WritableMemory.

Now the only way to close the retained WritableMemoryImpl is through MH and there can be only one owner of an MH. This also means MH is the only class that the compiler detects that requires a close() or should be implemented with try-with-resource.

The rest of the hierarchy is the same.

leventov commented 7 years ago

@leerho

I forgot that we have Memory and WritableMemory, in this case I would prefer to have two separate interfaces MemoryHandle and WritableMemoryHandle extends MemoryHandle, both with a single get() method, because getWritable() allows Memory.map(file).getWritable(), that shouldn't be allowed.

AccessByteBuffer, AllocateDirectMap and AllocateDirect all extend WritableMemoryImpl

I don't understand why it is needed to have subclasses with only specialized constructors. It will make difficult to implement NonNativeWritableMemoryImpl. I suggest to implement AccessByteBuffer's, AllocateDirectMap's and AllocateDirect's logic in static factory methods, then passing all prepared arguments to WritableMemoryImpl or NonNativeWritableMemoryImpl constructors.

leventov commented 7 years ago

Also I want to return to asReadOnly(), I don't actually understand what exactly it will check? It should be possible to upcast WritableMemory to Memory, because during query processing some buffers first act as writable, and then as read-only.

leerho commented 7 years ago

@leventov

What is making things tricky is that we only have one implementation, WritableMemoryImpl, which allows problematic casts. I am beginning to feel that splitting into two impls would be a lot cleaner, safer, and just not worry about the 30+ duplicated methods. I have already organized the code in the WritableMemoryImpl class so that all the RO methods are together, so a simple copy & paste should be straightforward.

With two impls, asReadOnly creates a new class, and going from a Memory to WritableMemory would never be possible.

One impl: The readOnly flag is only checked on the write methods with an assert. So if you cast a Memory object to a WritableMemory the readOnly flag will be true and not change and will throw an assertion error (with -ea).

As I explained before, ByteBuffers and Maps have their own internal RO/W state. And because we have only one writable impl, we can end up with an RO map/BB wrapped by the writable impl. This creates the need for the readOnly flag and the assert check of W methods.

If you do Memory mem = (WritableMemory)writableMem, the readOnly flag remains false (writable). the Memory API prevents writes, but then this also allows a downstream client to do the reverse cast, since the underlying impl is the same. I don't see any easy solution to this other than having two impls.

I do like the MemoryHandler idea as it isolates only those objects that actually require AutoCloseable from those that do not.

leerho commented 7 years ago

@leventov

I don't understand why it is needed to have subclasses with only specialized constructors. It will make difficult to implement NonNativeWritableMemoryImpl.

AllocateDirect and AllocateDirectMap both require Cleaner as a field. The Cleaners have different implementations. These two need to be a subclass so that a user call of close() at the root of the hierarchy does the work specified in the appropriate subclass.
Also, the WritableMemoryImpl class is already huge, so I was looking for opportunities to split out well defined subsets of code.
AccessByteBuffer has a lot of specialized logic so it made sense to make it a separate class.

Honestly, I haven't thought much about how to extend all this for BigEndian, or Positional either. I am just trying to get the basic functioning done with a workable public API.

leerho commented 7 years ago

@leventov

Using an intermediate interface we can leverage AllocateDirect, AllocateDirectMap and AccessByteBuffer across multiple implementations.

Since positional will need to leverage both NE and BE, it should be a true interface.

leventov commented 7 years ago

What is making things tricky is that we only have one implementation, WritableMemoryImpl, which allows problematic casts. I am beginning to feel that splitting into two impls would be a lot cleaner, safer, and just not worry about the 30+ duplicated methods. I have already organized the code in the WritableMemoryImpl class so that all the RO methods are together, so a simple copy & paste should be straightforward.

Personally I don't care that much about code duplication, but keep in mind "NonNative" subclasses, they should also be duplicated then.

With two impls, asReadOnly creates a new class, and going from a Memory to WritableMemory would never be possible.

This is actually what I want to avoid, see the very first message in this thread, "Goals" section, "Don't make a lot of Object copies when a Memory object has to be "sliced", "converted to read-only", etc.". See https://github.com/druid-io/druid/pull/3716#issuecomment-274658045. Currently Druid creates millions of unnecessary read-only ByteBuffers, "for safety". But I think mere separation of interfaces, Memory and WritableMemory, provides enough safety.

the Memory API prevents writes, but then this also allows a downstream client to do the reverse cast, since the underlying impl is the same. I don't see any easy solution to this other than having two impls.

We are creating library for ourselves, not for "dumb users who will do the wrong thing if they could". There is a solution: don't do reverse casts! I'm pretty sure nobody will ever try to do reverse casts, but then it is catched during code review, and even could be enforced using checkstyle rules, as suggested above: https://github.com/druid-io/druid/issues/3892#issuecomment-279173943

AllocateDirect and AllocateDirectMap both require Cleaner as a field. The Cleaners have different implementations. These two need to be a subclass so that a user call of close() at the root of the hierarchy does the work specified in the appropriate subclass.

My intention was that there is a Cleaner field in WritableMemoryImpl, and it's null when cleaning is not needed (e. g. heap allocation). And close() method contains something like

if (cleaner != null) {
  cleaner.clean();
}

Also, the WritableMemoryImpl class is already huge, so I was looking for opportunities to split out well defined subsets of code.

The problem is requirement for "NonNative" subclass, that will multiply the amount of code, if there is more than one "WritableMemoryImpl"

Using an intermediate interface we can leverage AllocateDirect, AllocateDirectMap and AccessByteBuffer across multiple implementations.

Don't understand this

Since positional will need to leverage both NE and BE, it should be a true interface.

The only reason why Memory and WritableMemory are not true interfaces, is ability to have static factory methods. My idea was to create Positional from existing Memory or WritableMemory objects, using region(start, limit) method. So Positional itself doesn't have static factory methods and could be a true interface.

leerho commented 7 years ago

@leventov

With two impls, asReadOnly creates a new class, and going from a Memory to WritableMemory would never be possible.
This is actually what I want to avoid, see the very first message in this thread, "Goals" section, "Don't make a lot of Object copies when a Memory object has to be "sliced", "converted to read-only", etc.".

I'm trying to understand exactly what you are trying to avoid. When I say "creates a new class", what I mean is a new wrapper pointing to the same resource. Only a few pointer registers get created. With a single impl asReadOnly can be done with an up-cast. But slices will always require a new wrapper object. Also viewing a resource as positional will require creating a different wrapper object pointing to the same resource. (It is also possible to create two views of the same resource using different endianness just using different wrappers).

What is the use pattern that you think is creating all these "unnecessary objects"?

Are the vast majority of ByteBuffer objects created to protect the parent's position and limit state from being changed by the child and to protect the children from affecting each other? If this is the case then:

WritableMemory wMem = //original resource
Memory mem = wMem.asReadOnly() //created only once

//pass mem to many children
ChildProcess(mem, offset, length) {
    //Each child can either read directly using offsets, or
    Buffer buf = mem.asBuffer(). //I am suggesting that "Buffer" = "PositionalMemory" 
    buf.setPosition(offset);
    buf.setLimit(offset + length);
    //yes, this creates another wrapper, but may be necessary depending on what the child is doing
    ....
}

Even if we have both R and W impls, the asReadOnly wrapper is created only once. This should dramatically reduce the number of distinct wrapper objects. With ByteBuffer, the children have to be protected from each other so a new wrapper is required for each child.

leventov commented 7 years ago

@leerho

I'm trying to understand exactly what you are trying to avoid. When I say "creates a new class", what I mean is a new wrapper pointing to the same resource. Only a few pointer registers get created. With a single impl asReadOnly can be done with an up-cast. But slices will always require a new wrapper object. Also viewing a resource as positional will require creating a different wrapper object pointing to the same resource. (It is also possible to create two views of the same resource using different endianness just using different wrappers).

Currently in Druid, a lot of asReadOnlyBuffer() are called on already read-only buffers, because there is no way to distinguish between read-only and writable on type level, and "asReadOnlyBuffer()" is called "for safety". Slicing should be partially replaced with passing primitive "position" (and, optionally, "size"/"limit") arguments along with Memory objects. Currently slicing is also partially fostered by the fact that "positional" and "absolute" interfaces are mixed in ByteBuffer.

I'm afraid we are entering design paralysis phase, let's iterate specifically on points where we disagree. API first. How I currently see it:

abstract class Memory {
  static WritableMemory allocate(long size);
  static WritableMemoryHandler allocateDirect(long size);
  static Memory wrap(ByteBuffer);
  static WritableMemory wrapForWrite(ByteBuffer);
  static MemoryHandler map(File);
  static MemoryHandler map(File, long pos, long len);
  static WritableMemoryHandler mapForWrite(File);
  static WritableMemoryHandler mapForWrite(File, long pos, long len);

  // no fields

  long size();

  ByteOrder order();
  Memory withOrder(ByteOrder order);

  Buffer buffer(long position, long limit);

  .. getXxx(long pos) methods
}

abstract class WritableMemory extends Memory {
  // no fields
  WritableMemory withOrder(ByteOrder order);
  WritableBuffer buffer(long position, long limit);
  Memory asReadOnly();

  .. putXxx(position) methods
}

interface Buffer {

  .. getXxx() methods
}

interface WritableBuffer {
  .. putXxx() methods
}

interface MemoryHandler extends AutoCloseable {
  Memory get();

  void close();
}

interface WritableMemoryHandler extends MemoryHandler {
  WritableMemory get();
}

leerho commented 7 years ago

@leventov

Here is what is implemented so far and compiles (see memory4): Completing Buffer, WritableBuffer, asNonNativeEndian(), asNativeEndian() is pretty mechanical from here.

public abstract class Memory {
  public static MemoryHandler wrap(final ByteBuffer byteBuf)
  public static MemoryHandler map(final File file)
  public static MemoryHandler map(final File file, final long fileOffset, final long capacity)
  public abstract Memory region(long offsetBytes, long capacityBytes)
  public static Memory wrap(final prim-type[] arr)
  public abstract void copy(long srcOffsetBytes, WritableMemory destination, long dstOffsetBytes,
      long lengthBytes)
  public abstract getXXX(offset) methods
  ... plus other read misc
  public abstract asNonNativeEndian() //not implemented
  public abstract asNativeEndian() //not implemented
}

public abstract class WritableMemory {
  public static WritableMemoryHandler wrap(final ByteBuffer byteBuf)
  public static WritableMemoryHandler map(final File file)
  public static WritableMemoryHandler map(final File file, final long fileOffset, final long capacity)
  public static WritableMemoryHandler allocateDirect(final long capacityBytes)
  public static WritableMemoryHandler allocateDirect(final long capacityBytes, final MemoryRequest memReq)
  public abstract WritableMemory region(long offsetBytes, long capacityBytes)
  public abstract Memory asReadOnly();
  public static WritableMemory allocate(final int capacityBytes)
  public static WritableMemory wrap(final prim-type[] arr)
  public abstract void copy(long srcOffsetBytes, WritableMemory destination, long dstOffsetBytes,
      long lengthBytes);
  public abstract getXXX(offset) methods
  ... plus other read misc
  public abstract void putXXX(long offsetBytes, prim-type value)
  ... plus other write misc
  public abstract MemoryRequest getMemoryRequest()
  public abstract asNonNativeEndian() //not implemented
  public abstract asNativeEndian() //not implemented
}

public interface MemoryHandler extends AutoCloseable {
  Memory get()
  void close()
  void load()        //only for memory-mapped-files
  boolean isLoaded() //only for memory-mapped-files
}

public interface WritableMemoryHandler extends AutoCloseable {
  WritableMemory get()
  void close()
  void load()        //only for memory-mapped-files
  boolean isLoaded() //only for memory-mapped-files
  void force()       //only for memory-mapped-files
}

public abstract Buffer { //Not implemented
}

public abstract WritableBuffer { //Not implemented
}

leventov commented 7 years ago

@leerho

wrap(ByteBuffer) returns Handler instead of just Memory - might complicate implementation, while not actually needed.
wrap(ByteBuffer) should inherit endianness from the buffer.
asNonNativeEndian() and asNativeEndian() won't be convenient, in Druid we need to convert endianness when we know the specific endianness that we need, usually BIG_ENDIAN. So using this API, it would be something like ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN ? mem.asNativeEndian() : mem.asNonNativeEndian(), instead of just mem.withOrder(ByteOrder.BIG_ENDIAN)
What for MemoryRequest?
load(), isLoaded() and force() implementation requires JNI. And those methods are unused in Druid. I think they could be dropped.

Everything else looks good to me.

leerho commented 7 years ago

@leventov

wrap(ByteBuffer) returns Handler instead of just Memory - might complicate implementation, while not actually needed.

I'm was being conservative here, this is easy to remove after we have discussed it. I was trying to capture the scenario where the owner of a BB passes a writable region to a client and then lets his own BB go out of scope thinking that he is done with it and no changes can be made. If the owner uses try-with-resources or close(), then the client's copy is immediately marked invalid even though the JVM won't collect it until the client's handle is released. This is a variant on the use-after-close scenario.

wrap(ByteBuffer) should inherit endianness from the buffer.

Good catch. Thank you.

asNonNativeEndian() and asNativeEndian()

Good point, I can do the detection of whether the chosen endianness is different from the native internally.

What for MemoryRequest?

This is for off-heap dynamic data structures that can possibly grow in size and where the process that is managing this data structure does not "own" the allocation of memory. This is a simple callback mechanism to allow the algorithm that is managing the data structure to request more memory if needed.

Sketches is an excellent example of this. Some sketches, like Theta Sketches, have an upper bound in size. Others, like the quantiles sketches don't have an upper bound, but they grow very very slowly. They all start very small in size, e.g., a Theta Update Sketchwith a few entries can be a few hundred bytes, but can grow to a megabyte or so depending on how it is configured.
When you have millions of these sketches, the overall size adds up. Fortunately, our typical data is highly fragmented into many different dimensions and the resulting stream lengths are highly skewed with power-law distributions. This means that the vast majority of sketches only ever get a few entries, and the number of sketches that actually grow to full size is a tiny fraction. But which ones? If you allocate full space for all the sketches you would be wasting terabytes of RAM. The MemoryRequest mechanism allows the systems engineer to allocate orders-of-magnitude less space for all the required sketches and then allocate more space only for those few sketches that do require it as needed.

So far, Druid has been managing its off-heap space using direct ByteBuffers. But once you start moving away from ByteBuffers and start allocating off-heap memory directly, you are essentially managing your own heap. I really don't want to go down the slippery-slope of creating our own malloc, as that would be a significant undertaking. The MemoryRequest is a simplistic mechanism that hopefully, will delay our having to design a actual malloc/free system.

load(), isLoaded() and force() implementation requires JNI.

We are accessing these methods and their internal implementation using reflection calls to the underlying methods, which seem to be working. Using these calls can speed up access to mapped files considerably.

leventov commented 7 years ago

@leerho ok. Could you complete memory4 (buffers, endianness, a bunch of get/put methods), so we can start actual refactoring of Druid?

leventov commented 7 years ago

Also I think copy() should be called get(), similar to ByteBuffer and to resolve ambiguity: copy to or copy from?

apache / druid

Move from ByteBuffer to Memory #3892