How about copyTo()? In ByteBuffer, get() is used for primitives and getXXXArray() for copying into your local context. Neither get() nor put() makes sense here, since you could be copying bytes out of the JVM entirely.
Buffers and endianness will be quite a bit of work. Ultimately, I'd like to create a set of pragmas and a generator so that they can all be created automatically. If I only start with a few methods, could you give me a list of which ones to start with?
copyTo() is good.
Actually, to start the refactoring a full implementation is not needed, so endianness could be delayed; only the get/put methods need to be present (for Memory and Buffer), possibly with an UnsupportedOperationException implementation.
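For illustration, here is a rough sketch of what a raw-byte copyTo could look like; the class and method names are assumptions for this sketch, not the agreed API:

```java
// Hypothetical sketch: a copyTo that moves raw bytes from one memory region to
// another without going through the typed get()/put() primitives.
abstract class MemorySketch {

  /** Reads a single byte at the given offset (bounds checking elided). */
  abstract byte getByte(long offsetBytes);

  /** Writes a single byte at the given offset (bounds checking elided). */
  abstract void putByte(long offsetBytes, byte value);

  /**
   * Copies lengthBytes bytes from this memory, starting at srcOffsetBytes, into dst
   * starting at dstOffsetBytes. A real implementation would use a bulk copy
   * (e.g. Unsafe.copyMemory) rather than a byte-by-byte loop.
   */
  void copyTo(long srcOffsetBytes, MemorySketch dst, long dstOffsetBytes, long lengthBytes) {
    for (long i = 0; i < lengthBytes; i++) {
      dst.putByte(dstOffsetBytes + i, getByte(srcOffsetBytes + i));
    }
  }
}
```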
@leventov
wrap(ByteBuffer) should inherit endianness from the buffer.
Until I figure out exactly how I want to handle endianness, the current Memory is native-endian only. So, for now, attempting to wrap a big-endian ByteBuffer is an error, which is now checked.
@leventov @niketh
Consolidating Memory / MemoryImpl, WritableMemory / WritableMemoryImpl.
I haven't had any real reason to do this until now. However, after discussing with @niketh some of the issues he had to address when trying to implement the current DataSketches Memory into Druid, I learned about some additional capabilities that would have been very helpful. One very important capability was a byte-wise compare of two memory objects. This enables a byte ordering on objects independent of datatype, which Druid uses a lot. Because of our multiple Impls, I don't want to have to generate all the combinations of compare(Memory, Memory), compare(Memory, WritableMemory), etc.

So the idea here is a root class that only knows about bytes, call it BaseBytes. It would have one field, MemoryState (which we could rename as BaseState). Memory, WritableMemory, Buffer, and WritableBuffer (which now may as well be impls) would all extend BaseBytes.

BaseBytes would have static methods that just do byte operations, such as compare(BaseBytes a, BaseBytes b), copy(a, b), or even the possibility of transform(a, b)... as long as the operation doesn't need any information about the structure or type of the data. Because BaseState would also be at that level, it can check for read-only state and would know the base offsets, etc. All read-only, strictly byte-oriented methods could also be moved to BaseBytes.
Even though BaseBytes is a common root class, it is not possible to cast from Memory to WritableMemory via BaseBytes. This is not caught at compile time, but it is caught at runtime.
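A minimal sketch of that hierarchy, with hypothetical names and only enough detail to show where the byte-level static methods would live:

```java
// Hypothetical sketch of the proposed hierarchy; names and methods are illustrative only.
abstract class BaseBytes {
  // Would hold the single BaseState (formerly MemoryState) field: base offsets, read-only flag, etc.

  abstract long capacity();

  abstract byte getByte(long offsetBytes);

  /** Byte-wise, type-agnostic comparison usable by any subclass. */
  static int compare(BaseBytes a, BaseBytes b) {
    long min = Math.min(a.capacity(), b.capacity());
    for (long i = 0; i < min; i++) {
      int cmp = Byte.compare(a.getByte(i), b.getByte(i));
      if (cmp != 0) { return cmp; }
    }
    return Long.compare(a.capacity(), b.capacity());
  }
}

abstract class Memory extends BaseBytes { /* read-only typed accessors */ }

abstract class WritableMemory extends BaseBytes { /* adds typed put methods */ }

abstract class Buffer extends BaseBytes { /* positional read-only accessors */ }

abstract class WritableBuffer extends BaseBytes { /* positional writable accessors */ }
```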
Thoughts?
@leerho
What about Memory.compareTo(offset, len, Memory other, otherOffset, otherLen)? You don't need any special API method or implementation compare(Memory, WritableMemory), because WritableMemory extends Memory, so you can always use the same method.
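A sketch of that alternative, with hypothetical signatures: because WritableMemory would be a subtype of Memory, a single compareTo declared on Memory covers every combination of arguments, and read-only callers never need a defensive copy:

```java
// Hypothetical sketch of the single-hierarchy alternative; not the actual library code.
abstract class Memory {
  abstract byte getByte(long offsetBytes);

  /** Lexicographic byte comparison of two arbitrary ranges; the lengths may differ. */
  int compareTo(long thisOffset, long thisLength, Memory other, long otherOffset, long otherLength) {
    long min = Math.min(thisLength, otherLength);
    for (long i = 0; i < min; i++) {
      int cmp = Byte.compare(getByte(thisOffset + i), other.getByte(otherOffset + i));
      if (cmp != 0) { return cmp; }
    }
    return Long.compare(thisLength, otherLength);
  }
}

abstract class WritableMemory extends Memory {
  abstract void putByte(long offsetBytes, byte value);
}

// A method that only reads can declare a Memory parameter; callers may pass either a
// Memory or a WritableMemory with no asReadOnly()-style copy:
//   int c = readOnly.compareTo(0, 16, someWritableMemory, 0, 16);
```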
@leventov Look again. WritableMemory does not extend Memory; this is the two-impl model. Also, in your snippet you only need one len.
@leerho That means new objects are required wherever a WritableMemory is passed to a method accepting a read-only Memory, which makes the situation with the third goal in the very first message of this thread even worse than it used to be with ByteBuffers. With ByteBuffers, the API encourages creating asReadOnly() copies "out of fear", but it was not required. With what you propose, it is simply required. I disagree with this.
Actually, I hadn't noticed in your proposal in this message (https://github.com/druid-io/druid/issues/3892#issuecomment-284964809) that WritableMemory doesn't extend Memory. I disagree with this.
When WritableMemory extends Memory and all methods that are not supposed to write accept Memory, it's impossible to accidentally violate read/write safety; you would have to intentionally cast Memory to WritableMemory (and even that could be prohibited with a simple Checkstyle rule). On the contrary, it's super easy to violate bounds safety (off-by-ones, a wrong primitive argument, etc.), and yet we agree not to make bounds checks by default (only with assertions enabled).
Read/write safety IMO is not a problem at all, as soon as there is a read-only superclass Memory, which the ByteBuffer API lacks. Making the system even "more read/write safe" doesn't justify even small sacrifices.
Not to mention that "WritableMemory not extending Memory" creates a lot of problems with code sharing, starting from the method that we are discussing, compareTo(). And a lot more methods: copyTo(), hash code computation, compression, object deserialization, etc.
Also, in your snippet you only need one len.
Sometimes you want to compare byte sequences of different lengths, just as it's not prohibited to compare Strings of different lengths.
@leventov @niketh
All fixed. One impl. Cast to Memory from WritableMemory works. Both compareTo and copyTo have been implemented. @niketh is working on the Buffer / WritableBuffer impls.
@leerho thanks!
I see you decided to name WritableMemory's static factory methods with a "writable" prefix. Is this because you are concerned about overloading Memory's methods? In that case I suggest moving them to Memory, because WritableMemory.writableMap() is a needless repetition. It could be Memory.writableMap().
@leventov
Yes, I was getting overloading errors, but the reason was that I still had WritableResourceHandler and ResourceHandler as separate classes from the previous scheme. Making WritableResourceHandler extend ResourceHandler (parallel to WritableMemory extending Memory) fixed the overloading problem.
I have removed all the "writable" prefixes except for one: writableRegion(offset, capacity). This method works off of an instance instead of the class. The call is myMem.writableRegion(...), so there is no repetition.
We could also make this a static method, and then the calls would be WritableMemory.region(WritableMemory mem, long offset, long capacity) and Memory.region(Memory mem, long offset, long capacity). Then there would be no "writable" prefixes on method names.
This way of creating a region would then be virtually the same as if you just passed (Memory, offset, capacity) to a client and let them do their own positioning. The latter does not create a new object, but the client has a view of the total parent capacity. The former creates a new wrapper object but limits what the client can see. There are use cases for both; see the sketch below.
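A hedged sketch contrasting the two styles just described; the types and method names are illustrative, not the final API:

```java
// Hypothetical types for illustration only.
interface WritableMemoryLike {
  long capacity();
  byte getByte(long offsetBytes);
  void putByte(long offsetBytes, byte value);

  /** Style 1: allocate a wrapper limited to [offsetBytes, offsetBytes + capacityBytes). */
  WritableMemoryLike writableRegion(long offsetBytes, long capacityBytes);
}

final class RegionStyles {

  /** Style 1: the client only ever sees the restricted window, at the cost of a new object. */
  static void clientWithRegion(WritableMemoryLike region) {
    region.putByte(0, (byte) 1); // offset 0 is the start of the region
  }

  /** Style 2: no new object, but the client sees the whole parent and must position itself. */
  static void clientWithOffsets(WritableMemoryLike parent, long offsetBytes, long capacityBytes) {
    parent.putByte(offsetBytes, (byte) 1); // the client applies the offset manually
  }
}
```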
I do prefer accessing the Writable "constructor-type" methods from the WritableMemory class.
@leerho Thanks.
The current form, mem.writableRegion(), is OK with me.
@leventov @niketh @cheddar @gianm @weijietong @AshwinJay
Development of the new memory architecture has been migrated from experimental/memory4 to its own, more visible repository: memory in DataSketches.
I have completed the central Memory and WritableMemory implementation and first wave of unit tests, with coverage at about 95%. I think (hope) the API is fairly stable. I will try to put together a set of bullet points summarizing the features of the API and why some of the choices were made. Meanwhile, I look forward to any comments or suggestions you have.
@niketh and I will soon be focusing on a positional extension to this work.
I want to thank @leventov for his thoughtful contributions for much of this design.
@leventov @niketh @cheddar @gianm @weijietong @AshwinJay
The Memory and Buffer hierarchies are checked in to master. @niketh is working on more unit tests, especially for the Buffer hierarchy. Hopefully we can have a release to Maven Central this week. Please look it over.
@leerho you mean here: https://github.com/DataSketches/memory?
I think we completely agree on the API, except that it seems to be missing byteOrdering functionality, which is needed for the Druid refactoring because we need to support many old formats that are big-endian.
I didn't review internal implementation details, because an actual refactoring of Druid and/or DataSketches with the new API may demonstrate that the new API is problematic in some ways and needs to be reworked. So I'm going to review the implementation details of Memory after the Druid refactoring PR.
you mean here: https://github.com/DataSketches/memory?
Yes.
It was recommended by both @cheddar and @niketh that the byte-ordering functionality is not essential, and that it was more important to get this package out, so that folks can start working with it. I have no plans to implement byte-ordering.
@niketh already has submitted a PR based on the original memory API and has a real good understanding of the implementation issues, and will be the one using this new API to resubmit a new PR based on it. Certainly if he runs into issues with the API we will make adjustments.
It was recommended by both @cheddar and @niketh that the byte-ordering functionality is not essential, and that it was more important to get this package out, so that folks can start working with it. I have no plans to implement byte-ordering.
This is one of the things that an actual attempt to refactor Druid should verify. So yes, we can try to start the refactoring without byteOrdering functionality and see if it works well.
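As an aside, legacy big-endian formats can still be read on top of a native-endian-only API by swapping bytes after the read. A minimal sketch, assuming a hypothetical native-endian getInt accessor:

```java
import java.nio.ByteOrder;

// Hypothetical helper, not part of the library: reads a big-endian int through a
// native-endian accessor by reversing bytes when the platform is little-endian.
final class BigEndianReads {

  /** Stand-in for the real read API; returns the value in native byte order. */
  interface NativeEndianMemory {
    int getInt(long offsetBytes);
  }

  static int getIntBigEndian(NativeEndianMemory mem, long offsetBytes) {
    int nativeValue = mem.getInt(offsetBytes);
    return ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN
        ? nativeValue
        : Integer.reverseBytes(nativeValue);
  }
}
```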
@leerho BTW since Druid has now officially moved to Java 8, should Memory still support Java 7? I see https://github.com/DataSketches/sketches-core/ is also Java 8.
If it shouldn't, Memory and WritableMemory could be made interfaces, because interfaces support static methods in Java 8, if you want. However, it should be checked that performance stays the same.
@leventov Performance degrades quite a bit with interfaces, unfortunately. Now that I have it working as abstract hierarchies, I'm not sure I want to change it.
Buffer, etc. is now working and with unit test coverage at 96%.
Netty found the same issue with interfaces: http://netty.io/wiki/new-and-noteworthy-in-4.0.html#bytebuf-is-not-an-interface-but-an-abstract-class
@cheddar @leerho @niketh as far as I can reason from public sources, this project is under development. According to https://github.com/druid-io/druid/issues/3892#issuecomment-276548114, could the query processing part be migrated first, and then the serde part? The processing part blocks #4422.
@b-slim
The facts are as follows:
The current internal implementation of Memory is heavily dependent on the Unsafe class, as many high-performance libraries are. However, the architecture of Memory has been designed so that a non-Unsafe implementation could be created without impacting the API.
Druid has not moved to JDK 9 yet, nor have many of the other systems that currently use the library. So there hasn't been a great deal of pressure to move to JDK 9, yet. Nonetheless, when the time comes we will move to 9, 10, 11 or whatever.
My comment in the memory docs that you highlighted is simply the truth. We have not had the time, resources or the requirement to move to JDK 9 or 10. So it obviously hasn't been tested to work against JDK 9 or 10 either. I'm clearly not going to guarantee code that hasn't been tested. And I don't plan to start extensive testing until it becomes a requirement.
There have been only 2 people heavily involved in the design and implementation of the Memory repository in DataSketches, @leventov and myself. And both of us are very busy people.
If you understand the value in the Memory API as @leventov and I do, then how about contributing a helping hand?
how about contributing a helping hand?
@leerho It would be a great honor and learning experience to help. I am wondering whether by "help" you mean moving Memory to be JDK 9 compatible, or refactoring the Druid code base to start using the Memory lib?
You could help by starting to do some testing with Memory:
1) Do some testing with JDK 9. What are the blockers? My understanding is that JDK 9 allows access to Unsafe, but we have to add some code to access it (a sketch of the usual reflective access pattern appears after this list). How and where do changes need to be made? My concern is that we will have to have a special code base for JDK 9 that cannot be used with JDK 8. Please investigate. You will need to create your own jars from master, as the latest code has not been released to Maven Central yet, although I hope it will be soon.
2) What about JDK 10? Same questions.
3) Once you have some answers as to where it breaks and what we need to do, we can strategize on the best way to go forward. Don't submit any PRs; it is too early. If you want to show us code, we can look at it on your own repo.
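For reference, the usual way libraries obtain Unsafe on JDK 8 is reflective access to the theUnsafe field, roughly as below; whether this keeps working unchanged under the JDK 9 module system is exactly what needs testing:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Common reflective bootstrap used by many libraries on JDK 8. Under JDK 9+ this may
// require additional JVM flags (e.g. --add-opens) or a different access strategy.
final class UnsafeAccess {
  static final Unsafe UNSAFE;

  static {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      UNSAFE = (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  private UnsafeAccess() {}
}
```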
As a longer range contribution, you could investigate VarHandles and MethodHandles. Will they help at all? I'm not convinced from what I have read, but I have not played with them yet. If they look promising, you could do some detailed timing characterization and find out how they perform.
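For anyone picking this up, here is a minimal JDK 9+ sketch of the kind of VarHandle access that a timing characterization would compare against Unsafe; the class and method names are illustrative:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// A VarHandle view over a byte[] gives typed, bounds-checked primitive access without
// Unsafe. Whether it matches Unsafe performance is the open question raised above.
final class VarHandleExample {
  private static final VarHandle INT_VIEW =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.nativeOrder());

  static int readInt(byte[] bytes, int offsetBytes) {
    return (int) INT_VIEW.get(bytes, offsetBytes);
  }

  static void writeInt(byte[] bytes, int offsetBytes, int value) {
    INT_VIEW.set(bytes, offsetBytes, value);
  }
}
```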
Then there is the OpenJDK Panama Project and JEP 191. These have been on the sidelines for years, but if they were ever adopted it would make life so much simpler for us. Do some digging and find out where they are headed and when! Contact John Rose and Charles Nutter... ask them!
You could become our migration expert !! :)
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
This issue is no longer marked as stale.
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
Let's keep this open, it's still interesting and relevant. Some recent work includes #9308 and #9314. IMO, as a next step, it'd be interesting to look at switching VectorAggregators and their callers to a Memory-based API.
This issue is no longer marked as stale.
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
The goal of this issue is to agree on basic design principles and expectations from this refactoring.
Goals

- Avoid the bound checks that ByteBuffer always performs when making any read or write. Bound checks could be hoisted outside the hot loops manually, or enabled via assertions/a static final boolean flag, which is effectively free with some precautions.
- Support long indexes. See #3743.
- Avoid creating new objects every time a Memory object has to be "sliced", "converted to read-only", etc. See https://github.com/druid-io/druid/pull/3716#issuecomment-274658045.

Design

- It is moved to the Druid source tree, not used as a dependency. See https://github.com/druid-io/druid/issues/3892#issuecomment-276589924.
- The Memory object is immutable. "Position and limit", if needed, are passed along with Memory as two primitive longs.
- The Memory object has a cached immutable "view" object (which implements the read methods and throws exceptions from the write methods); this object is always returned when "as read only" is needed (a rough sketch appears after this list).
- close() or free() on Memory is possible, but not strictly required; there is a sun.misc.Cleaner safety net, as in DirectByteBuffer.
- While conversion from ByteBuffer to Memory is in progress (not expected to be done all at once, but subsystem by subsystem, class by class; see https://github.com/druid-io/druid/issues/3892#issuecomment-276548114), and also when we need to interop with external libraries which require ByteBuffer, conversion from Memory to ByteBuffer and vice versa is possible. Likely it requires a DirectByteBuffer-compatible format of the Cleaner inside Memory and access to DirectByteBuffer constructors via MagicAccessorImpl.
- Memory's bounds are checked optionally via assertions/guarded by a static final boolean flag; "local" position and limit are checked explicitly and manually, with helper methods or versions of the read and write methods of Memory which accept a read/write position and "local" limits. https://github.com/druid-io/druid/issues/3892#issuecomment-276185704

Objections, additions, corrections, and questions are welcome. @leerho @cheddar @weijietong @akashdw @himanshug @fjy @niketh
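For the "cached immutable view" item in the Design list above, a rough sketch of the idea, with hypothetical names and a plain byte[] backing store standing in for real memory:

```java
// Hypothetical sketch of the cached read-only view; not the proposed API.
class WritableMemorySketch {
  private final byte[] bytes;              // backing store for this sketch
  private ReadOnlyViewSketch readOnlyView; // created lazily, then reused

  WritableMemorySketch(byte[] bytes) { this.bytes = bytes; }

  byte getByte(long offsetBytes) { return bytes[(int) offsetBytes]; }

  void putByte(long offsetBytes, byte value) { bytes[(int) offsetBytes] = value; }

  /** Always returns the same cached view, so "as read only" allocates at most once. */
  ReadOnlyViewSketch asReadOnly() {
    if (readOnlyView == null) {
      readOnlyView = new ReadOnlyViewSketch(this);
    }
    return readOnlyView;
  }
}

final class ReadOnlyViewSketch {
  private final WritableMemorySketch delegate;

  ReadOnlyViewSketch(WritableMemorySketch delegate) { this.delegate = delegate; }

  byte getByte(long offsetBytes) { return delegate.getByte(offsetBytes); }

  void putByte(long offsetBytes, byte value) {
    throw new UnsupportedOperationException("read-only view");
  }
}
```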