Closed mppf closed 3 years ago
If this ends up being user facing, I'd want it to indicate that it's for advanced users or something along those lines. Otherwise I don't have an issue with this
We've at times discussed having a put()/get() library for doing cross network operations, and I think this is something we'd like to do and have just never prioritized. Reading this issue makes me wonder what the relationship between a Move
library and a Put/Get
library would be. I.e., would they share an interface? or the module that declared them? Or be completely independent? If the answer to the first question is "no", what would the impact be of calling a move operation on variables that were on two different locales?
@bradcray - I think that they could probably be the same. Note that #15676 asks about whether or not records should be notified in some way when a record is moved across locales. If did want to do that notifying for Move
operations but not for put
/get
then they would need to be different.
Additionally one might expect a put
/get
library to have some more memcpy
like functionality, e.g. a function signature like proc networkMemCpy(fromLocale: locale, fromPtr: c_void_ptr, toLocale: locale, toPtr: c_void_ptr, numBytes: int)
. But, if common usages of put
/get
calls can work with the reference arguments or array variants (like moveInitializeArrayElements
described above) then it could just be additional low-level functionality that isn't a "move" exactly but that the library could provide.
Tagging @chapel-lang/perf-team and @npadmana based on the prior response because we sometimes talk about wanting a user-facing put/get-level communication interface.
I definitely would support exposing low-level communication operations to the user (with all the caveats).
I'm not sure I understand @mppf's comment about how memcpys map into the functionality described above, but would be happy as long as it's straightforward to do. Just to be concrete, the memcpy functionality we make significant use of in the distributed FFT code is here --
list
to bulk transfer when initializing (from another list or an array). Perhaps the functions required to do that could be added to the module proposed here.I'm having trouble envisioning moveToReturn
, would it end up being something like this?
proc moveToReturn(const ref x) {
pragma "no init"
var ret: x.type;
_move(ret, x);
return ret;
}
My bad, just saw the implementation in test/expressions/noinit/LowLevelMove.chpl
. Mine was close, but missing a few pragmas...
It seems like we're not providing an uninhibited move
function here. Is there a reason in particular? If we're not going to bother having the compiler safety check these operations I would propose also adding move
.
I've made an issue to discuss the possibility of exposing a low level memory buffer to users here: https://github.com/chapel-lang/chapel/issues/16797
Since this LowLevelMove
module also happens to define needsDeinit
and explicitDeinit
, would it make sense to name this module something like LowLevelTools
, just in case we decide to add such a buffer to the module?
It seems like we're not providing an uninhibited move function here. Is there a reason in particular? If we're not going to bother having the compiler safety check these operations I would propose also adding move.
I don't know what you mean. I would imagine that proc moveInitialize(ref lhs, in rhs)
does what you are describing.
Since this
LowLevelMove
module also happens to defineneedsDeinit
andexplicitDeinit
, would it make sense to name this module something likeLowLevelTools
, just in case we decide to add such a buffer to the module?
I agree with the direction here but I don't particularly like either module name (LowLevelMove or LowLevelTools).
How does the module name Unsafe
sound? If we later decide to introduce syntax such as the unsafe
block, we could activate it upon importing this module.
How does the module name Unsafe sound?
I'm not a fan. I think it should give a better indication of what's in the module, not what effects it might have.
We may end up having as many of three pieces in this module depending on how discussion in https://github.com/chapel-lang/chapel/issues/16797 plays out:
deinit
on a variableThe best name I could come up with for the module was LowLevelTools
, but others weren't a fan of that either.
I still feel like LowLevelMove
is not broad enough a name.
Maybe...
LowLevelManagement
LowLevelMemory
LifetimeManagement
LifetimeControl
MemoryControl
Some other brain-stormy ideas for names
LowLevelMovesAndDeinits
SupportForLowLevelBuffers
LowLevelRecordOperations
ExplicitRecordManagement
RecordMemoryManagement
ExplicitLifetimes
ExplicitVariableManagement
LowLevelVariables
Among the brainstorming ideas, I prefer the things that talk about:
And not so much things that talk about:
My first/main intuition has been MemoryManagement
or MemMgmt
or some mix of those long and short forms, where my main concern is that this will suggest the module defines things like alloc/realloc/free to people.
MemoryControl
is pretty close to that and feels reasonably accurate/succinct/clear
MemSemantics
or MemorySemantics
is another name that occurs to me.
MemoryValues
is another idea
So originally, I was thinking that a low-level buffer as requested in #16797 might also live in this module (if it gets added at all), which would make this module a one-stop-shop for "collection building tools".
However the low-level buffer need not live here at all (if we eventually decide to add it).
My first/main intuition has been MemoryManagement or MemMgmt or some mix of those long and short forms, where my main concern is that this will suggest the module defines things like alloc/realloc/free to people.
I am concerned about this as well. Given the options suggested so far, I prefer:
MemoryControl
(if we choose this, I feel a low-level buffer may feel misplaced this module? May not matter...)LifetimeControl
(ditto the above, even more so...)ManualLifetimeManagement
(Perhaps too wordy...)LifetimeManagement
(Perhaps not wordy enough!)My first/main intuition has been
MemoryManagement
orMemMgmt
or some mix of those long and short forms, where my main concern is that this will suggest the module defines things like alloc/realloc/free to people.
Supposing we did add a buffer type equivalent to _ddata
to the module, wouldn't we have functions to allocate, reallocate, and free it? Then MemoryManagement
seems more OK, doesn't it? Of course, maybe it is easier to separate the low-level buffer idea into a separate module named Buffer
or MemoryBuffer
or something. Even so, it seems reasonable to me to provide them together because using the low-level buffer requires these move and explicit deinit operations.
I think the main problem I see with the memory being the keyword is that owned/shared are also "memory management". That's not necessarily a show-stopper though (easy enough to note owned/shared exist and link to them at the top of the docs, e.g.). Anyway for some reason the value or variables keywords helps me avoid that.
I worry that lifetime will get people thinking about the lifetime checker or copy elision which are related but not exactly controlled by the module. E.g. LifetimeControl
makes me think of something that adjusts how the lifetime checker operates.
Anyway I find ValueMemoryManagement
appealing, if long. (But, since this is a low level module, is it so bad if it's long?). Maybe ValueMemory
, ValueControl
, or ValueManagement
can be shorter alternatives?
I particularly like ValueMemoryManagement
. I think this helps us distinguish it from MemoryManagement
.
import ValueMemoryManagement as vmm;
var a, b = 0;
vmm.moveSwap(a, b);
var buf = new vmm.buffer(int, 10);
// ...
I think the main problem I see with the memory being the keyword is that owned/shared are also "memory management".
We could think of these as being "class management" or "object management" or "object lifetime" policies rather than memory management per se. For all other than unmanaged
, I don't think about memory in the alloc/realloc/free sense when I think of them. More about creation and destruction (ditto a typical stack-allocated record).
Alright so far for names, in my head there are five top contenders:
MemoryManagement
(Still a bit contentious)ValueMemoryManagement
(Wordier, but less confusing)ValueManagement
(Perhaps too general?)ValueControl
MemoryControl
I'd like to whittle things down to 3 and then I could put the names to an initial vote.
I have some concerns about the name moveToReturn
. The names of the other move functions in this module flow very well to me, but I always find myself stumbling over moveToReturn
. I always imagine that it would only be called when you want to consume and return a reference:
proc myList.pop() {
if this._size != 0 then return moveToReturn(this.last);
}
But in reality moveToReturn
is actually more general than that. It just consumes the value pointed to by a reference and stores it into a new value.
You don't actually need to return
that value at all:
proc myList.dropLast() {
if this._size then moveToReturn(this.last);
}
In my head the names:
moveExhaust
moveConsume
Seem to better describe the pattern that is going on here.
My 2 cents: I prefer names with Memory
in them to those with Value
. I have no intuition what ValueMemoryManagement
could mean different than MemoryManagement
.
And an out-of-the-box idea: I'd move/rename the existing Memory
module and make the module we discuss here the Memory
module. I think that also fits better with import
-based world. Memory.moveSwap
is pretty concise and clear to me.
Re. moveToReturn
I agree that its current name describes the contents of the function rather than how to use it. What about moveToValue
?
And an out-of-the-box idea: I'd move/rename the existing
Memory
module and make the module we discuss here theMemory
module. I think that also fits better withimport
-based world.Memory.moveSwap
is pretty concise and clear to me.
Some of what we have in Memory
should probably be in MemoryDiagnostics
but things like memoryUsed
would make sense to be in a Memory
module with one name or another. Would you store memoryUsed
in a 3rd module (e.g. AvailableMemory
) or in the module with the low-level moves?
These concerns make me less willing to have Memory
as the name for the module discussed in this issue.
What about moveToValue ?
I like the name moveToValue
as well. Thanks!
Some of what we have in Memory should probably be in MemoryDiagnostics but things like memoryUsed would make sense to be in a Memory module with one name or another. Would you store memoryUsed in a 3rd module (e.g. AvailableMemory) or in the module with the low-level moves?
I would argue that everything in the current Memory
module is a form of diagnostic, and can be moved to a MemoryDiagnostics
module verbatim. That includes memoryUsed
, too. Maybe an alternative name is MemoryIntrospection
, but I am getting too off topic.
Should explicitDeinit
remove FLAG_AUTO_DESTROY from arguments that are variables? Else you could get into a situation where a variable is destroyed twice...
record r {
proc deinit() { writeln('deinit r'); }
}
{
import LowLevelMove;
var x = new r();
LowLevelMove.explicitDeinit(x);
}
// I would expect the following to be printed, is this what we want?
// deinit r
// deinit r
Currently explicitDeinit
is only useful when called on elements within unmanaged buffers. This is fine, but I don't think it can possibly be useful in other situations if it leads to this sort of double destroy pattern...
Another iteration on names...
ValueMemoryManagement
MemoryManagement
Memory
The problem with ValueMemoryManagement
is that it's long, and also hard to grasp how it's any different than MemoryManagement
. For some, though, the addition of Value
helps separate it from MemoryManagement
.
The problem with MemoryManagement
is that it might evoke visions of malloc
/realloc
/free
or class management flavors such as shared
, etc...
The problem with Memory
is we already have a module with the same name, but if agreed we could move it to a module such as MemoryDiagnostics
. We're also not certain if we want to move everything in Memory
to MemoryDiagnostics
.
One thing is clear, we seem like we want the word Memory
in the module name in some capacity.
Personally, I like the name ManualMemory
, because it flows very well in my head.
E.g...
import ManualMemory;
// Create a _manual memory_ buffer...
var buf = new ManualMemory.buffer(int, 10);
// Use a _manual memory_ move...
for i in 0..<buf.size do ManualMemory.moveInitialize(buf[i], i);
// Do a _manual_ deinit...
for x in buf do ManualMemory.explicitDeinit(x);
I like the addition of the word Manual
because it implies that these are low level operations that are normally done for you, but by importing ManualMemory
you are opting into doing them yourself.
I think when we write the name MemoryManagement
what we are really trying to imply is "manual control", thus ManualMemory
.
Should explicitDeinit remove FLAG_AUTO_DESTROY from arguments that are variables? Else you could get into a situation where a variable is destroyed twice...
Actually doing so is very similar to how the compiler handles a copy elided in
. But, we might need different functions for "deinit my local variable" and "deinit my buffer element" because of the way the 1st is a value but the 2nd is a ref.
The problem with MemoryManagement is that it might evoke visions of malloc/realloc/free
This does not bother me. We could keep these functions in such a module.
or class management flavors such as shared, etc...
A problem that we can manage with documentation links if necessary.
Personally, I like the name ManualMemory, because it flows very well in my head.
Sure. We could also consider ManualMemoryManagement
.
I am OK with all of ValueMemoryManagement
/ MemoryManagement
/ Memory
/ ManualMemory
/ ManualMemoryManagement
but it would depend on what else we would put in the module. For MemoryManagement
and Memory
I would want things like malloc/free and for Memory
I would additionally want the functions returning the amount of available memory / total system memory. Especially for those functions I'd add to Memory
, whether or not that is OK depends on whether or not we view use Memory
as indicating low-level and potentially unsafe code. Vs. if the module is limited to the functions described in this issue, use
ing the module indicates use of low-level and potentially unsafe features. Of course, this concern about marking unsafety is moot if we go for making certain functions only callable within unsafe
blocks.
It looks like this ship may have sailed, or at least gotten too far from the dock for me to jump across to it. But I'll try anyway. Throw me a line if you see bubbles, would you?
To my mind "memory" as used in for example "memory management" and "memory allocation" has to do with the system resource with that name. Here it seems we're talking about something else: the action of moving and copying data. So I'd call it something like DataMovement
. It could hold any sort of data moving or copying methods/procs we need, for both local and remote data. It might even be possible to subsume some of the activities done at the top surface of the comm layer interface into this, for example checking that the source node of a GET or the target node of a PUT is the current node, so that we can do a local transfer instead of a network one. Brave user programmers, or more likely domain and/or domain map and/or locale model programmers, could use it to do their data movement. Perhaps the compiler could do so as well, if that simplified how it emits code, or what code it emits.
DataMovement
would be an OK module name for the move operations but it doesn't really hint at the relationship with record init/copy init/deinit. Somehow ValueMovement
addresses that issue for me though (because I can imagine a "value" as being something that needs to be deinitialized - while I don't feel that "data" ever needs to be deinitialized since my view of "data" is just a bunch of bytes).
It sounds as though the module as currently envisioned is doing two distinct things. Would it be appropriate to split it into two modules, one for data movement and one for record init/copy init/deinit, with the latter using the former?
The issue is that what I would mean by the "data movement" is actually a lack of record copy init (or assignment). I.e. it is a lower level override to the normal behavior of running =
or init=
overloads for the record. In this way they are related. One won't be able to just "move the bits" for a record unless one pays attention to the other elements (ensuring the record is deinited exactly once - probably by disabling one variable's deinit and explicitly deiniting it later).
Maybe the module should be called DataValues
?
a function to swap two references using low-level moves
My reaction is that this should be the default built-in semantics for <=>
. If it does not work for some datatype, then <=>
would be defined to do something else. In that case yes, an explicit "do low-level moves" function can be beneficial. Maybe call it <==>
?
explicitDeinit() [...] only makes sense to call when its argument is a buffer slot (e.g. an array element). It should not be called on a variable, because the compiler is not aware of the call and will deinitialize the variable again.
I think array elements have the same issue as variables, as the compiler will deinitialize them if not told otherwise.
I suggest that the compiler BE aware of explicitDeinit(), avoiding the introduction of a new function. I do not think the compiler-unaware explicitDeinit() would be practical.
How about a single Memory
module with submodules for the various aspects, ex.
use Memory.Diagnostics; // most or all of today's Memory module
use Memory.ExplicitInitDeinit; // the module in this issue
If so, ExplicitControl
or ExplicitManagement
fit well. I would be OK also with ExplicitLifetimes
, although it is indeed a bit vague.
Or, how about MoveRawBits
?
I think it is OK if this module's name is a bit lengthy. Users are already walking a thin line if they are using it, a long name would be their least concern. It would also be a deterrent, akin to unmanaged
being longer than owned
, where we want users to use the latter over the former.
I like the submodules-of-Memory idea.
a function to swap two references using low-level moves
My reaction is that this should be the default built-in semantics for
<=>
.
If I'm understanding the required feature and your response, I disagree with this. Once a ref
is established in Chapel (at its initialization time), subsequent uses of it refer to the thing being referred to, not the reference itself. For example:
var a, b: int;
ref ra = a,
rb = b;
ra = b;
doesn't cause ra
to refer to b
, it assigns b
to a
.
In addition <=>
in Chapel is intended to be a bi-directional assignment, so ra <=> rb
shouldn't cause ra
to refer to b
and rb
to a
, but to assign a
to b
and vice-versa.
So re-interpreting <=>
on references to mean that the references themselves should be swapped would change one or both of these principles. I don't think we should change either one on this basis (again, if I haven't misunderstood the feature request).
The moveSwap
function could be viewed as a version of the swap operator that works without performing assignment, e.g.
proc moveSwap(ref a: ?t1, ref b: t1) {
pragma "no init"
var tmp: t1;
_move(tmp, a);
_move(a, b);
_move(b, tmp);
}
So w.r.t. to @vasslitvinov...
My reaction is that this should be the default built-in semantics for <=>. If it does not work for some datatype, then <=> would be defined to do something else.
If we want/expect the <=>
operator to perform assignment when swapping then <=>
would only be semantically equivalent to moveSwap
for POD types. It could have unexpected results otherwise, which is why I think we want to use a named function here instead of an operator.
In that case yes, an explicit "do low-level moves" function can be beneficial. Maybe call it <==> ?
I worry about adding an extra operator that is so specialized. The only time I have ever wanted to moveSwap
something was when writing a collection method - in other Chapel code I would not shy away from assignment.
If we want/expect the <=> operator to perform assignment when swapping then <=> would only be semantically equivalent to moveSwap for POD types. It could have unexpected results otherwise, which is why I think we want to use a named function here instead of an operator.
So why do we want the <=>
operator to perform assignment? In what cases would the user not want to just swap the bits? What if we make that case special instead of the default?
I can see <=>
between two arrays or two user-defined collections to swap the whole thing by default, and having a separate function for element-wise swaps.
My concern about the assignment-based interpretation of <=>
is that it is more involved than just assignments. (a) It also performs an init and a deinit. (b) We need to specify whether the temp is initialized from the LHS or from the RHS.
All this is relevant only for non-POD types, of course.
re-interpreting <=> on references to mean that the references themselves should be swapped would change one or both of these principles
Actually I was not proposing that. I assumed that if the LHS and/or RHS is a ref, then the thing(s) being referenced are operated on.
I also really like the submodules-of-Memory idea, most of all because it allows us to neatly separate concerns.
I could totally see something like:
use Memory.Diagnostics;
use Memory.LowLevelMove;
use Memory.ExplicitDeinit;
And later if we decide to add a low level buffer this organization will easily let us do so:
use Memory.LowLevelBuffer;
Maybe ExplicitMove
instead of LowLevelMove
?
Also, does it work to use
submodule(s) without using the parent module first?
Actually I was not proposing that.
The rest of this is potentially off-topic then, but since you asked:
So why do we want the <=> operator to perform assignment?
Because that's how we defined it, as a means to write a <=> b
instead of having to write the pattern:
var tmp = b;
b = a;
a = tmp;
In what cases would the user not want to just swap the bits?
var A: [1..n] real;
var B = newBlockArr({1..n}, real);
A <=> B;
var b8: bool(8);
var b: bool;
b8 <=> b;
(a) It also performs an init and a deinit.
It need not. If you define <=>
for your record type, you could swap the contents directly rather than relying on the default implementation.
Also, does it work to use submodule(s) without using the parent module first?
You can't say use submodule;
but you can say use parentmodule.submodule;
: https://tio.run/##S85ILEjN@f8/Nz@lNCdVwVehmktBAcrxA3MUFAqK8pMV0vLzNTShAgoK5UWZJak5eRpKnnkgGSVNa7BELRcI13JxlRYDzdLzs@YCa7P@/x8A
I am OK leaving the default behavior of <=>
as the three statements. However it looks like an overkill when the lhs and rhs types are identical. I would be game specializing the default <=>
specification for that case.
Are there any disadvantages to using submodules that would cause someone to be against it? I'm having trouble thinking of a disadvantage other than a more complicated organization.
Assuming we're all on board with the idea, the next step is to decide on what we want to call each submodule. I'll begin the effort...
use Memory.Diagnostics; // The contents of the existing `Memory` module
use Memory.Move; // Newly proposed move functions
use Memory.ExplicitDeinit; // Newly proposed `explicitDeinit()` function
How do people feel about these submodule names?
I'm definitely open to the submodule-based organization in general. I'm less certain about these specific names and organizations. Specifically:
While it's true that most collection authors will probably end up using Move
and Deinit
together, I think that as long as we're going to go through the effort of using submodules, it's important to make sure each submodule contains a distinct concept.
I struggle to think of a name that doesn't feel like we're just cramming moves and deinit together for the sake of ergonomics. They still feel like separate concepts to me. We could be upfront about it and use a name like MoveAndDeinit
, though.
W.r.t. to the module name ExplicitDeinit
, I agree a single word would be better for symmetry. Is the module name Initialization
any better? Though it only contains the explicitDeinit
function for now, the module could contain hooks to call other initializers in the future.
I doubt the virtue of putting moves and deinits in separate modules, other than in simplifying our search of good module names.
Memory.ExplicitManagement
stands out for me as a good name for everything related to "low-level" operations, possibly including malloc+free in the future.
This issue proposes adding a user-facing module to support low-level move operations. These can be sometimes useful when constructing data structures, for example, when moving from an
in
intent argument into a memory location; or when moving out of a memory location in the process of consuming an element.These operations are currently implemented in the Sort module (in the ShallowCopy submodule), the List module, and also in the array implementation.
test/test/expressions/noinit/LowLevelMove.chpl has a prototype for this that is used in some noinit tests.
Now here is a list of functions that the module should have:
proc needsDeinit(type t) param
proc explicitDeinit(ref r)
proc moveInitialize(ref lhs, in rhs)
proc moveToReturn(const ref arg)
proc moveSwap(ref lhs, ref rhs)
proc moveInitializeArrayElements(ref A, dstStartIndex, srcStartIndex, numElements)
proc moveInitializeArrayElements(ref DstA, dstStartIndex, ref SrcA, srcStartIndex, numElements)