chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.79k stars 420 forks source link

module providing low-level move operations #16172

Closed mppf closed 3 years ago

mppf commented 4 years ago

This issue proposes adding a user-facing module to support low-level move operations. These can be sometimes useful when constructing data structures, for example, when moving from an in intent argument into a memory location; or when moving out of a memory location in the process of consuming an element.

These operations are currently implemented in the Sort module (in the ShallowCopy submodule), the List module, and also in the array implementation.

test/test/expressions/noinit/LowLevelMove.chpl has a prototype for this that is used in some noinit tests.

Now here is a list of functions that the module should have:

lydia-duncan commented 4 years ago

If this ends up being user facing, I'd want it to indicate that it's for advanced users or something along those lines. Otherwise I don't have an issue with this

bradcray commented 4 years ago

We've at times discussed having a put()/get() library for doing cross network operations, and I think this is something we'd like to do and have just never prioritized. Reading this issue makes me wonder what the relationship between a Move library and a Put/Get library would be. I.e., would they share an interface? or the module that declared them? Or be completely independent? If the answer to the first question is "no", what would the impact be of calling a move operation on variables that were on two different locales?

mppf commented 4 years ago

@bradcray - I think that they could probably be the same. Note that #15676 asks about whether or not records should be notified in some way when a record is moved across locales. If did want to do that notifying for Move operations but not for put/get then they would need to be different.

Additionally one might expect a put/get library to have some more memcpy like functionality, e.g. a function signature like proc networkMemCpy(fromLocale: locale, fromPtr: c_void_ptr, toLocale: locale, toPtr: c_void_ptr, numBytes: int). But, if common usages of put/get calls can work with the reference arguments or array variants (like moveInitializeArrayElements described above) then it could just be additional low-level functionality that isn't a "move" exactly but that the library could provide.

bradcray commented 4 years ago

Tagging @chapel-lang/perf-team and @npadmana based on the prior response because we sometimes talk about wanting a user-facing put/get-level communication interface.

npadmana commented 4 years ago

I definitely would support exposing low-level communication operations to the user (with all the caveats).

I'm not sure I understand @mppf's comment about how memcpys map into the functionality described above, but would be happy as long as it's straightforward to do. Just to be concrete, the memcpy functionality we make significant use of in the distributed FFT code is here --

https://github.com/npadmana/DistributedFFT/blob/4f8aaac22cce6b059100734a98f0691e451cc374/src/DistributedFFT.chpl#L371

mppf commented 3 years ago

13583 requests the ability for list to bulk transfer when initializing (from another list or an array). Perhaps the functions required to do that could be added to the module proposed here.

dlongnecke-cray commented 3 years ago

I'm having trouble envisioning moveToReturn, would it end up being something like this?

proc moveToReturn(const ref x) {
  pragma "no init"
  var ret: x.type;
  _move(ret, x);
  return ret;
}

My bad, just saw the implementation in test/expressions/noinit/LowLevelMove.chpl. Mine was close, but missing a few pragmas...

dlongnecke-cray commented 3 years ago

It seems like we're not providing an uninhibited move function here. Is there a reason in particular? If we're not going to bother having the compiler safety check these operations I would propose also adding move.

dlongnecke-cray commented 3 years ago

I've made an issue to discuss the possibility of exposing a low level memory buffer to users here: https://github.com/chapel-lang/chapel/issues/16797

Since this LowLevelMove module also happens to define needsDeinit and explicitDeinit, would it make sense to name this module something like LowLevelTools, just in case we decide to add such a buffer to the module?

mppf commented 3 years ago

It seems like we're not providing an uninhibited move function here. Is there a reason in particular? If we're not going to bother having the compiler safety check these operations I would propose also adding move.

I don't know what you mean. I would imagine that proc moveInitialize(ref lhs, in rhs) does what you are describing.

Since this LowLevelMove module also happens to define needsDeinit and explicitDeinit, would it make sense to name this module something like LowLevelTools, just in case we decide to add such a buffer to the module?

I agree with the direction here but I don't particularly like either module name (LowLevelMove or LowLevelTools).

dlongnecke-cray commented 3 years ago

How does the module name Unsafe sound? If we later decide to introduce syntax such as the unsafe block, we could activate it upon importing this module.

bradcray commented 3 years ago

How does the module name Unsafe sound?

I'm not a fan. I think it should give a better indication of what's in the module, not what effects it might have.

dlongnecke-cray commented 3 years ago

We may end up having as many of three pieces in this module depending on how discussion in https://github.com/chapel-lang/chapel/issues/16797 plays out:

The best name I could come up with for the module was LowLevelTools, but others weren't a fan of that either.

I still feel like LowLevelMove is not broad enough a name.

Maybe...

mppf commented 3 years ago

Some other brain-stormy ideas for names

LowLevelMovesAndDeinits

SupportForLowLevelBuffers

LowLevelRecordOperations

ExplicitRecordManagement

RecordMemoryManagement

ExplicitLifetimes

ExplicitVariableManagement

LowLevelVariables

bradcray commented 3 years ago

Among the brainstorming ideas, I prefer the things that talk about:

And not so much things that talk about:

My first/main intuition has been MemoryManagement or MemMgmt or some mix of those long and short forms, where my main concern is that this will suggest the module defines things like alloc/realloc/free to people.

MemoryControl is pretty close to that and feels reasonably accurate/succinct/clear

MemSemantics or MemorySemantics is another name that occurs to me.

MemoryValues is another idea

dlongnecke-cray commented 3 years ago

So originally, I was thinking that a low-level buffer as requested in #16797 might also live in this module (if it gets added at all), which would make this module a one-stop-shop for "collection building tools".

However the low-level buffer need not live here at all (if we eventually decide to add it).

My first/main intuition has been MemoryManagement or MemMgmt or some mix of those long and short forms, where my main concern is that this will suggest the module defines things like alloc/realloc/free to people.

I am concerned about this as well. Given the options suggested so far, I prefer:

mppf commented 3 years ago

My first/main intuition has been MemoryManagement or MemMgmt or some mix of those long and short forms, where my main concern is that this will suggest the module defines things like alloc/realloc/free to people.

Supposing we did add a buffer type equivalent to _ddata to the module, wouldn't we have functions to allocate, reallocate, and free it? Then MemoryManagement seems more OK, doesn't it? Of course, maybe it is easier to separate the low-level buffer idea into a separate module named Buffer or MemoryBuffer or something. Even so, it seems reasonable to me to provide them together because using the low-level buffer requires these move and explicit deinit operations.

I think the main problem I see with the memory being the keyword is that owned/shared are also "memory management". That's not necessarily a show-stopper though (easy enough to note owned/shared exist and link to them at the top of the docs, e.g.). Anyway for some reason the value or variables keywords helps me avoid that.

I worry that lifetime will get people thinking about the lifetime checker or copy elision which are related but not exactly controlled by the module. E.g. LifetimeControl makes me think of something that adjusts how the lifetime checker operates.

Anyway I find ValueMemoryManagement appealing, if long. (But, since this is a low level module, is it so bad if it's long?). Maybe ValueMemory, ValueControl, or ValueManagement can be shorter alternatives?

dlongnecke-cray commented 3 years ago

I particularly like ValueMemoryManagement. I think this helps us distinguish it from MemoryManagement.

import ValueMemoryManagement as vmm;

var a, b = 0;
vmm.moveSwap(a, b);

var buf = new vmm.buffer(int, 10);
// ...
bradcray commented 3 years ago

I think the main problem I see with the memory being the keyword is that owned/shared are also "memory management".

We could think of these as being "class management" or "object management" or "object lifetime" policies rather than memory management per se. For all other than unmanaged, I don't think about memory in the alloc/realloc/free sense when I think of them. More about creation and destruction (ditto a typical stack-allocated record).

dlongnecke-cray commented 3 years ago

Alright so far for names, in my head there are five top contenders:

I'd like to whittle things down to 3 and then I could put the names to an initial vote.

dlongnecke-cray commented 3 years ago

I have some concerns about the name moveToReturn. The names of the other move functions in this module flow very well to me, but I always find myself stumbling over moveToReturn. I always imagine that it would only be called when you want to consume and return a reference:

proc myList.pop() {
  if this._size != 0 then return moveToReturn(this.last);
}

But in reality moveToReturn is actually more general than that. It just consumes the value pointed to by a reference and stores it into a new value.

You don't actually need to return that value at all:

proc myList.dropLast() {
  if this._size then moveToReturn(this.last);
}

In my head the names:

Seem to better describe the pattern that is going on here.

e-kayrakli commented 3 years ago

My 2 cents: I prefer names with Memory in them to those with Value. I have no intuition what ValueMemoryManagement could mean different than MemoryManagement.

And an out-of-the-box idea: I'd move/rename the existing Memory module and make the module we discuss here the Memory module. I think that also fits better with import-based world. Memory.moveSwap is pretty concise and clear to me.

mppf commented 3 years ago

Re. moveToReturn I agree that its current name describes the contents of the function rather than how to use it. What about moveToValue ?

mppf commented 3 years ago

And an out-of-the-box idea: I'd move/rename the existing Memory module and make the module we discuss here the Memory module. I think that also fits better with import-based world. Memory.moveSwap is pretty concise and clear to me.

Some of what we have in Memory should probably be in MemoryDiagnostics but things like memoryUsed would make sense to be in a Memory module with one name or another. Would you store memoryUsed in a 3rd module (e.g. AvailableMemory) or in the module with the low-level moves?

These concerns make me less willing to have Memory as the name for the module discussed in this issue.

dlongnecke-cray commented 3 years ago

What about moveToValue ?

I like the name moveToValue as well. Thanks!

e-kayrakli commented 3 years ago

Some of what we have in Memory should probably be in MemoryDiagnostics but things like memoryUsed would make sense to be in a Memory module with one name or another. Would you store memoryUsed in a 3rd module (e.g. AvailableMemory) or in the module with the low-level moves?

I would argue that everything in the current Memory module is a form of diagnostic, and can be moved to a MemoryDiagnostics module verbatim. That includes memoryUsed, too. Maybe an alternative name is MemoryIntrospection, but I am getting too off topic.

dlongnecke-cray commented 3 years ago

Should explicitDeinit remove FLAG_AUTO_DESTROY from arguments that are variables? Else you could get into a situation where a variable is destroyed twice...

record r {
  proc deinit() { writeln('deinit r'); }
}

{
  import LowLevelMove;
  var x = new r();
  LowLevelMove.explicitDeinit(x);
}

// I would expect the following to be printed, is this what we want?
// deinit r
// deinit r

Currently explicitDeinit is only useful when called on elements within unmanaged buffers. This is fine, but I don't think it can possibly be useful in other situations if it leads to this sort of double destroy pattern...

dlongnecke-cray commented 3 years ago

Another iteration on names...

The problem with ValueMemoryManagement is that it's long, and also hard to grasp how it's any different than MemoryManagement. For some, though, the addition of Value helps separate it from MemoryManagement.

The problem with MemoryManagement is that it might evoke visions of malloc/realloc/free or class management flavors such as shared, etc...

The problem with Memory is we already have a module with the same name, but if agreed we could move it to a module such as MemoryDiagnostics. We're also not certain if we want to move everything in Memory to MemoryDiagnostics.


One thing is clear, we seem like we want the word Memory in the module name in some capacity.

Personally, I like the name ManualMemory, because it flows very well in my head.

E.g...

import ManualMemory;

// Create a _manual memory_ buffer...
var buf = new ManualMemory.buffer(int, 10);

// Use a _manual memory_ move...
for i in 0..<buf.size do ManualMemory.moveInitialize(buf[i], i);

// Do a _manual_ deinit...
for x in buf do ManualMemory.explicitDeinit(x);

I like the addition of the word Manual because it implies that these are low level operations that are normally done for you, but by importing ManualMemory you are opting into doing them yourself.

I think when we write the name MemoryManagement what we are really trying to imply is "manual control", thus ManualMemory.

mppf commented 3 years ago

Should explicitDeinit remove FLAG_AUTO_DESTROY from arguments that are variables? Else you could get into a situation where a variable is destroyed twice...

Actually doing so is very similar to how the compiler handles a copy elided in. But, we might need different functions for "deinit my local variable" and "deinit my buffer element" because of the way the 1st is a value but the 2nd is a ref.

mppf commented 3 years ago

The problem with MemoryManagement is that it might evoke visions of malloc/realloc/free

This does not bother me. We could keep these functions in such a module.

or class management flavors such as shared, etc...

A problem that we can manage with documentation links if necessary.

Personally, I like the name ManualMemory, because it flows very well in my head.

Sure. We could also consider ManualMemoryManagement.

I am OK with all of ValueMemoryManagement / MemoryManagement / Memory / ManualMemory / ManualMemoryManagement but it would depend on what else we would put in the module. For MemoryManagement and Memory I would want things like malloc/free and for Memory I would additionally want the functions returning the amount of available memory / total system memory. Especially for those functions I'd add to Memory, whether or not that is OK depends on whether or not we view use Memory as indicating low-level and potentially unsafe code. Vs. if the module is limited to the functions described in this issue, useing the module indicates use of low-level and potentially unsafe features. Of course, this concern about marking unsafety is moot if we go for making certain functions only callable within unsafe blocks.

gbtitus commented 3 years ago

It looks like this ship may have sailed, or at least gotten too far from the dock for me to jump across to it. But I'll try anyway. Throw me a line if you see bubbles, would you?

To my mind "memory" as used in for example "memory management" and "memory allocation" has to do with the system resource with that name. Here it seems we're talking about something else: the action of moving and copying data. So I'd call it something like DataMovement. It could hold any sort of data moving or copying methods/procs we need, for both local and remote data. It might even be possible to subsume some of the activities done at the top surface of the comm layer interface into this, for example checking that the source node of a GET or the target node of a PUT is the current node, so that we can do a local transfer instead of a network one. Brave user programmers, or more likely domain and/or domain map and/or locale model programmers, could use it to do their data movement. Perhaps the compiler could do so as well, if that simplified how it emits code, or what code it emits.

mppf commented 3 years ago

DataMovement would be an OK module name for the move operations but it doesn't really hint at the relationship with record init/copy init/deinit. Somehow ValueMovement addresses that issue for me though (because I can imagine a "value" as being something that needs to be deinitialized - while I don't feel that "data" ever needs to be deinitialized since my view of "data" is just a bunch of bytes).

gbtitus commented 3 years ago

It sounds as though the module as currently envisioned is doing two distinct things. Would it be appropriate to split it into two modules, one for data movement and one for record init/copy init/deinit, with the latter using the former?

mppf commented 3 years ago

The issue is that what I would mean by the "data movement" is actually a lack of record copy init (or assignment). I.e. it is a lower level override to the normal behavior of running = or init= overloads for the record. In this way they are related. One won't be able to just "move the bits" for a record unless one pays attention to the other elements (ensuring the record is deinited exactly once - probably by disabling one variable's deinit and explicitly deiniting it later).

bradcray commented 3 years ago

Maybe the module should be called DataValues?

vasslitvinov commented 3 years ago

a function to swap two references using low-level moves

My reaction is that this should be the default built-in semantics for <=>. If it does not work for some datatype, then <=> would be defined to do something else. In that case yes, an explicit "do low-level moves" function can be beneficial. Maybe call it <==> ?

explicitDeinit() [...] only makes sense to call when its argument is a buffer slot (e.g. an array element). It should not be called on a variable, because the compiler is not aware of the call and will deinitialize the variable again.

I think array elements have the same issue as variables, as the compiler will deinitialize them if not told otherwise.

I suggest that the compiler BE aware of explicitDeinit(), avoiding the introduction of a new function. I do not think the compiler-unaware explicitDeinit() would be practical.

vasslitvinov commented 3 years ago

How about a single Memory module with submodules for the various aspects, ex.

use Memory.Diagnostics;         // most or all of today's Memory module
use Memory.ExplicitInitDeinit;  // the module in this issue

If so, ExplicitControl or ExplicitManagement fit well. I would be OK also with ExplicitLifetimes, although it is indeed a bit vague.

Or, how about MoveRawBits ?

I think it is OK if this module's name is a bit lengthy. Users are already walking a thin line if they are using it, a long name would be their least concern. It would also be a deterrent, akin to unmanaged being longer than owned, where we want users to use the latter over the former.

mppf commented 3 years ago

I like the submodules-of-Memory idea.

bradcray commented 3 years ago

a function to swap two references using low-level moves

My reaction is that this should be the default built-in semantics for <=>.

If I'm understanding the required feature and your response, I disagree with this. Once a ref is established in Chapel (at its initialization time), subsequent uses of it refer to the thing being referred to, not the reference itself. For example:

var a, b: int;
ref ra = a,
      rb = b;

ra = b;

doesn't cause ra to refer to b, it assigns b to a.

In addition <=> in Chapel is intended to be a bi-directional assignment, so ra <=> rb shouldn't cause ra to refer to b and rb to a, but to assign a to b and vice-versa.

So re-interpreting <=> on references to mean that the references themselves should be swapped would change one or both of these principles. I don't think we should change either one on this basis (again, if I haven't misunderstood the feature request).

dlongnecke-cray commented 3 years ago

The moveSwap function could be viewed as a version of the swap operator that works without performing assignment, e.g.

proc moveSwap(ref a: ?t1, ref b: t1) {
  pragma "no init"
  var tmp: t1;
  _move(tmp, a);
  _move(a, b);
  _move(b, tmp);
}

So w.r.t. to @vasslitvinov...

My reaction is that this should be the default built-in semantics for <=>. If it does not work for some datatype, then <=> would be defined to do something else.

If we want/expect the <=> operator to perform assignment when swapping then <=> would only be semantically equivalent to moveSwap for POD types. It could have unexpected results otherwise, which is why I think we want to use a named function here instead of an operator.

In that case yes, an explicit "do low-level moves" function can be beneficial. Maybe call it <==> ?

I worry about adding an extra operator that is so specialized. The only time I have ever wanted to moveSwap something was when writing a collection method - in other Chapel code I would not shy away from assignment.

vasslitvinov commented 3 years ago

If we want/expect the <=> operator to perform assignment when swapping then <=> would only be semantically equivalent to moveSwap for POD types. It could have unexpected results otherwise, which is why I think we want to use a named function here instead of an operator.

So why do we want the <=> operator to perform assignment? In what cases would the user not want to just swap the bits? What if we make that case special instead of the default?

I can see <=> between two arrays or two user-defined collections to swap the whole thing by default, and having a separate function for element-wise swaps.

My concern about the assignment-based interpretation of <=> is that it is more involved than just assignments. (a) It also performs an init and a deinit. (b) We need to specify whether the temp is initialized from the LHS or from the RHS.

All this is relevant only for non-POD types, of course.

re-interpreting <=> on references to mean that the references themselves should be swapped would change one or both of these principles

Actually I was not proposing that. I assumed that if the LHS and/or RHS is a ref, then the thing(s) being referenced are operated on.

dlongnecke-cray commented 3 years ago

I also really like the submodules-of-Memory idea, most of all because it allows us to neatly separate concerns.

I could totally see something like:

use Memory.Diagnostics;
use Memory.LowLevelMove;
use Memory.ExplicitDeinit;

And later if we decide to add a low level buffer this organization will easily let us do so:

use Memory.LowLevelBuffer;
vasslitvinov commented 3 years ago

Maybe ExplicitMove instead of LowLevelMove ?

Also, does it work to use submodule(s) without using the parent module first?

bradcray commented 3 years ago

Actually I was not proposing that.

The rest of this is potentially off-topic then, but since you asked:

So why do we want the <=> operator to perform assignment?

Because that's how we defined it, as a means to write a <=> b instead of having to write the pattern:

var tmp = b;
b = a;
a = tmp;

In what cases would the user not want to just swap the bits?

var A: [1..n] real;
var B = newBlockArr({1..n}, real);

A <=> B;

var b8: bool(8);
var b: bool;

b8 <=> b;

(a) It also performs an init and a deinit.

It need not. If you define <=> for your record type, you could swap the contents directly rather than relying on the default implementation.

bradcray commented 3 years ago

Also, does it work to use submodule(s) without using the parent module first?

You can't say use submodule; but you can say use parentmodule.submodule;: https://tio.run/##S85ILEjN@f8/Nz@lNCdVwVehmktBAcrxA3MUFAqK8pMV0vLzNTShAgoK5UWZJak5eRpKnnkgGSVNa7BELRcI13JxlRYDzdLzs@YCa7P@/x8A

vasslitvinov commented 3 years ago

I am OK leaving the default behavior of <=> as the three statements. However it looks like an overkill when the lhs and rhs types are identical. I would be game specializing the default <=> specification for that case.

dlongnecke-cray commented 3 years ago

Are there any disadvantages to using submodules that would cause someone to be against it? I'm having trouble thinking of a disadvantage other than a more complicated organization.

Assuming we're all on board with the idea, the next step is to decide on what we want to call each submodule. I'll begin the effort...

use Memory.Diagnostics; // The contents of the existing `Memory` module
use Memory.Move; // Newly proposed move functions
use Memory.ExplicitDeinit; // Newly proposed `explicitDeinit()` function

How do people feel about these submodule names?

bradcray commented 3 years ago

I'm definitely open to the submodule-based organization in general. I'm less certain about these specific names and organizations. Specifically:

dlongnecke-cray commented 3 years ago

While it's true that most collection authors will probably end up using Move and Deinit together, I think that as long as we're going to go through the effort of using submodules, it's important to make sure each submodule contains a distinct concept.

I struggle to think of a name that doesn't feel like we're just cramming moves and deinit together for the sake of ergonomics. They still feel like separate concepts to me. We could be upfront about it and use a name like MoveAndDeinit, though.

W.r.t. to the module name ExplicitDeinit, I agree a single word would be better for symmetry. Is the module name Initialization any better? Though it only contains the explicitDeinit function for now, the module could contain hooks to call other initializers in the future.

vasslitvinov commented 3 years ago

I doubt the virtue of putting moves and deinits in separate modules, other than in simplifying our search of good module names.

Memory.ExplicitManagement stands out for me as a good name for everything related to "low-level" operations, possibly including malloc+free in the future.