chapel-lang / chapel

a Productive Parallel Programming Language

https://chapel-lang.org

owned & borrowed syntax proposal: like ref intent/kind #8652

Closed mppf closed 6 years ago

mppf commented 6 years ago

Previous: #8651. Next: #8653.

General idea: owned-ness and borrow-ness are, like ref, variable decorators / intents rather than types. The owned decorator can apply to integer values, records, etc. in order to allow generic programming, but it only has an effect for class types. The default is 'borrowed', which can also be specified explicitly.

It is an error to store the result of new into a variable / formal argument not marked "owned".

Locales Primer Example

class Node {
  var data: real;
  owned var next: Node;
}
owned var head = new Node(0);
var current = head; // borrow because not marked owned
for i in 1..numLocales-1 do on Locales[i] {
  current.next = new Node(i);
  current      = current.next.borrow();
}

Generic Collection Example

record Collection {
  owned var element;
}
proc Collection.addElement(owned x) {
  element = x;
}
proc test() {
  var c: Collection(MyClass);
  c.addElement(new MyClass(1));
  var d: Collection(int);
  d.addElement(2);
}

Generic Function Example

// default intent is "borrow" but could be specified explicitly, too
proc process(data) {
  var sum = 0;
  for i in 1..n {
    sum += data.value; // single deref here
  }
}
proc test() {
  owned var data = new MyClass(1);
  process(data);
}

Borrow into Argument Record

record Argument {
  var data; // borrow by default, since not marked
}
proc process(arg:Argument) {
  ... use arg.data ...
}
proc test() {
  owned var data = new MyClass(1);
  var arg:Argument(MyClass) = new Argument(data); // record data is a borrow
  process(arg); 
}

Function returning newly allocated object

proc returnNewMyClass() owned : MyClass {
  return new MyClass(1); // return type is MyClass but with 'owned' return intent
}
proc wrapReturnNewMyClass() owned {
  return returnNewMyClass();
}

Function returning a borrow

// default is "borrow" but could also write proc returnBorrowGlobal() borrow : MyClass
proc returnBorrowGlobal() : MyClass {
  return globalMyClass.borrow();
}
proc wrapReturnBorrowGlobal() {
  return returnBorrowGlobal(); // return borrow by default
}

Does specifying type change new?

var myC = new MyClass(1);
// compiler error: new class not stored into an owned
var myC:MyClass = new MyClass(1);
// compiler error: new class not stored into an owned
owned var myC = new MyClass(1);
// myC has type MyClass and "kind" owned var
owned var myC:MyClass = new MyClass(1);
// myC has type MyClass and "kind" owned var

psahabu commented 6 years ago

I prefer this syntax to the ones in #8651, #8653, and #8654. It makes sense to have owned as a decorator like sync and atomic, instead of a separate syntax trying to simulate Rust's typing.

Additionally it clarifies that owned is a statement about the management of the object pointer rather than the object itself, and it's easy to scan a function to see if it takes ownership of any objects. Also it precludes double-deref in generic functions without splitting the semantics of argument and variable class types, and makes it very clear when ownership is being transferred through an argument.

My preference would be to shorten owned to own, for convenience.

own var myC = new MyClass(1);

If we wanted to allow users to create objects without opting into owned, we could add an unsafe decorator and possibly unsafe instantiation:

unsafe var myC = new MyClass(1);
unsafe var myC = new unsafe MyClass(1);

mppf commented 6 years ago

> I prefer this syntax to the ones in #8651, #8653, and #8654. It makes sense to have owned as a decorator like sync and atomic, instead of a separate syntax trying to simulate Rust's typing.

@psahabu - to be clear, the proposal in this issue makes owned a decorator like ref. sync and atomic affect the type of the result but ref does not. (In Rust, by contrast, & does produce something with a different type.)

For example,

var x = 1;
ref r = x;
writeln(r.type:string);

outputs int(64) and not ref int(64).

Such a design has implications, in particular for the instantiation of generic functions and types. It answers the question, "Who decides if a value in a generic is Owned or Borrowed?", differently from how a type-based approach would. In particular, it's the generic function/collection author that decides whether their code owns or borrows, rather than the user of the generic function/collection. I suspect that this would make it easier for authors of generic code to reason about what is occurring, but also that it might require the introduction of adapter records if the user of that code wanted to "borrow" when the collection is implemented to "own".

For example, in the Generic Collection Example above, we had this:

record Collection {
  owned var element;
}
proc Collection.addElement(owned x) {
  element = x;
}

What if a user of such a Collection in a library wanted to have it store borrows?

proc tryCollectingBorrows() {
  owned var b = new MyClass(1); // must be owned: result of new
  var c: Collection(MyClass);
  c.addElement(b.borrow());
}

This would result in an error, since you can't pass a borrow into an owned.

The user would have to write something like this, instead:

record Borrow {
  borrow var x; // but borrow is the default in this proposal
}
proc collectingBorrows() {
  owned var b = new MyClass(1); // must be owned: result of new
  var c: Collection(MyClass);
  c.addElement(new Borrow(b.borrow()));
  // could also be written c.addElement(new Borrow(b));
}

I think such a requirement is defensible because collections usually own the contained values.

bradcray commented 6 years ago

I've had a gut negative reaction to this proposal, and have been trying to understand why. Here's my first attempt at trying to characterize it:

Michael asked me why we put ref as a characteristic of the symbol's declaration (in the place of const or var) rather than part of its type, and my quick answer was that we oscillated between the two approaches several times but went with the current approach because we didn't want refs to stack (at which point they'd essentially become pointers). With more thought I'm remembering some other reasons:

- because we wanted refs, like consts, to only support initialization and not re-assignment (again, to make them less like pointers; and because we wanted assignment to a ref to be an assignment to the thing it referred to);
- because, generally speaking, refs don't affect the things you can do with an int as much as, say, sync and atomic do when used to modify int (where it really is more like a different type, making different methods and ways of accessing the variable available/unavailable).
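
A minimal sketch of the contrast drawn in these two points, using only existing Chapel features (the variable names are illustrative and not from the thread): a ref is bound once and then behaves like the int it refers to, while an atomic int is a different type that is accessed through methods.

```chapel
var x = 1;
ref r = x;                 // r is bound to x at initialization
r = 5;                     // assigns through to x; it does not rebind r
writeln(x);                // 5
writeln(r.type == int);    // true: ref does not change the type

var a: atomic int;
a.write(3);                // atomics are read and written through methods
a.add(2);
writeln(a.read());         // 5
writeln(a.type == int);    // false: atomic int is a distinct type
```

The question raised in this comment is which of these two behaviors owned, shared, and borrowed should resemble.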

Intuitively, it seems to me that owned C, shared C, borrowed C are more like sync and atomic in this respect than they are ref (though I think we also don't particularly want them to stack... But atomics and syncs don't stack either, by simply restricting which types it's legal to apply them to). Related to that last point, I'm also not crazy about the notion of putting these with the var/const position and ignoring them for types that aren't classes rather than making these more like sync, atomic where they choose which types they can be applied to.

Finally, if we want to support arrays of owned or borrowed or shared, I think it makes more sense to put it in the type of the variable to preserve the left-to-right reading order of our types. var A: [1..n] owned C; pretty clearly says "A is an array over the indices 1..n of owned class pointers" whereas owned var A: [1..n] C; (I'm guessing this is how we say it?) reads more like "this is an owned array over the indices 1..n of class references C" or requires me to mentally re-arrange the elements to interpret them correctly.
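
As a small sketch of the type-position reading argued for here, the following uses the spelling that later Chapel releases adopted, where the management decorator is part of the element type. The class C, its id field, and the size n are placeholders; the `?` marks the element type as nilable so that the array can be default-initialized.

```chapel
class C {
  var id: int;
}

config const n = 3;

// Reads left to right: an array over 1..n of owned, possibly-nil C.
var A: [1..n] owned C?;

for i in 1..n do
  A[i] = new owned C(i);   // each element takes ownership of its instance

writeln(A[2]!.id);         // 2
```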

lydia-duncan commented 6 years ago

I find this idea very intriguing and a lot clearer than the previous proposal (I haven't gotten to the next two yet). My preference would be to treat it like ref rather than in addition to the var from an aesthetic perspective - the owned-ness of the instance seems like it will be constant to its existence. I'm torn between whether it is useful to distinguish between an owned const x and a borrowed const x, and would have to think about it more.

That said, Brad's points about the difference between how ref and sync/atomic impact the instance do hold water for me, and it does seem important to distinguish between an array that is owned for its own sake and an array that contains owned instances.

I'd like to see this pathway discussed more.

mppf commented 6 years ago

> Finally, if we want to support arrays of owned or borrowed or shared, I think it makes more sense to put it in the type of the variable to preserve the left-to-right reading order of our types. var A: [1..n] owned C; pretty clearly says "A is an array over the indices 1..n of owned class pointers" whereas owned var A: [1..n] C; (I'm guessing this is how we say it?) reads more like "this is an owned array over the indices 1..n of class references C" or requires me to mentally re-arrange the elements to interpret them correctly.

It seems to me that the idea of this sketch would be that collection authors say whether or not they own their elements. I.e. we'd probably make the array implementation "own" the elements, just as in my collection example.

So var A: [1..n] C; would produce an array that owns its C elements.

Or maybe the default would be the other way.

Either way, if the user didn't like the default, they'd have to create a record Borrow like I described in my above comment. But it feels a little bit like that just turns it back into a type.

The other main objection to this particular proposal that I see is that while it might be reasonable to add siblings to ref for owned and borrowed, when we have variations on managed pointers, such as Shared or OwnedNullable, it doesn't seem so likely to work out.

bradcray commented 6 years ago

> I think such a requirement is defensible because collections usually own the contained values.

I'm trying to think about whether I believe this or not. The main counterexample that sprang to mind is a dictionary that is used to refer to specific nodes in a pre-existing graph (say) where it's just supplying bookmarks, not exerting any ownership over the things in the graph.

(Though not directly related to this proposal, I think a similar type of assumption is what makes me a bit leery of proposals that assume a class field object is either an owned or borrowed class instance. It seems that while some data structures are most likely to own, such as a linked list, others are less likely to, such as a list of neighbors in a DAG or graph data structure. This makes me worry that there isn't a correct implicit default).

psahabu commented 6 years ago

> This makes me worry that there isn't a correct implicit default.

Could the compiler do autoboxing? If a borrow is being added to a collection of Owned items, the compiler could wrap it with new Borrow(T) and include forwarding to the underlying borrow. Something similar could be done in reverse for an Owned being added to a collection of borrows.

Or maybe instead of forcing autoboxing on all collections, users could enable it by including an interface/concept.

mppf commented 6 years ago

@psahabu if the compiler did all of that, aren't we just going back to using types?

lydia-duncan commented 6 years ago

Trying to get back up to speed on lifetime stuff.

I don't think having a default for an array or collection storage is as important as having a default for unwrapped types, so long as it is obvious in the field declaration how the class is being stored (in both owned and borrowed situations). I do think unification is important for the array's contents, though.