chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.78k stars 418 forks source link

lifetime checking: specifying function return lifetime #8377

Closed mppf closed 5 years ago

mppf commented 6 years ago

I know of the following interesting patterns:

What syntax can be used to specify these, in case the default behavior is not sufficient?

Here are a few examples where we'd like to be able to specify the lifetime of the return:

Function returning an element of a global hashtable

proc getGlobalHashtableElement( const ref key ) const ref {
  return globalHashTable[key];
}

Here, lifetime inference is likely to assume that the function could return the key argument. But, once key is used to find the hashtable element, the key is no longer used.

Method returning an array element

proc MyType.getElement( const ref key ) const ref {
  return this.array[key];
}

Function returning an element of an outer local hashtable

proc outerFunction() {

  var D:domain(string);
  var hashtable:[D] MyClass;

  proc innerFunction( key )  {
    return hashtable[key];
  }

  var element:MyClass = nil;
  {
    var key = "test";
     element = innerFunction( key );
  }
}

The annotation for innerFunction might need to indicate lifetime matches an outer variable rather than an argument.

psahabu commented 6 years ago

Providing a way to say 'infinite' might be useful, but I don't think Rust-style lifetime annotations make sense for Chapel.

mppf commented 6 years ago

I think "infinite" is somewhat the same as saying "it's a raw pointer" so perhaps the way we'd say that is by using a "raw" pointer type (whatever that is).

psahabu commented 6 years ago

We could call it Unsafe(T).

vasslitvinov commented 6 years ago

The case "copyObjectTree" returns an Owned thing. My understanding is that we need to concern ourselves with the lifetime only of borrowed things. (Correct?)

proc copyObjectTree(...) : owned MyTree {...}

To indicate the lifetime, I propose the syntax borrowed(variable or field reference(s)). Which indicates the shortest of those variable(s)/field(s)' lifetimes. Gives us an excuse to introduce the borrowed keyword. :) For example:

proc getGlobalHashtableElement(key) borrowed(globalHashTable) const ref {...}

proc MyType.getElement(key) borrowed(this) { ... }

It is meaningful to specify the lifetime at the variable granularity, rather than just "global" vs. "local scope". That is because even the global variables are destructed in a particular order. So, for example:

var Table1 ...;
var Table2 ...;

proc get1(key) borrowed(Table1) return Table1.get(key);
proc get2(key) borrowed(Table2) return Table2.get(key);

// illegal: adds a value that may be destructed while Table1 is still live
Table1.put(get2(some key));

// OK
Table2.put(get1(some key));

We can write the same code inside a function so all scopes are local.

BTW "infinite" is imprecise. For example, if a global variable "owns" a class pointer, it needs to delete it upon program shutdown. I assume the deletion is implicit, like for any owner.

By contrast, a global variable of an "unchecked" policy would not do that. The user needs to delete explicitly.

mppf commented 6 years ago

The case "copyObjectTree" returns an Owned thing. My understanding is that we need to concern ourselves with the lifetime only of borrowed things. (Correct?)

proc copyObjectTree(...) : owned MyTree {...}

I think this is a reasonable way to describe that pattern, yes. If Owned is the way to say an owned pointer, it would become

 proc copyObjectTree(...) : Owned(MyTree) {...}

To indicate the lifetime, I propose the syntax borrowed(variable or field reference(s)). Which indicates the shortest of those variable(s)/field(s)' lifetimes. Gives us an excuse to introduce the borrowed keyword. :) For example:

I think the "excuse to introduce the keyword" is a joke that I don't get. A reasonable alternative name would be scope or return scope (these are closer to D's terms). Or it could be lifetime.

The other element you are describing here is to put the description of the return value lifetime near the return type. We could instead put the description of which arguments are available for lifetime inference of the return in each argument.

proc return-scope-borrowed-lifetime MyType.getElement(key) { ... }

Also, another way to write your first example might be this:

proc getGlobalHashtableElement(key) borrowed() const ref {...}

since for the lifetime inference purpose, the minimum lifetime of the empty set is "infinite" lifetime.

We could also instead of marking variables that are borrowed, mark variables that aren't:

proc getGlobalHashtableElement(cannot-return-borrow-of key) const ref {...}

(where obviously we'd come up with a better keyword name).

It is meaningful to specify the lifetime at the variable granularity, rather than just "global" vs. "local scope". That is because even the global variables are destructed in a particular order.

We could do that, but the current analysis does not. Instead it merely concerns itself with what is allocated in what block. I don't know if it would impact the analysis. I'm fairly happy at the moment with this more "coarse grain" approach, though.

BTW "infinite" is imprecise. For example, if a global variable "owns" a class pointer, it needs to delete it upon program shutdown. I assume the deletion is implicit, like for any owner.

By contrast, a global variable of an "unchecked" policy would not do that. The user needs to delete explicitly.

If a global variable "owns" a class pointer, it would only trivially participate in lifetime inference. Since it "owns" a class pointer, it will be assumed that a borrow from that global variable has at least as short a lifetime as the global variable. It's a fair point though that the global variables are destroyed in a particular order - it's similar to adjusting the checking to pay attention to the order of deletion within a block.

vasslitvinov commented 6 years ago

Good alternative options.

I am not excited about using borrow() to indicate "infinite" lifetime. Even though logically that's what it means, some users may get confused by seeing "borrow" where the returned value is "owned".

mppf commented 6 years ago

If we need to also annotate when a function does / does not "invalidate" references to a particular thing (such as an array's domain resize would invalidate references to the array - or Owned.clear() would invalidate borrows of that Owned), that might also require some sort of annotation. (This would support cases we don't currently check for as described in #8382).

mppf commented 6 years ago
proc getGlobalHashtableElement(key) borrowed(globalHashTable) const ref {...}

Two reasons this might be the right general idea:

  1. Near the return value since it's about return value lifetime
  2. The borrowed thing might not be an argument (it might be an outer or a global variable).

I think I'd prefer to use borrowing but if we end up with a borrow keyword, it might work in that environment too. Anyway, if such a keyword introduces a list of variables/arguments, the rule is that the function can only return a borrow of one of those arguments (and not some other variable / argument). That allows appropriate checking. On the other side, the compiler will assume the minimum lifetime of the variables/arguments mentioned in that list when inferring the return lifetime.

psahabu commented 6 years ago

While we can quibble about the syntax, I agree that this is the ideal place to put the return lifetime. I'm still a fan of & over borrow, etc. but I'm not going to die on that hill.

vasslitvinov commented 6 years ago

There will be cases when the return value is not strictly a borrow of any variables. Array indexing is one example; returning a field of a record could be another one.

So maybe we want to present this annotation as "the returned thing has the lifetime of one of these:". In which case I would prefer the keyword to be "lifetime", not "borrow". (Even though I suggested "borrow" above.)

Or we can take the position that borrowing will be a common case. Users will not need to know about lifetimes except in weird corner cases. Even if borrowing is indirect, as in array indexing, we could still present it as borrowing-like.

In this case, a "borrow" variant makes a good sense.

Considering the three proposed variants:

mppf commented 6 years ago

There will be cases when the return value is not strictly a borrow of any variables. Array indexing is one example; returning a field of a record could be another one.

As far as the checker is concerned, these are "borrows" from the array/record containing the fields.

mppf commented 6 years ago

Since we've talked about this, several things have changed that might change the design direction.

  1. We how have a borrowed keyword. It might be reasonable to hang the lifetime specification syntax off of it.
  2. Instead of just specifying the lifetime of a function's return value, we probably need to be able to specify the relationships between the lifetimes of the arguments, in case one is assigned to the other. See #9853.
mppf commented 6 years ago

If we use borrowed for the lifetime annotation, that might make it harder to write a generic function that accepts/returns both borrows and also owned etc.