Closed vasslitvinov closed 5 months ago
If the intention is that the iterator yield by value but that value is not mutable, const
works today iter myIter() const { ... }
. Of course, we don't currently have a way to opt in to "yield by value something that is mutable" today.
I remember working on some parts of the compiler related to yielding records. I think that was in PR #8073. However I don't see any indication in the PR message there than I changed whether iterators yield by value or not. (Just what happens when they do yield by value). Even that was quite a while ago. I am thinking it is reasonable for us to consider making changes here.
My first reaction here was that iterators should match the behavior of proc
since returning and yielding seem analogous. But then I realized that there are a lot of cases where a proc
must copy-out because it's coming to the end of its life, and a local variable is being returned (so it will not exist once the proc
is done and its stack frame is freed). Whereas, for an iterator, the iterator is still live across the yield, so there's not necessarily a reason to copy out. Taking a specific example, if I write:
proc foo() {
var r = new R().;
return r;
}
I pretty much need to copy r
out (by some definition) for the code to work right. Yet, for:
iter foo() {
for i in 1..1000 {
var r = new R();
yield r;
}
}
we could just refer to r
rather than copying it out.
So it seems justifiable to have default yield intents differ from default return intents, where the tradeoff feels like it might be balancing the potential negative of confusion (the need to define and rationalize different behaviors between the cases) vs. the potential positive of performance (maybe we can avoid copying out expensive things?).
default yield semantics: | by value | by default arg intent |
---|---|---|
best for | loop consumes items | loop references items |
to yield by value | default | need new out intent |
to yield by ref | ok | ok |
to yield tuple elts by ref | need tuple intents or w/a | MAY need tuple intents |
to yield by default arg intent | ??? | default |
favored by Arkouda | tbd | tbd |
favored by CHAMPS | tbd | tbd |
Arkouda defines three iterators:
Loops in Arkouda:
// matches() iterator yields locally-created tuples of `regexMatch`
// preferred: by const ref / default intent
// to avoid bit copying
// iterator does not need it after yielding; user does need it to be a value
for m in myRegex.matches(...)
// assuming `names` is an array, this iterates over its elements by ref
for (name, bitwidth) in zip(names, bitwidths) { ... }
// analogously
for (filedom, filename, file_idx) in zip(locFiledoms, locFiles, 0..)
// There are loops over localSubdomain(), which redirects to:
iter dsiLocalSubdomains(loc: locale) {
yield dsiLocalSubdomain(loc); //dsiLocalSubdomain() returns by value
}
// ==> iter does not need YV; user does not need YV to be a value
// getNumRequestMetrics() returns a list(owned Metric?)
// a ref iteration over a collection of `owned`
var metrics = new list(owned Metric?);
for metric in getNumRequestMetrics() do metrics.append(metric);
// another ref iteration over arrays
for loc in Locales .....
// String iterator yields freshly-created strings
// iter does not need YV; user does not need YV to be a value
iter string.split()
// forall loops: a lot of zippered loops like the following,
// where the array elements are ints or some non-trivial things
// ex. datasets
forall (x,y) in zip(xArray, yArray)
For-loops are:
for zone in zones.borrow()
for part in zoneName.split(....)
Forall loops are similar.
CHAMPS iterators:
These are my own key takeaways for me from the discussion so far:
Additionally, I wanted to link some other issues that came up in discussion:
1
to a const ref
formal (see also issue #17195).We had some side discussion about whether or not yielding by 'const ref' would mean that literals can't be yielded. They can be, but you can't currently pass a literal like 1 to a const ref formal
Towards that end, I propose to allow passing and yielding literals and params by const ref
, https://github.com/chapel-lang/chapel/issues/17195#issuecomment-1535664432
The current proposal is:
Change the default yielding semantics to today's const
argument intent or to "cvref" (#21499).
Introduce out
yield intent to specify yielding by value a.k.a. "copy out."
Allow the above intents also to be used as explicit argument, return, and task intents, in addition to explicit yield intents.
Experiment with applying this to Arkouda and CHAMPS to see how well it works.
The altnerative proposal is:
Keep today's semantics of yielding by the default intent to be "by value."
Allow const
(with today's argument intent semantics) or cvref as an explicit return, yield, and task intent, in addiion to the argument intent.
Under this alternative proposal we will not have a way to specify yielding and returning by today's default argument intent, in particular, in a way that yields / returns atomics, syncs, arrays, records by non-const ref.
With either of these proposals (supposing that we extend const
to return intent in addition to yield intent), we have the problem that const x = f()
would capture by value even if f
returns a reference by const
. (Which might be confusing / rely on context more than ideal for what const
means). See also https://github.com/chapel-lang/chapel/issues/21499#issuecomment-1549719773
Here are some stats on yields encountered while compiling our tests under test/release
and test/studies
.
file | lines |
---|---|
module: packages/LinkedLists.chpl | 92 |
aoc2021-brad: day8.chpl | 5 |
task-private-variable.chpl | 6, 9 |
file | lines |
---|---|
module: dists/BlockCycDist.chpl | 1056 |
module: dists/BlockCycDist.chpl | 1072 |
module: packages/Sort.chpl | 521 |
module: standard/FileSystem.chpl | 660, 877, 892, 1099, 1482 |
module: standard/IO.chpl | 5896, 8586 |
aoc2021-brad: day10.chpl | 8 |
aoc2021-brad: day10a.chpl | 8 |
aoc2021-brad: day13.chpl | 54 |
aoc2021-brad: day18a.chpl | 175 |
aoc2021-brad: day19.chpl | 30 |
aoc2021-brad: day19a.chpl | 30 |
aoc2021-brad: day22a.chpl | 29, 49, 96 |
aoc2021-brad: day4.chpl | 20 |
aoc2021-brad: day4a.chpl | 20 |
aoc2021-brad: day8a.chpl | 40, 42 |
aoc2021/bugs: day05.chpl | 40, 73 |
aoc2022-brad: day05a.chpl | 39 |
aoc2022-brad: day11.chpl | 140 |
aoc2022-brad: day11a.chpl | 140 |
aoc2022-brad: day12-par-multiversion.chpl | 108 |
aoc2022-brad: day12-par.chpl | 80 |
aoc2022-brad: day12.chpl | 65 |
aoc2022-brad: day12a-par.chpl | 69 |
aoc2022-brad: day12a.chpl | 68 |
aoc2022-jer: day12a-alt.chpl | 8 |
aoc2022-jer: day12a-par-alt.chpl | 8 |
aoc2022-jer: day12a-par.chpl | 12 |
aoc2022-jer: day12a.chpl | 12 |
common/bradc/BradsBlock1D.chpl | 362 |
common/bradc/BradsBlock1DPar.chpl | 340, 592 |
FTree.chpl | 19, 28, 82, 89, 101, 123, 124, 135, 151 |
FTree.chpl, cont. | 158, 163, 214, 227, 232, 263, 269 |
Glob.chpl | 35, 79, 92, 97, 142, 155 |
Graph500_Modules/Scalable_Graph_Generator.chpl | 203 |
SSCA2_RMAT_graph_generator.chpl | 181 |
beer-promoted-explicit.chpl | 32 |
beer-promoted-infer-explicit.chpl | 45, 46, 47, 48 |
beer-promoted-init.chpl | 29 |
chameneos-redux-cas.chpl | 169 |
chameneos-redux-lydia.chpl | 192 |
chameneos-redux-waitFor.chpl | 169 |
fannkuch-redux.chpl | 107 |
readcsv.chpl | 235 |
recursive.chpl | 13 |
scalableDataGenerator.chpl | 208 |
task-private-variable.chpl | 4 |
file | lines |
---|---|
common/bradc/BradsBlock1D.chpl | 361 |
Glob.chpl | 57, 119 |
Here are a couple of case studies from the above category "Yields of something else".
// test/studies/adventOfCode/2021/bradc/day22a.chpl
record Region {...}
var allRegions = array of Region;
var disjointRegions: list(Region);
for dr in findDisjointRegions(allRegions[i]) do
disjointRegions.append(dr);
iter findDisjointRegions(r): Region { // want 'r' by in-intent, yield by value
for r3 in unOccludedSections(...) do
for r4 in findDisjointRegions(r3) do
yield r4;
yield r;
}
iter unOccludedSections(...): Region { // want to yield by value
yield new Region(...);
}
// test/studies/madness/aniruddha/madchap/FTree.chpl
record Node {
var lvl, idx : int;
// several iterators yield newly-created Nodes
}
class LocTree {
const coeffDom : domain(1);
var nodes : domain(Node);
var coeffs : [nodes] [coeffDom] real;
var zeroes : [coeffDom] real; // Return zeroes from this() when
iter these() // yields existing arrays
iter coeffs_iter() // yields existing arrays
iter node_iter() // yields existing Nodes
}
class FTree {
const order : int;
const coeffDom: domain(1);
const tree : [LocaleSpace] unmanaged LocTree;
iter these() // yields existing arrays
iter coeffs_iter() // yields existing arrays
iter node_iter() // yields existing Nodes
}
Here are the situations where copy-on-yield happens in iterators that yield record-like types, including strings and arrays, with the default yield intent. I examined only the codes under modules
test/library
test/release/
test/studies
. Statistic-wise, there are 28 iterators with copy-on-yield vs. 153 iterators without it.
yielding string literals by the default intent - we could optimize this case:
iter Regex.split()
yields the empty string among others
spec:task-private-variable.chpl
yielding existing keys -- should yield by cvref:
DefaultAssociativeDom.these()
similarly, yielding elements of existing arrays -- should yield by cvref:
DistributedBagImpl.targetLocalesNotHere()
LinkedList.these()
Sort.sorted()
DataFrames.*.these()
similarly, yielding existing domains -- should yield by cvref:
HashedDist:UserMapAssocDom.dsiLocalSubdomains()
test/studies/hpcc/common/bradc/BradsBlock1D.chpl
: Block1DArr.these()
another example of the above -- madness: FTree.chpl
class LocTree {
var coeffs: array with non-POD elements;
// both of the following should yield by cvref
iter these1() { for c in coeffs to yield c; }
iter these2() { for i in ... to yield coeffs[i]; }
}
yielding elements of an array that are dead after the yield -- should transfer the ownership out from array elements, or else yield by cvref:
DefaultAssociativeDom.dsiSorted()
DefaultAssociativeArr.dsiSorted()
test/studies/adventOfCode/2021/bradc/day8.chpl
the one case where copying is intentional - in BlockDist:
// Iterator that yields values instead of references (to enable RVF)
iter valIter(iterable) { for elem in iterable do yield elem; }
Here is an argument in favor of a cvref-like default yield intent: in the situations where it would be preferable to have the by-value yield intent, we can define a copy-elision transformation that switches the iterator to by-value automatically. (Thanks Brad for this idea.) This transformation can be defined in the spec unless it qualifies as an optimization.
Transformation details:
[Added:] By the same token, we could switch the default RETURN intent to be const ref (for non-trivial types) and transform it to by-value for those procedures that return local things.
Here are the key findings of discussing this topic in a subgroup.
const
as the default yield intent is: it makes iterators that yield collection elements work “out of the box”With that, we chose the following course of action:
out
return/yield intent to mean return/yield by valueconst
return/yield intent to mean the same as const
argument intentconst
yield intent works, see what happens if we make it the default, time permitting; do make it the default if we like itconst
); this can be done post-2.0We currently are not planning to any changes in this area, so I closed this issue. We can reopen if needed. @lydia-duncan fyi.
This issue asks whether iterators with the default return should yield by "copy out" or by
const ref
.Background
Currently iterators with the default return yield by value. That is, the ownership of the yielded value is transferred into the loop body when possible, otherwise the loop body gets a fresh copy of the yielded value. In either case, the loop body is responsible for destructing the value it gets.
Recently in #21789 we adjusted the case of yielding tuples to yield each component by value. This is because we want the mode of yielding something to be the same whether it is yielded on its own or as a component in a tuple. This issue upholds this latter principle.
In a related issue #21717, we propose the specification of the tuple yielding mode, including ref-ness and const-ness, on a per-component basis.
In another related issue, #15973, we considered the
ref
return intent on a tuple-yielding iterator to mean that each component of the yielded tuple is to be yielded by ref.This issue is about what the behavior should be by default.
One dimension: is yielding like returning or unlike returning?
Today's "yield by value" default is designed to match the behavior of returning a value from a procedure, which is by value, not by reference.
At the same time, yielding and returning are different w.r.t. their lifetimes. While a procedure completes its lifetime before its returned value is used by the caller, the lifetime of the iterator, and in a way of a yield statement, contains the lifetime of the loop body that operates on the yielded value.
This leads us to consider making returning and yielding behave differently.
Another dimension: is yielding like returning or like argument passing?
Yielding is like returning in that it passes a value to the callee.
However, I like to view iterator yields as "calling" into the loop body as if it were a function with the index variable as a formal argument. That is, I view:
as:
which naturally calls for using the default argument intent when yielding.
Use cases: yielding an external vs. a locally-created value
An iterator can yield these two kinds of non-POD object(s):
For example:
If yielding by value,
yieldExternalObjects()
will create copy of A's elements, which may be undesirable.If yielding by reference,
yieldCreatedObjects()
will retain the ownership of the records it creates and delete them. Instead, passing the ownership into the loop body will likely be more appropriate.This proposal
This issue proposes to favor the argument-passing-like behavior and yield by the default argument intent.
What about the scenario of
yieldCreatedObjects()
with ownership transfer into the loop, you may ask?If the counterpart of yielding, under the above lambda interpretation, is argument passing, then the counterpart of yielding by value is… argument passing with an explicit
in
intent. How about we apply this same solution in the case of yielding? I can see two ways:where the second one makes more sense to me.
Alternatively...
The proposal in #21717 notwithstanding, we can allow iterators to yield referential tuples by
const ref
intent. This is currently disallowed since #21732. This way:yieldCreatedObjects()
with ownership transfer is the default, as it is today, andyieldExternalObjects()
without creating copies can be achieved with aconst ref
intent on the iterator, whether yielding non-POD objects on their own or as tuple components.In the latter case we may even want to yield referential tuples by non-const ref, which will allow modifications of the referrands, although at the expense of double indirection:
creates a tuple with two references (today's semantics) in a temp and returns a mutable reference to it. However, this semantics is more of a stretch. Ex., is the ability to modify the tuple misleading when it contains a (copy of an) int?
[Added later] How about the performance cost of the extra dereference when yielding referential tuples by [const] ref? Given the nature of what a referential tuple stores -- copies and addresses -- it can be copied upon a yield, analogously to how we copy the _array record when passing around an array by reference.