mppf opened this issue 6 years ago
Tagging @ian-bertolacci because this relates to things we discussed him possibly getting to this summer.
I haven't made it all the way through this issue yet, in part just due to typical lack of time, and in part because I disagree with the following early statement:
One option that I can think of is to have a ref/nonref pair of l/f iterator sets. In that case the following snippet would write only to existing indices in the domain:
forall v in SparseArr do v = 5;
whereas following one would print out IRVs as well:
forall v in SparseArr do writeln(v);
...which throws me off-track mentally and into my own thoughts. So I thought I'd capture those thoughts rather than waiting until I'd gotten through the whole thing, even though that seems unfair.
In particular, I think that sparse data structures should have O(nnz) running time for all operations by default (where nnz = the number of nonzeroes, i.e., the number of stored, non-IRV elements), where that default could be overridden by referencing dense variables (operations using dense domains, or zipperings with dense objects where the dense object is the leader).
Specifically, I think that if one wants to write out all elements of a sparse array including IRVs / zeroes, some ways to do it would be:
writeln(SparseArr[DenseDom]);
or
forall i in DenseDom do writeln(SparseArr[i]);
I think a counterexample showing why this is the right choice can be seen in the statement:
SparseArr = -SparseArr;
or:
SparseArr -= 1;
// equivalent to:
SparseArr = SparseArr - 1;
// equivalent to:
forall (s1, s2) in zip(SparseArr, SparseArr) do
  s1 = s2 - 1;
If these sparse array expressions generated O(dense) elements when read but O(sparse) elements when written, then they wouldn't make sense / wouldn't work out, and/or wouldn't be efficient (i.e., wouldn't run in O(nnz)).
I think the rationale for this philosophy is that we aren't striving to provide sparse matrices (mathematical objects) in the language (though we might in libraries), but rather are striving to provide sparse arrays (data structures).
To get a sense of where my philosophy on this comes from, it could be useful to read Chapter 6 of my thesis where figure 6.3 is particularly relevant to this specific point. What differs between ZPL and Chapel is that we don't have domain (region) scopes that govern the index sets of array expressions as ZPL did, but I think the same principles apply through zippering and slicing as in the examples above.
This was originally a draft CHIP by @e-kayrakli but then handed to me to complete. I never felt that it was quite CHIP-ready, kept trying to do too much in improving it, and so never finished (and kept losing track of it). I'm posting it here as an issue so that it's not lost, as it does reflect a position on what irregular array zippering and assignment should do.
Abstract
This document discusses the behavior of zippering regular and irregular domains or arrays defined on them. It focuses on discussing what the expected behavior could be and possible ways of implementing l/f iterators providing such support. Many details, such as bounds checking errors, are not described here.
Introduction
Current sparse array iterator semantics
While thinking about zippering sparse iterators with dense ones, sparse array iterators have always felt a bit unnatural to me (although I cannot say that they're wrong):
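A minimal sketch of the behavior in question (the identifiers here are my own, not from the original draft):

```chapel
const ParentDom = {0..9};
var SparseDom: sparse subdomain(ParentDom);
SparseDom += 2;
SparseDom += 5;
var SparseArr: [SparseDom] int;
SparseArr = 7;  // whole-array assignment touches only stored elements

// yields only the stored (non-IRV) values: prints 7 twice,
// never the IRV for the other eight indices of ParentDom
for v in SparseArr do writeln(v);
```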
What makes me a bit uneasy here is that IRVs are never yielded by sparse arrays. Although I find this a bit counter-intuitive, it makes sense when you think about writes: an iterator that yielded IRVs would not work, as you cannot assign to a nonexistent index of a sparse array (first you have to add the index to the array's domain explicitly).
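Concretely (a sketch; the index values are illustrative):

```chapel
SparseArr[3] = 5;   // error today: 3 is not in SparseArr's domain
SparseDom += 3;     // the index must be added explicitly first...
SparseArr[3] = 5;   // ...and only then can the element be written
```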
One option that I can think of is to have a ref/nonref pair of l/f iterator sets. In that case the following snippet would write only to existing indices in the domain:
forall v in SparseArr do v = 5;
whereas the following one would print out IRVs as well:
forall v in SparseArr do writeln(v);
ref/nonref overloads of the same name resolve correctly today, but I'd be surprised if what I propose here just worked for iterators. At the same time, I don't expect that implementing support for this would be difficult.
I have mixed feelings about this idea, as it makes the iterator's behavior depend on how it is used.
Current sparse domain iterator semantics
I don't have much to say about the current sparse domain iterator behavior.
How To Interpret Cross-Type Zippered Loops
Dense/Sparse - Sparse/Dense Zippering
Domains
I cannot think of many use cases that would require zippering such iterators. A user might want to do it where the domains don't overlap completely, hoping that follower indices would be offset according to the difference between the two iterands. However, I believe that kind of burden should not be placed on l/f iterators; such index offsetting can easily be handled at the user level through tuple arithmetic.
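For instance, a hedged sketch of such user-level offsetting (the offset value and identifiers are assumptions, not from the original draft):

```chapel
const offset = (2, 2);  // assumed difference between the two 2D domains
forall i in SparseDom do
  RectArr[i + offset] = SparseArr[i];  // tuple arithmetic on the index
```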
As I cannot explain why someone would need this, I don't have a strong opinion on what kind of check should be run on SparseDom.domain._value.parentDomain and RectDom. In general, it seems to me that it is not easy to define a good set of conditions that increases safety while keeping the performance overhead at bay. My general opinion on l/f iterators (second part of this document) makes me think we shouldn't be worried about them at all, as I personally support very loose coupling between leaders and followers.

Arrays
When we are talking about zippering array iterators, there are a couple of scenarios I can think of where we would need such loops.
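The two loop orders under discussion were presumably along these lines (a reconstruction; the original snippets were lost from this draft):

```chapel
// dense leads: every dense position is visited
forall (r, s) in zip(RectArr, SparseArr) do r = s;

// sparse leads: only stored positions are visited
forall (s, r) in zip(SparseArr, RectArr) do s = r;
```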
First, this looks like something that is going to fire when sparse/dense arrays are assigned to each other. With ref/nonref sparse array iterator pairs, both orders of such zippered loops would behave very naturally:
If, on the other hand, the user wants to avoid assigning IRVs, they should use the following:
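A hedged guess at the lost snippet: driving the loop by the sparse domain touches only stored indices, so no IRV positions are assigned:

```chapel
forall i in SparseDom do
  RectArr[i] = SparseArr[i];
```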
In this scenario, array assignment in the other direction means populating a sparse array using a dense list of values. Personally, this feels intuitive to me:
So, in effect, this is the same as:
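The lost snippets presumably showed the assignment and its zippered expansion; a hedged reconstruction:

```chapel
SparseArr = RectArr;
// in effect:
forall (s, r) in zip(SparseArr, RectArr) do
  s = r;
```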
If RectArr is a "dense version" of SparseArr (i.e., indices need to be "plucked out" of RectArr), the user has to use domain iterators; the full code would look like:

Dense/Assoc - Assoc/Dense Zippering
Currently, both cases generate different compile-time errors. Even if that's the desired behavior, the errors are thrown for the wrong reasons and the messages are not very helpful.
Regardless, there are some possible scenarios I can think of for zippering such iterators.
Where idxTypes are different

Consider the following snippet, where unique ids are added to objects in an associative array:
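A sketch of what such a snippet might look like under the zippering being discussed (my reconstruction; the names and types are assumptions):

```chapel
var Names: domain(string) = {"alice", "bob", "eve"};
var Ids: [Names] int;

// pair each associative index (idxType string) with a unique
// integer id (idxType int)
forall (name, id) in zip(Names, 0..#Names.size) do
  Ids[name] = id;
```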
Here zippering order shouldn't have any effect on the behavior.
Where idxTypes are the same

In which order indices would be yielded from either domain is a bit unclear. However, the user must be aware of the unordered nature of associative domains, and therefore shouldn't write such code if a specific ordering is desired. When I read this code, all I can interpret is that some associative indices will be matched with regular indices.
In that sense, implementation- and semantics-wise, I do not see any difference between the cases where the idxTypes are the same or different. In terms of behavior, I also don't see any meaningful difference between associative arrays and domains.
A Possible Implementation
If we want to allow cross-type zippering, the semantics should be very simple. Going back to basics, if we have:
should always be interpreted exactly as:
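Since both snippets were lost from this draft, a hedged reconstruction of the intended reading: for any two iterables A and B,

```chapel
forall (a, b) in zip(A, B) do
  body(a, b);
// ...should always pair the i-th value yielded by A with the
// i-th value yielded by B, whatever A's and B's types are
```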
(I personally believe that there shouldn't be any size checks in zippered iterators; i.e., if one of them returns, the loop should end gracefully.)
To implement such semantics, l/f iterators should follow the basic idea of yielding a single range and following it. A rough sketch is:
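The sketch itself was lost from this draft; what follows is a hedged reconstruction built around the getChunk/chunkIterator names referenced in the notes below (their bodies and signatures are assumptions):

```chapel
// leader: partition a zero-based space into ranges and yield them
iter these(param tag: iterKind) where tag == iterKind.leader {
  for chunk in chunkIterator(0..#this.size) do
    yield chunk;
}

// follower: translate each zero-based range into this iterand's
// own elements (getChunk can do any necessary shifting)
iter these(param tag: iterKind, followThis) where tag == iterKind.follower {
  for elem in getChunk(followThis) do
    yield elem;
}
```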
Notes
Zero-based ranges should be enforced for compatibility.
getChunk and chunkIterator in the above implementation can do shifting.
For unbounded ranges or other unbounded iterators (input streams?), a config param maxChunkSize can be used to chunk up the unbounded space.
These suggestions might break some operator promotions that rely on the current semantics; in that case, those operators should have specific overloads.
This l/f implementation is different from, e.g., the current DefaultRectangular iterators, as those are rank-aware. I think rank-oblivious iterators can help answer hard questions such as zippering domains/arrays of different ranks. One can easily "flatten" a multi-dimensional array:
Note that the order of zippered iterators should not matter.
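Under such rank-oblivious semantics, the flattening alluded to above might permit something like the following (an illustration, not the lost snippet verbatim):

```chapel
var A: [1..2, 1..3] int;                 // 6 elements, rank 2
var B: [1..6] int = [1, 2, 3, 4, 5, 6];  // 6 elements, rank 1

// both sides are treated as flat, zero-based streams of elements
forall (a, b) in zip(A, B) do
  a = b;
```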
I don't have a strong opinion on exactly what should be yielded by the leader. A single range should suffice to provide basic functionality. However, we might want to pass additional data for checking, e.g. numElems for halting if boundsCheck==true.
The best approach I can think of is having a record in the internal modules with fields that cover the bare minimum of desired functionality. Then, the most common internal leaders would yield variables of that record type. If more exotic behavior is desired, a child record can be implemented. This would allow those exotic iterators to be zippered with standard ones; when exotic behavior is desired, the exotic iterator should be the leader. If an exotic follower follows a standard leader, that can be detected through the type system or metaprogramming; the follower can then choose to (1) change its behavior or (2) generate a compile-time error.
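A rough sketch of that record idea (the names are mine; note that Chapel records don't support inheritance today, so the "child record" relationship is shown loosely as two separate records):

```chapel
// bare-minimum data that most internal leaders would yield
record FollowThis {
  var chunk: range;        // the zero-based range to follow
}

// an "exotic" variant carrying extra data, e.g. for bounds checks
record CheckedFollowThis {
  var chunk: range;
  var numElems: int;       // for halting if boundsCheck == true
}
```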