Spike: Interface for dimIter on arbitrary dimensions in sparse domains

ian-bertolacci commented 6 years ago

The dimIter( rank, rank_idx) iterator is an topic of discussion in several issues (#9527 #7374 #8023), and is to improving performance in the bale-toposort benchmark (and likely others).

bradcray commented 6 years ago

A few notes:

there's a challenging design question about how to make dimIter() support arbitrary ranks (cleanly)... I'd probably create a separate design task / spike for that
- if the design becomes too crazy, we may even want to look at falling back onto other concepts... e.g., would iterating over a slice of a sparse array (or something else) be cleaner/preferable for the design patterns that want dimIter?
I'm not certain we'd want to support dimIter() for inefficient dimensions. It can get pretty arbitrarily expensive pretty quickly. Or maybe it should just generate some sort of compiler warning in that case or something...
I think we'll probably want to name the ultimate version of this something other than dimIter()

ian-bertolacci commented 6 years ago

General Development Impact

Currently, there are only three definitions of dimIter:

LayouCS.CSDom.dimIter: does not support iteration along the sparse dimension (e.g. iterating in a column in CSR mode)
DefaultSparseDom.dimIter: does not support any use (always halts).
_domain.dimIter: invokes the underlying implementation's dimIter method.

Because there are few existing implementations, adding arbitrary dimension iteration support will require changes to only DefaultSparse and LayoutCS.CSDom. Because there are few other sparse domain types (SparseBlockDist is the only one I can find) I think supporting this in existing sparse classes in the future is feasible.

API

The standard interface for dimIter should generally be: iter dimIter( d : int, ind : rank*idxType ) : rank*idxType d is the dimension that is iterated along, and ind is a rank-size tuple (or scalar if rank==2) that the non-zero index must match (except for ind[d]) for it to be yielded.

Possible changes:

Could have d as a param, but I dont think that is necessary
ind could be a (rank-1)*idxType as a list of the index components without any input for rank d in order of the components rank but this is confusing (as illustrated by this confusing description).
dimIter isn't a particularly descriptive name, especially if we want to convey the fact that it iterates over non-zeros. The name iterNonZerosOnRankAt( ... ) comes to mind, but is a little unwieldy.

Breaking changes:

Currently LayoutCS.CSDom.dimIter yields scalar idxType values (I propose rank*idxType values). Since most users already have the other components (in the form of ind) recreating the full tuple may be unnecessary.

How To Support

In LayoutCS.CSDom.dimIter, there efficient dimension is already supported. To support the inefficient dimension requires iterating over the efficient dimension, and searching the indices for the dimension component, yielding it if found. Since CSDom only supports dimension 2 domains, these are the only cases we need to consider.

In DefaultSparseDom.dimIter, we need to consider the general case, a sparse domain of integer tuples. Other cases, such as sparse domains of integers or strings or enums do not need to be considered, because the concept of a dimensions has no meaning in those contexts. Further the concept of an 'inefficient' dimension is more ambiguous here, because the user does not know the underlying method of sparse storage. Currently DefaultSparseDom uses COO, which makes iterating over dimensions > 1 inefficient, but also us some flexibility in the approach.

There are two ways of doing this. The easiest way: iterate over the whole list of non-zeros, and yield values who match the necessary ind tuple components. A trick would be to find the first matching tuple by setting ind[d] to min(idxType) then using binary search to find the first occurrence (if it exists) of a matching tuple and then linearylly searching until all possible valid values are exhausted (specifically, until indices[position][1] != ind[1], or if d == 1, indices[position][2] != ind[2]), which could be long, and yielding matching tuples, which would need to be matched on all but two positions.

The harder way: specially sort the domain indices in an order which all tuples match the necessary ind components are adjacent (specifically, sort in lexicographical order, but the d'th component is sorted last; i.e. for dimIter( 1, .... ) resort lexicographically by component 2, then component 3, then component 1). This could be done outside of dimIter, or on the first dimiter( d ) and only resort when a later dimIter( d' ) (where d != d') is invoked, and the sorting is only sensitive to the parameter d (changes to ind do not require resorting). Iterating is then finding the first occurrence of a matching tuple (using the same method as above) and iterating until a the first non-match (specifically when indices[position][rank] != ind[rank], or if d == rank, indices[position][rank-1] != ind[rank-1]) which is minimal.

The main tradeoffs between the two are sorting time versus linear scan time. The easy way requires no sorting but could potentially scan the whole indices list and not yield a value. The hard way does require sorting, even if that the dimIter will yield no values for those parameters, but that sorting could be done ahead of time, without redundant resorts, and likely falls into near-best-case sort runtime for certain algorithms (since it will at worst sort a very nearly sorted array), and will only iterate through the indices list exactly as many times as there are matching non-zero indices in the whole sparse domain.

Issue of Inefficient Dimensions

In LayoutCS.CSDom, it will certainly be inefficient to iterate along the sparse dimension. However, I think we should still support it, but to warn the user, both through the documentation and a compiler warning. Sufficiently motivated users will not want this and will change their storage format, either by changing the sparse dimension or by using (or creating) a different sparse format entirely. But there will likely be some interim period between realizing that they need a different format and using that format where they would still like their application to use the dimIter method. The implementation cost is negligible, so functionality could be removed very easily (by commenting out the single ~7 line code block) if necessary.

Since DefaultSparseDomain is the least specific type of sparse domain (implementing COO, not that users are necessarily aware) supporting arbitrary dimensions (including ineffecient ones) is important to the usability of the class. Frankly, it wouldnt make sense to not support dimIter on arbitrary ranks.

Development Cost

I already have a LayoutCS.CSDom.dsiIter implementation that I was playing around with for toposort. The only changes will be modifying it to return rank*idxType values, documentation, and the inefficiency warning.

Support for DefaultSparseDom.dsiIter does not present any major technical challenges, as far as I can see. There are a few design choices to consider, namely which method should be used, and if the harder method is used, should the indices stay in that sorted order and should there be a method to change that ordering (as a user, for example) outside of dimIter.

Hard Method Proof of Concept

Here is a poof of concept implementation of the 'hard' default sparse dimIter methodology.

use Sort;
use Search;
use Random;
config const globalDebugMode : bool = false;
// Comparison class
record SortOnRankLast {
  const rank : int;
  // compare a and b lexicographically, except a[rank] and b[rank] are compared last.
  // eg, SortOnRankLast( tuple.size ) == normal lexicographic comparison
  proc compare( a : _tuple, b : _tuple ){
    if a.size != b.size then
      compilerError("tuple operands to < have different sizes");
    // lexicographically compare all others
    for param i in 1..a.size {
      // skip special rank
      if i != rank {
        // a < b;
        if a(i) < b(i) then return -1;
        // a > b
        else if a(i) > b(i) then return  1;

      }
    }
    // check special rank
    if a[rank] > b[rank] then return  1;
    if a[rank] < b[rank] then return -1;
    return 0;
  }
}

// dimIter example test
iter dimIterExample( array : [] 3*int, compare, d : int, in ind : 3*int, debug : bool = globalDebugMode ) {
  ind[d] = min(int);
  // only need to check tuples on the inner most non-d index.
  // If d is not size, then this value *is* size, otherwise it is size-1
  const check_d = if d == ind.size then ind.size-1 else ind.size;
  if debug then writeln( ind );
  // get possible start position
  var (found, pos) = binarySearch( array, ind, compare );
  if debug then writeln( (found, pos) );
  // if no-match return
  if ind[check_d] != array[pos][check_d] then return;
  // while still matching
  while array[pos][check_d] == ind[check_d] && pos < array.size{
    yield array[pos];
    pos += 1;
  }
}

var a : [1..#9] 3*int = [ (1,1,1), (1,2,1), (2,1,2), (2,2,1), (2,2,2), (2,2,3), (2,3,2), (3,2,1), (3,3,3) ];

shuffle( a );

var iter1 = new SortOnRankLast(1);
var iter2 = new SortOnRankLast(2);
var iter3 = new SortOnRankLast(3);

mergeSort( a, comparator=iter1 );
writeln("Sorted on rank 1 last:");
writeln( a );
writeln( "dimIter( 1, (_,2,1) )");
for i in dimIterExample( a, iter1, 1, (0,2,1) ) {
  writeln( i );
}

mergeSort( a, comparator=iter2 );
writeln("Sorted on rank 3 last:");
writeln( a );
writeln( "dimIter( 2, (2,_,2) )");
for i in dimIterExample( a, iter2, 2, (2,0,2) ) {
  writeln( i );
}

mergeSort( a, comparator=iter3 ); // note this is the normal lexicographic order
writeln("Sorted on rank 3 last:");
writeln( a );
writeln( "dimIter( 3, (2,2,_) )");
for i in dimIterExample( a, iter3, 3, (2,2,0) ) {
  writeln( i );
}
writeln( "dimIter( 3, (1,1,_) )");
for i in dimIterExample( a, iter3, 3, (1,1,0) ) {
  writeln( i );
}

ian-bertolacci commented 6 years ago

I'm amending this write-up. The hard method is actually harder, because we need to consider the array's built from. There are several ways about this:

Reordering the data in all the arrays to match the order of the domain (a terrible idea).
Maintaining a separate list of indices that can be maintained in any order (not great)
Attaching a position attribute to the index (eg ( sparse index, array index)) that is used internally to locate map that index to the array element (not the worst, but it adds overhead to adding and removing indices from the domain).

Because of this, I am more partial to the easier method.

bradcray commented 6 years ago

Some quick notes:

(or scalar if rank==2)

Should this read "== 1"?

I agree that the name dimIter() isn't particularly attractive / ideal (though I don't have a favorite alternative).
When I think about arbitrary-dimension dimIter()s on a sparse 3+D array, I tend to imagine that only the first dimension will be dense (for[all] i in 1..n) and that all subsequent ones will be a dimIter()-style iterator (since, in a very sparse domain, e.g. the 20 out of n3 nonzeroes in the NAS MG input array, you wouldn't want to do `O(n2)work when you could just doO(n)`. This presents a bit of an interface challenge, though, because it suggests that each loop in the nest fills in one more index to be filled in (and, in general, it may also need to know the order of the dimensions of the outer loops that preceded it?).
I remain unconvinced that inefficient iteration orders need to be supported (e.g., "loop over the columns of a CSR"), at least with this iterator. Maybe we'd eventually have a genDimIter() that would support inefficient orders at any cost? But in this effort, I'm much more interested in being able to write an efficient loop nest one loop at a time rather than an arbitrarily bad loop nest order (just because I think it's a lot of implementation cost / work and will only make users mad / miserable... "why is my program so incredibly slow?!").
of course, another challenge in thinking about higher-dimensional sparse domains/arrays is that at present, we only support them using COO format, and it doesn't support iteration orders other than RMO. For some slightly wider-open food for thought, you might read about the sparse domain ("region") format chapter 6 of this sleep-inducing document.

ian-bertolacci commented 6 years ago

@e-kayrakli @buddha314 Thoughts on what you want to see or how you might use this? Any other users we should be asking?

buddha314 commented 6 years ago

I am way behind on this issue, to be honest. However, my primary use case is to iterate over columns or rows of sparse matrices, skipping the undefined bits. Often I use that in some kind of summary like max or standard deviation. I hope that helps. Oh, also used in conjunction with a some qualifier like 'where value > 3` or the like.

ian-bertolacci commented 6 years ago

2 questions:

What domain types are you using this on?
Do you ever iterate over the 'inefficient' domain? Do you know which domain is 'inefficient'?

buddha314 commented 6 years ago

real matrices are more common than integers, expect in the deep learning space... they are fans of "one-hot" vectors and the concatenation thereof. So perhaps a whole class of binary matrices would be interesting.

Yes, when I have a matrix, I'm often thinking in terms of graphs (so the entries are edges) or transition probabilities. In the latter case, you end up in a Minimax like situation where you are going back and forth over column and row maxes.

chapel-lang / chapel