chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.78k stars 418 forks source link

Spike: Interface for dimIter on arbitrary dimensions in sparse domains #10214

Closed ian-bertolacci closed 5 years ago

ian-bertolacci commented 6 years ago

The dimIter( rank, rank_idx) iterator is an topic of discussion in several issues (#9527 #7374 #8023), and is to improving performance in the bale-toposort benchmark (and likely others).

bradcray commented 6 years ago

A few notes:

ian-bertolacci commented 6 years ago

General Development Impact

Currently, there are only three definitions of dimIter:

Because there are few existing implementations, adding arbitrary dimension iteration support will require changes to only DefaultSparse and LayoutCS.CSDom. Because there are few other sparse domain types (SparseBlockDist is the only one I can find) I think supporting this in existing sparse classes in the future is feasible.

API

The standard interface for dimIter should generally be: iter dimIter( d : int, ind : rank*idxType ) : rank*idxType d is the dimension that is iterated along, and ind is a rank-size tuple (or scalar if rank==2) that the non-zero index must match (except for ind[d]) for it to be yielded.

Possible changes:

Breaking changes:

How To Support

In LayoutCS.CSDom.dimIter, there efficient dimension is already supported. To support the inefficient dimension requires iterating over the efficient dimension, and searching the indices for the dimension component, yielding it if found. Since CSDom only supports dimension 2 domains, these are the only cases we need to consider.

In DefaultSparseDom.dimIter, we need to consider the general case, a sparse domain of integer tuples. Other cases, such as sparse domains of integers or strings or enums do not need to be considered, because the concept of a dimensions has no meaning in those contexts. Further the concept of an 'inefficient' dimension is more ambiguous here, because the user does not know the underlying method of sparse storage. Currently DefaultSparseDom uses COO, which makes iterating over dimensions > 1 inefficient, but also us some flexibility in the approach.

There are two ways of doing this. The easiest way: iterate over the whole list of non-zeros, and yield values who match the necessary ind tuple components. A trick would be to find the first matching tuple by setting ind[d] to min(idxType) then using binary search to find the first occurrence (if it exists) of a matching tuple and then linearylly searching until all possible valid values are exhausted (specifically, until indices[position][1] != ind[1], or if d == 1, indices[position][2] != ind[2]), which could be long, and yielding matching tuples, which would need to be matched on all but two positions.

The harder way: specially sort the domain indices in an order which all tuples match the necessary ind components are adjacent (specifically, sort in lexicographical order, but the d'th component is sorted last; i.e. for dimIter( 1, .... ) resort lexicographically by component 2, then component 3, then component 1). This could be done outside of dimIter, or on the first dimiter( d ) and only resort when a later dimIter( d' ) (where d != d') is invoked, and the sorting is only sensitive to the parameter d (changes to ind do not require resorting). Iterating is then finding the first occurrence of a matching tuple (using the same method as above) and iterating until a the first non-match (specifically when indices[position][rank] != ind[rank], or if d == rank, indices[position][rank-1] != ind[rank-1]) which is minimal.

The main tradeoffs between the two are sorting time versus linear scan time. The easy way requires no sorting but could potentially scan the whole indices list and not yield a value. The hard way does require sorting, even if that the dimIter will yield no values for those parameters, but that sorting could be done ahead of time, without redundant resorts, and likely falls into near-best-case sort runtime for certain algorithms (since it will at worst sort a very nearly sorted array), and will only iterate through the indices list exactly as many times as there are matching non-zero indices in the whole sparse domain.

Issue of Inefficient Dimensions

In LayoutCS.CSDom, it will certainly be inefficient to iterate along the sparse dimension. However, I think we should still support it, but to warn the user, both through the documentation and a compiler warning. Sufficiently motivated users will not want this and will change their storage format, either by changing the sparse dimension or by using (or creating) a different sparse format entirely. But there will likely be some interim period between realizing that they need a different format and using that format where they would still like their application to use the dimIter method. The implementation cost is negligible, so functionality could be removed very easily (by commenting out the single ~7 line code block) if necessary.

Since DefaultSparseDomain is the least specific type of sparse domain (implementing COO, not that users are necessarily aware) supporting arbitrary dimensions (including ineffecient ones) is important to the usability of the class. Frankly, it wouldnt make sense to not support dimIter on arbitrary ranks.

Development Cost

I already have a LayoutCS.CSDom.dsiIter implementation that I was playing around with for toposort. The only changes will be modifying it to return rank*idxType values, documentation, and the inefficiency warning.

Support for DefaultSparseDom.dsiIter does not present any major technical challenges, as far as I can see. There are a few design choices to consider, namely which method should be used, and if the harder method is used, should the indices stay in that sorted order and should there be a method to change that ordering (as a user, for example) outside of dimIter.

Hard Method Proof of Concept

Here is a poof of concept implementation of the 'hard' default sparse dimIter methodology.

use Sort;
use Search;
use Random;
config const globalDebugMode : bool = false;
// Comparison class
record SortOnRankLast {
  const rank : int;
  // compare a and b lexicographically, except a[rank] and b[rank] are compared last.
  // eg, SortOnRankLast( tuple.size ) == normal lexicographic comparison
  proc compare( a : _tuple, b : _tuple ){
    if a.size != b.size then
      compilerError("tuple operands to < have different sizes");
    // lexicographically compare all others
    for param i in 1..a.size {
      // skip special rank
      if i != rank {
        // a < b;
        if a(i) < b(i) then return -1;
        // a > b
        else if a(i) > b(i) then return  1;

      }
    }
    // check special rank
    if a[rank] > b[rank] then return  1;
    if a[rank] < b[rank] then return -1;
    return 0;
  }
}

// dimIter example test
iter dimIterExample( array : [] 3*int, compare, d : int, in ind : 3*int, debug : bool = globalDebugMode ) {
  ind[d] = min(int);
  // only need to check tuples on the inner most non-d index.
  // If d is not size, then this value *is* size, otherwise it is size-1
  const check_d = if d == ind.size then ind.size-1 else ind.size;
  if debug then writeln( ind );
  // get possible start position
  var (found, pos) = binarySearch( array, ind, compare );
  if debug then writeln( (found, pos) );
  // if no-match return
  if ind[check_d] != array[pos][check_d] then return;
  // while still matching
  while array[pos][check_d] == ind[check_d] && pos < array.size{
    yield array[pos];
    pos += 1;
  }
}

var a : [1..#9] 3*int = [ (1,1,1), (1,2,1), (2,1,2), (2,2,1), (2,2,2), (2,2,3), (2,3,2), (3,2,1), (3,3,3) ];

shuffle( a );

var iter1 = new SortOnRankLast(1);
var iter2 = new SortOnRankLast(2);
var iter3 = new SortOnRankLast(3);

mergeSort( a, comparator=iter1 );
writeln("Sorted on rank 1 last:");
writeln( a );
writeln( "dimIter( 1, (_,2,1) )");
for i in dimIterExample( a, iter1, 1, (0,2,1) ) {
  writeln( i );
}

mergeSort( a, comparator=iter2 );
writeln("Sorted on rank 3 last:");
writeln( a );
writeln( "dimIter( 2, (2,_,2) )");
for i in dimIterExample( a, iter2, 2, (2,0,2) ) {
  writeln( i );
}

mergeSort( a, comparator=iter3 ); // note this is the normal lexicographic order
writeln("Sorted on rank 3 last:");
writeln( a );
writeln( "dimIter( 3, (2,2,_) )");
for i in dimIterExample( a, iter3, 3, (2,2,0) ) {
  writeln( i );
}
writeln( "dimIter( 3, (1,1,_) )");
for i in dimIterExample( a, iter3, 3, (1,1,0) ) {
  writeln( i );
}
ian-bertolacci commented 6 years ago

I'm amending this write-up. The hard method is actually harder, because we need to consider the array's built from. There are several ways about this:

  1. Reordering the data in all the arrays to match the order of the domain (a terrible idea).
  2. Maintaining a separate list of indices that can be maintained in any order (not great)
  3. Attaching a position attribute to the index (eg ( sparse index, array index)) that is used internally to locate map that index to the array element (not the worst, but it adds overhead to adding and removing indices from the domain).

Because of this, I am more partial to the easier method.

bradcray commented 6 years ago

Some quick notes:

(or scalar if rank==2)

Should this read "== 1"?

ian-bertolacci commented 6 years ago

@e-kayrakli @buddha314 Thoughts on what you want to see or how you might use this? Any other users we should be asking?

buddha314 commented 6 years ago

I am way behind on this issue, to be honest. However, my primary use case is to iterate over columns or rows of sparse matrices, skipping the undefined bits. Often I use that in some kind of summary like max or standard deviation. I hope that helps. Oh, also used in conjunction with a some qualifier like 'where value > 3` or the like.

ian-bertolacci commented 6 years ago

2 questions:

  1. What domain types are you using this on?
  2. Do you ever iterate over the 'inefficient' domain? Do you know which domain is 'inefficient'?
buddha314 commented 6 years ago

real matrices are more common than integers, expect in the deep learning space... they are fans of "one-hot" vectors and the concatenation thereof. So perhaps a whole class of binary matrices would be interesting.

Yes, when I have a matrix, I'm often thinking in terms of graphs (so the entries are edges) or transition probabilities. In the latter case, you end up in a Minimax like situation where you are going back and forth over column and row maxes.