chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.8k stars 422 forks source link

Propose not supporting first and last methods on sparse domains #17327

Open mstrout opened 3 years ago

mstrout commented 3 years ago

Currently for domains, D.first and D.last, will return the first and last index in a domain respectively based on the order the indices/coordinates are stored in. This means for a CSR or CSC sparse ordering in LayoutCS, it will return the index/coordinate for the first non-zero in the index array.

The main question is whether we should continue to support this for sparse domains. If we decide not to then we would need to do the following:

@bradcray thoughts about this: I’d be inclined to not support .first and .last on sparse domains, at least until we have more discussion about them. Starting with some equivalent questions: “Should a for loop on a dense 2D domain/array always iterate over the indices/elements in the same order (“to make the science of a code independent of the data structure order”), or should a domain map be able to define its own order (“for efficiency”) such that a RMO array would return in RMO, a CMO array in CMO, and an array using tiled memory or a space-filling curve would reveal that order. And similarly, if the 2D array is printed out, should that printout be independent of its memory layout, or reflect it? In practice, we’ve usually said “the same order” for productivity and postulated that there could be a .naturalOrder() iterator method if the user didn’t care about the order and wanted whatever serial ordering was faster.

The .first/.last / serial iteration order question also applies to sparse domains where, to me, it’s not clear how an algorithm could rely on these queries if they reflected the layout in memory (i.e., this may be the first index, but that doesn’t mean there aren’t other indices that aren’t lower/higher). That said, it’s not that clear to me how an algorithm would benefit from these queries even if they were well-defined. Where for a dense domain/array, they represent extreme indices regardless of serial iteration order, for a sparse domain/array, they don’t imply that there isn’t another index with a lower/higher row/column number, so how would I make use of them effectively?

@mstrout thoughts: I would also be inclined to not support .first and .last on sparse domains. The number of different orders is mind boggling especially when one considers the multi-dimensional generalization, sparse tensors.

vasslitvinov commented 3 years ago

I like the proposal of .naturalOrder() iterators. I consider this-iterators producing a predefined order as "least surprising" behavior, having the user opt-in to the "natural" order". To me this parallels high/low on ranges returning aligned bounds.

W.r.t. .first/.last, one route is to define them as compilation errors to avoid confusion. while letting each domain map provide more-specifically-named versions of these.