chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.79k stars 421 forks source link

About set operations on Ranges #6298

Open marcoscleison opened 7 years ago

marcoscleison commented 7 years ago

Hi,

I am working on matrix element mapping operations for a specific algorithm. I need to use some set operation on ranges. I am missing the Union and Difference set operations on Chapel Ranges. Would these operations be appropriate for ranges? Thanks.

bradcray commented 7 years ago

Hi Marcos --

I think the challenge to supporting such operations is that they are not closed over ranges (by Chapel's definition of such), at least in general.

In more detail, thinking of ranges as being defined by four integers <low bound, high bound, stride, alignment>, it is not possible to store the union of 1..10 by 3 and 2..10 by 3 in that format since the pattern would be two consecutive integers followed by a skip, then two more integers, a skip and so on. Similarly, 1..10 minus 1..10 by 3 can't be represented as a range in Chapel because it results in a similarly jagged pattern.

Unbounded ranges yield similar challenges -- e.g., the union of ..-3 and 5.. could not be stored as a range in Chapel.

If you restrict yourself to bounded, unstrided ranges, though, these operations should be closed and reasonable to define (right?). And even in the case of unbounded and strided regions, some values would work and others would not, so you could make it a runtime error if the result could not be represented as a range.

At the moment I don't have an opinion as to whether it would be better to support cases that could be supported by default given that they apply to some ranges, but not all; or whether they should be available in an optional library that the user could 'use'; or whether users should implement themselves when needed so that they can make decisions about what to do in the cases that are not closed.

ben-albrecht commented 7 years ago

I've found myself wanting this from time to time as well, but @bradcray makes some good points on why it's difficult to support.

Maybe associative domains are more appropriate for algorithms that require set operations between sets of indices.

marcoscleison commented 7 years ago

Thank you for the fast response. Indeed @bradcray explained the difficulties related to Ranges due to its internal structure. I understand it now. He makes some good points.

I think that ranges containing non-regular gaps could be represented internally if instead having a tuple <low bound, high bound, stride, alignment> to represent the range, it could have a tuple <low bound, high bound, stride, alignment, link_to_the_next_tuple>, in other words, the range could be represented internally as an ordered linked list with the ranges and gaps information. However, this suggestion seems not easily applied to union operation of ranges with strides different than 1. As mentioned @ben-albrecht , I know how hard is to provide set operations on range.

bradcray commented 7 years ago

I don't think we want to make the language's notion of range any more generak than "a sequence of integers with a regular stride" since ranges are meant to be a reasonably primitive, efficient, and simple thing.

That said, I could imagine a library providing some sort of "multi-range" or "jagged range" or "..." type which was more complicated and supported the types of multi-range patterns that you're conceiving of here. But I think there will be challenges in implementing such a concept efficiently. For example, when iterating over the new type, would the iterator guarantee that integers would be accessed in sequential order? If so, it would need to essentially do some sort of merge or sort on all of the component ranges before starting the iteration which would add overhead. Alternatively, the type could make no guarantees about ordering and just iterate over each sub-range separately -- though presumably it would still need to do some sort of intersection in order to make sure it didn't yield the same index multiple times (?).

If you have practical use cases that want these sorts of patterns, I'd be curious to hear about them. One that shows up in our work is iteration over block-cyclic patterns where you want to iterate over a dense block (e.g., 1..4) and then skip to the next dense block (e.g., 13..16) -- but we usually represent these as a nested loop where the outer loop's range represents the series of blocks and the inner loop's range represents the indices within each block.