Open ahysing opened 3 years ago
@ahysing — This is a quick note to mention that the Chapel team at HPE is on holiday through Jan 4, so won't be able to be as responsive as normal until then. Talk to you more in 2021.
https://github.com/chapel-lang/chapel/issues/14281 is related in the sense that it proposes different ways of implementing IRV or removing it (that's where I stood, but as mentioned there a use case can change my mind)
@ahysing -- I have couple of questions to understand your use case a bit better:
param
). Your code is again doing the former, but I just wanted to be sure that it is not doing that only for demonstration purposes.Thanks!
@ahysing: I agree that distributed sparse arrays should support the ability to set the IRV like local sparse arrays, so don't object at all to what this issue is requesting or noting.
However, since you've given some context for your use case: I worry that using a sparse array to implement a distributed heap is going to be prohibitively expensive. My mind goes to using a reduction to find min/max values (if a small number are required), or simply sorting the data and reading out the prefix. Is that an option for your use case, or is there something about the nature of the data that's too dynamic to support it?
If the latter, then my head goes to wondering what creating a native distributed heap would look like and whether we should support that in a library instead... E.g., maybe a heap per locale with a summary heap of the "tops" of all the heaps living on one locale?
I agree with Brad that sparse arrays may not be great for implementing a heap.
If the latter, then my head goes to wondering what creating a native distributed heap would look like and whether we should support that in a library instead... E.g., maybe a heap per locale with a summary heap of the "tops" of all the heaps living on one locale?
This is a very interesting idea.
I don't want to veer the issue too much off-topic, but you'd want to keep your heaps balanced across the locales and if you are building it from a large data set in one go, you'd probably want to round-robin them in bulk. Internally, the heap structure uses treap as the underlying data structure, which is expected to be balanced. So, as long as you keep individual heaps' sizes balanced the overall structure should be balanced.
Hi again . I will answer your questions as best I can.
@e-kayrakli
In the example above I expect to set up IRV once, and share that IRV-value across all locales. That way the array would look and feel like an array on a single locale (except performance of course). This would feel natural to newcomers to chapel.
I could imagine setting IRV to a param
literal which never change.
reading up on #14281
@bradcray suggested achieve the same with a reduction over arrays, and in fact this works also. It is a great suggestion which the option I am going for. thanks :)
To give you some context.
I am implementing A* search. I do it as an thought exercise, and to learn your exciting language. In this this algorithm one can imagine implementing f(x) and g(x) as large distributed sparse arrays with IRV = Infinity.
The heap I am referring to is called openSet
on the middle of wikipedia's psudo code.
Mostly to remind myself of the issues I poked around a bit and wanted to record why this is not working today and why it is not a quick fix:
Our non-distributed sparse arrays inherit from BaseSparseArrImpl
which stores irv
as a field, and provides an proc IRV ref
that returns it. A distributed sparse array consists of such non-distributed sparse arrays. And currently it inherits from a base sparse array type (BaseSparseArr
) that is higher in the class hierarchy. This class doesn't have irv
field.
The reason for that is because we don't know how to manage IRV for sparse block array. The current scheme that works for non-distributed sparse arrays doesn't work because we need to know when the IRV for the distributed array is changed so that we can propagate the new value to the non-distributed units in each target locale. I can think of some alternative designs to workaround that:
mySparseArr.IRV = 5
, and would need mySparseArr.setIRV(5)
. I think setting IRV is not a common operation and this wouldn't feel too ugly.2 would still have some additional checks on remote access. This is probably not an issue today, but may be in the future when sparse arrays are optimized. 3 is a clean implementation with no performance issues that I can think of, but it is a breaking change (not that sparse interface is stable but still...) and requires a design discussion. 1 can be a considered as a correctness bandaid, I don't think it is what we want in the long term.
I could imagine setting IRV to a param literal which never change.
While this might be attractive for sparse numerical arrays, it's problematic for sparse arrays of types that don't currently support param values (most notably, records, classes, or arrays).
Of Engin's three proposals, I find option 3 most attractive.
While this might be attractive for sparse numerical arrays, it's problematic for sparse arrays of types that don't currently support param values (most notably, records, classes, or arrays).
Ah, yeah, good point. I always forget about that case.
I forked off https://github.com/chapel-lang/chapel/issues/16898 as a design discussion for approach 3.
We should keep this open as an unimplemented feature in the language. If the feature is needed in some use case, I have the following version of the snippet in the OP that works around the issue by playing with the internal implementation of the sparse array. Obviously, I haven't tested it thoroughly :)
use BlockDist;
proc main() {
const start : int(64) = 0;
const end : int(64) = 32;
const bbox = {start..end};
const ALL : domain(1) dmapped Block(boundingBox=bbox) = bbox;
var D : sparse subdomain(ALL);
D.add(0);
D.add(10);
var values : [D] real;
//values.IRV = 5.0;
setIRV(values, 5.0);
writeln(values[0]); // prints 0.0
writeln(values[1]); // prints 5.0
}
proc setIRV(arr, irv) {
const ref inst = arr._value;
coforall l in inst.dom.dist.targetLocales do on l {
inst.myLocArr!.myElems.IRV = irv;
}
}
👍
I am looking for an alternative for getting the bottom-most value in a large dataset. Usually a min-heap would be the perfect alternative. Chapel has one of those in module heap. However Heap seems to be living in a single locale. When the data set is big enough the heap data structure will outgrow the size of memory in one locale, and eventually crash.
Sparse domains and arrays seems to overcome the data set size issue. If one applied a sparse subdomain over a dmapped Block one could imagine a data structucture which can grow almost indefinatly. However assigning the Impliclitly Replicated Value causes an error over blocks.
Summary of Problem
Define a demapped block. Assign the IRV. Read the crash report
Steps to Reproduce
Source Code:
One expects this program to compile without errors. The resulting program contains
array
with values 5.0 as IRV and 0.0 in position 0 and 10.compiling the program results in
Execution command:
chpl main.chpl
.Configuration Information
chpl --version
: