Open Anjiang-Wei opened 1 year ago
I do not quite get the intuition of differences between reduction list and reduction fold. More specifically, the reason why people want to differentiate the two notions.
Also, I am curious whether scan is a "parallel primitive" that Legion supports or will support.
I do not quite get the intuition of differences between reduction list and reduction fold. More specifically, the reason why people want to differentiate the two notions.
I think I understand it better after reading the tutorial twice. Reduction list corresponds to reduction without fold. Intuitively, I can also see that reduction plus fold has the potential to enable more optimization than reduction alone. Is there a simple but real-world example where we can only use reduction without a fold? It would be great if we could make the tutorial easier to follow for beginners like me.
Also, I feel a bit confused about the following statements in the tutorial:
Reduction list instances perform best when reductions are sparse in the target logical region and the resulting list of reductions has fewer elements than the target logical region. Alternatively, fold reduction instances perform best for dense reductions where more than one reduction operation will be applied to each location in the logical region. Locally folding reductions saves space and allows reductions to be performed in parallel.
More specifically, what does more than one reduction operation mean? Does it correspond to the code where reduce_node is invoked twice inside the function cpu_base_impl?
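To make the quoted tutorial passage concrete, here is a minimal standalone C++ sketch of the two buffering strategies for a sum reduction. It does not use Legion at all: `ListBuffer` and `FoldBuffer` are hypothetical names invented for illustration, and the layout is an assumption about the general idea rather than a description of Legion's instances.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration only; these are not Legion types.

// List-style buffer: remembers every (index, value) contribution.
// Its size grows with the number of reductions issued, not with the
// size of the target, so it suits sparse reductions.
struct ListBuffer {
  std::vector<std::pair<std::size_t, float>> entries;

  void reduce(std::size_t index, float value) {
    entries.emplace_back(index, value);
  }

  // Replay each logged contribution against the target.
  void apply_to(std::vector<float> &target) const {
    for (const auto &e : entries) target[e.first] += e.second;
  }
};

// Fold-style buffer: one slot per target location, initialized to the
// identity of the reduction (0 for sum). Repeated reductions to the
// same location are combined ("folded") locally, so the size stays
// fixed no matter how many reductions hit each slot.
struct FoldBuffer {
  std::vector<float> partial;

  explicit FoldBuffer(std::size_t n) : partial(n, 0.0f) {}

  void reduce(std::size_t index, float value) {
    assert(index < partial.size());
    partial[index] += value;  // fold into the existing partial result
  }

  // Apply one partial result per location to the target.
  void apply_to(std::vector<float> &target) const {
    for (std::size_t i = 0; i < partial.size(); ++i) target[i] += partial[i];
  }
};
```

In this sketch, "more than one reduction operation applied to each location" simply means calling `reduce` several times with the same index. The list buffer then stores one entry per call, while the fold buffer still holds a single combined value for that slot.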
I do not quite get the intuition of differences between reduction list and reduction fold. More specifically, the reason why people want to differentiate the two notions.
Section 7.1 of this paper covers the distinction. I will say the reduction list implementation is not really supported right now. I would need to resurrect it, so it is probably worth removing from the manual if it is in there currently.
Reduction list corresponds to reduction without fold.
It does have that benefit as well, although the original motivation is the one above. Right now I think we assume that a fold function always exists on our reduction operators.
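As a rough illustration of what it means for a reduction operator to carry both an apply and a fold function, here is a standalone C++ sketch. `SumReduction` and `reduce_all` are hypothetical names, and the shape (an `LHS` type, an `RHS` type, an identity, `apply`, and `fold`) is only modeled on the operator described in the circuit tutorial. It is not checked against the current Legion headers, and details such as exclusive versus atomic variants are left out.

```cpp
#include <vector>

// Hypothetical sum operator; not Legion code.
struct SumReduction {
  typedef float LHS;  // type of the value stored in the target
  typedef float RHS;  // type of each contribution
  static constexpr RHS identity = 0.0f;

  // apply: combine a contribution into the target value.
  static void apply(LHS &lhs, RHS rhs) { lhs += rhs; }

  // fold: combine two contributions into one contribution.
  static void fold(RHS &rhs1, RHS rhs2) { rhs1 += rhs2; }
};

// Fold all contributions into one partial value first, then apply that
// single value to the target. This only works because fold exists.
template <typename REDOP>
void reduce_all(typename REDOP::LHS &target,
                const std::vector<typename REDOP::RHS> &contributions) {
  typename REDOP::RHS partial = REDOP::identity;
  for (typename REDOP::RHS c : contributions) REDOP::fold(partial, c);
  REDOP::apply(target, partial);
}
```

Without `fold`, the only option in this sketch would be to call `apply` once per contribution, which requires keeping every contribution around until the target is available.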
More specifically, what does more than one reduction operation mean? Does it correspond to the code where reduce_node is invoked twice inside the function cpu_base_impl?
See if the paper above answers your question. If not, let me know and I'll take a crack at answering it differently.
The tutorial mentions code related to the AccumulateCharge class several times, but the explanation does not seem to match the latest implementation of the circuit example. I believe that the reduction operator has been integrated into Legion itself, e.g., the current version simply needs to use LEGION_REDOP_SUM_FLOAT32. I wish I could make a PR to bring the circuit tutorial in line with the current code of the circuit example and the overall Legion design, but I currently have very little experience with the reduction operator, so I decided to open an issue here for now.