StanfordLegion / legion.stanford.edu


Reduction-related code and explanation in the circuit example are outdated #12

Open Anjiang-Wei opened 1 year ago

Anjiang-Wei commented 1 year ago

The tutorial refers several times to code related to the AccumulateCharge class, but the explanation no longer seems to match the latest implementation of the circuit example. I believe the reduction operator has since been integrated into Legion itself; for example, the current version only needs to use LEGION_REDOP_SUM_FLOAT32.
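To make the request concrete, here is a minimal, self-contained sketch of what a float sum reduction conceptually provides. It does not include legion.h and is not the actual definition behind LEGION_REDOP_SUM_FLOAT32. The names SumFloat32Sketch and accumulate_charge are hypothetical, chosen only for illustration. As far as I understand, the real operators expose apply and fold as templates with an exclusive/atomic flag and a static identity member; the sketch drops both to stay short.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a built-in float sum reduction (assumption:
// this only mirrors the general shape of a reduction operator).
struct SumFloat32Sketch {
  typedef float LHS;  // type stored in the target field
  typedef float RHS;  // type of a single contribution

  // Identity of the reduction: reducing with it changes nothing.
  static float identity() { return 0.0f; }

  // apply: combine one contribution into the target value.
  static void apply(LHS &lhs, RHS rhs) { lhs += rhs; }

  // fold: combine two contributions into a single contribution.
  static void fold(RHS &rhs1, RHS rhs2) { rhs1 += rhs2; }
};

// Hypothetical helper: sum (node index, charge) contributions into
// per-node charges, starting every node at the identity.
std::vector<float> accumulate_charge(
    std::size_t num_nodes,
    const std::vector<std::pair<std::size_t, float>> &contributions) {
  std::vector<float> charge(num_nodes, SumFloat32Sketch::identity());
  for (const auto &c : contributions) {
    SumFloat32Sketch::apply(charge[c.first], c.second);
  }
  return charge;
}
```

With a built-in sum operator, user code like the struct above would no longer need to be written by hand, which is why the tutorial text about a custom AccumulateCharge class reads as stale.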

I would like to open a PR that brings the circuit tutorial in line with the current circuit example code and the overall Legion design, but I currently have very little experience with reduction operators, so I am opening an issue for now.

Anjiang-Wei commented 1 year ago

I do not quite get the intuition of differences between reduction list and reduction fold. More specifically, the reason why people want to differentiate the two notions.

Anjiang-Wei commented 1 year ago

Also, I am curious whether scan is a "parallel primitive" that Legion supports or will support.

Anjiang-Wei commented 1 year ago

> I do not quite get the intuition of differences between reduction list and reduction fold. More specifically, the reason why people want to differentiate the two notions.

I think I understand it better after reading the tutorial a second time. Reduction list corresponds to reduction without fold. Also, intuitively I can see that reduction with a fold has the potential to enable more optimizations than reduction alone. Is there a simple but real-world example where only reduction without a fold can be used? It would be great if the tutorial were easier to follow for beginners like me.

I am also a bit confused by the following statements in the tutorial:

> Reduction list instances perform best when reductions are sparse in the target logical region and the resulting list of reductions has fewer elements than the target logical region. Alternatively, fold reduction instances perform best for dense reductions where more than one reduction operation will be applied to each location in the logical region. Locally folding reductions saves space and allows reductions to be performed in parallel.
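One way to read the quoted passage is as a space trade-off between two kinds of temporary buffer. The sketch below is my own illustration of that reading for a float sum, not Legion code: ListBuffer and FoldBuffer are hypothetical names, and the real instances are managed by the runtime rather than by user structs like these.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// List-style buffer (hypothetical): records every contribution as an
// (index, value) pair and applies them to the target later.
struct ListBuffer {
  std::vector<std::pair<std::size_t, float>> entries;

  void reduce(std::size_t index, float value) {
    entries.emplace_back(index, value);
  }
  void apply_to(std::vector<float> &target) const {
    for (const auto &e : entries) target[e.first] += e.second;
  }
  // Grows with the number of reductions performed.
  std::size_t size_in_elements() const { return entries.size(); }
};

// Fold-style buffer (hypothetical): one slot per target location,
// initialized to the identity (0 for sum); contributions to the same
// location are folded together locally.
struct FoldBuffer {
  std::vector<float> slots;

  explicit FoldBuffer(std::size_t num_locations)
      : slots(num_locations, 0.0f) {}

  void reduce(std::size_t index, float value) { slots[index] += value; }
  void apply_to(std::vector<float> &target) const {
    for (std::size_t i = 0; i < slots.size(); ++i) target[i] += slots[i];
  }
  // Fixed at the number of target locations.
  std::size_t size_in_elements() const { return slots.size(); }
};
```

Under this reading, two reductions into an 8-element region cost 2 list entries versus 8 fold slots, so the list is smaller. Three reductions into each location of a 4-element region cost 12 list entries versus 4 fold slots, so the fold buffer is smaller; that second case is what "more than one reduction operation will be applied to each location" would describe.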

More specifically, what does more than one reduction operation mean? Does it correspond to the code where reduce_node is invoked twice inside the function cpu_base_impl?

lightsighter commented 1 year ago

> I do not quite get the intuition of differences between reduction list and reduction fold. More specifically, the reason why people want to differentiate the two notions.

Section 7.1 of this paper covers the distinction. I will say that the reduction list implementation is not really supported right now. I would need to resurrect it, so it is probably worth removing from the manual if it is currently in there.

> Reduction list corresponds to reduction without fold.

It does have that benefit as well, although the original motivation is the one above. Right now, I think we assume that a fold function always exists on our reduction operators.

> More specifically, what does more than one reduction operation mean? Does it correspond to the code where reduce_node is invoked twice inside the function cpu_base_impl?

See if the paper above answers your question. If not, let me know and I'll take a crack at answering it differently.