Open vasslitvinov opened 6 years ago
FWIW, I would probably prioritize designing how we'll do partial reductions of anonymous expressions like [(i,j) in D] i * 10.0 + j (rather than simply arrays) over performance-tuning the current approach. I don't doubt that we can get the performance of the current approach where we need it, but if we have to change the approach to handle non-array cases, then it doesn't make sense to put too much effort into tuning it just yet.
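To make the non-array case concrete, here is a rough sketch of what a hand-written equivalent might look like for that expression. It assumes the reduction is a sum over the second dimension only; the sizes `n` and `m` and the name `rowSums` are made up for illustration, and this is not the partial-reduction implementation itself.

```chapel
// Hypothetical sizes, chosen only for illustration.
config const n = 3, m = 4;

// One result per value of i: a partial reduction keeps the first
// dimension and collapses the second.
var rowSums: [1..n] real;

forall i in 1..n do
  // Reduce the anonymous expression over j without first storing it
  // in a 2D array.
  rowSums[i] = + reduce [j in 1..m] (i * 10.0 + j);

writeln(rowSums);
```

The point is that no array of the expression's values is ever materialized, which is the case the current array-based approach would need to be extended to cover.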
Right now partial reductions are ~4x slower than hand-written code on cg-sparse.chpl for Class A.
This task is to investigate and close this gap as much as possible.