Open ben-albrecht opened 7 years ago
I have been working on Transpose recently and wanted to capture what is missing in the current implementation:
The PRK specifications and the reference MPI1 implementation use column-major arrays for both matrices and a column-wise data decomposition. The output array is then accessed in column-major order, while the input is accessed in row-major order. The current Transpose implementation in Chapel does things rather haphazardly in this context. Given that there is no native column-major layout in Chapel (yet?), I think the arrays can be distributed with a row-major decomposition and the access orders reversed (row-major on the output array) to emulate something close to the reference implementation and the specs.
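To make the access-order point concrete, here is a minimal sketch in Python rather than Chapel. It assumes square matrices stored as flat row-major buffers; the function name `transpose_row_major_output` is made up for illustration and is not taken from the PRK sources or the Chapel implementation. The output is walked in row-major order, so writes are contiguous and reads from the input are strided.

```python
def transpose_row_major_output(a, order):
    """Transpose a square matrix stored as a flat row-major list.

    The output is traversed in row-major order (contiguous writes),
    so the input is necessarily read with a stride of `order`.
    """
    b = [0.0] * (order * order)
    for i in range(order):        # row of the output
        for j in range(order):    # column of the output
            # Output index grows by 1 per inner step; input index grows by `order`.
            b[i * order + j] = a[j * order + i]
    return b


order = 3
a = [float(k) for k in range(order * order)]  # rows: [0,1,2], [3,4,5], [6,7,8]
b = transpose_row_major_output(a, order)
```

With column-major storage the same loop nest would give the mirrored pattern (contiguous writes down a column), which is why reversing the access order on a row-major layout should come close to what the reference does.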
@ben-albrecht, looking at the issue again, I think there are a few things that can be added:
I don't think I can modify the original post, so you can interpret these however you wish and update it.
@e-kayrakli - Updated. Let me know if you see anything that could be updated further.
Sorry, I wasn't aware of the existence of this issue. FWIW, the performance trend of Transpose as of 1.17.1 can be found in #11031.
Here is a meta-issue to track progress on the implementations of Intel's Parallel Research Kernels in Chapel.
Resources
General
contributed by header comments, giving credit to authors and contributors. (#6405)
--correctness flag rather than --validate for clarity (#6405)
Implementations
Stencil
chpl__getPrivatizedCopy #6184
Transpose
Synch_p2p
DGEMM
DGEMM is distributed in its current state, but it is not SUMMA. Note that the PRK specs do not specify an algorithm, but the MPI1 implementation is based on SUMMA.
Maintaining multiple implementations would be useful (see @e-kayrakli's comment below)
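For readers not familiar with SUMMA, here is a rough sequential sketch of the idea in Python. It assumes a square p x p process grid, a matrix order divisible by p, and a panel width equal to the block size; the helper names are made up for illustration. It is not the MPI1 reference code. The "broadcasts" are emulated by simply indexing the owning block, so only the data-flow pattern is shown, not the communication.

```python
def matmul_add(c, a, b):
    """c += a @ b for small dense blocks stored as lists of lists."""
    for i in range(len(a)):
        for k in range(len(b)):
            aik = a[i][k]
            for j in range(len(b[0])):
                c[i][j] += aik * b[k][j]


def split_blocks(m, p):
    """Split an n x n matrix into a p x p grid of (n/p) x (n/p) blocks."""
    bs = len(m) // p
    return [[[row[bj * bs:(bj + 1) * bs] for row in m[bi * bs:(bi + 1) * bs]]
             for bj in range(p)] for bi in range(p)]


def summa(a, b, p):
    """Emulate SUMMA on a p x p grid; returns C = A @ B as a list of lists."""
    n = len(a)
    assert n % p == 0, "sketch assumes the order is divisible by the grid size"
    bs = n // p
    a_blk, b_blk = split_blocks(a, p), split_blocks(b, p)
    c_blk = [[[[0.0] * bs for _ in range(bs)] for _ in range(p)]
             for _ in range(p)]
    for k in range(p):
        # Step k: block column k of A is "broadcast" along process rows,
        # block row k of B is "broadcast" along process columns.
        for i in range(p):
            for j in range(p):
                # Rank (i, j) does a local update with the two received panels.
                matmul_add(c_blk[i][j], a_blk[i][k], b_blk[k][j])
    # Gather the distributed blocks back into one matrix.
    c = [[0.0] * n for _ in range(n)]
    for bi in range(p):
        for bj in range(p):
            for r in range(bs):
                c[bi * bs + r][bj * bs:(bj + 1) * bs] = c_blk[bi][bj][r]
    return c


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
c = summa(a, b, 2)
```

In a distributed version, each rank would hold only its own blocks of A, B, and C, and the two inner loops would run concurrently across ranks.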
PIC
Sparse
NStream
AMR
A variation of Stencil that spawns subgrids to emulate adaptive mesh refinement
Branch
A very simple kernel that tests branch performance
Random
Reduce
Note: "Reduce" may be a misnomer, as it seemingly does an element-wise vector addition where the vectors are located at specific parts of memory.
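A minimal Python sketch of that reading of the kernel, with the caveat that it is an assumption and not checked against the PRK spec: several equal-length vectors sit back to back in one flat buffer, and all of them are added element-wise into the first one. The name `reduce_vectors` and the layout are hypothetical.

```python
def reduce_vectors(buf, vector_length, num_vectors):
    """Add vectors 1..num_vectors-1 element-wise into vector 0, in place.

    Assumed layout: vector v occupies buf[v * vector_length : (v + 1) * vector_length].
    """
    for v in range(1, num_vectors):
        offset = v * vector_length
        for i in range(vector_length):
            buf[i] += buf[offset + i]
    return buf[:vector_length]


buf = [1, 2, 3, 10, 20, 30, 100, 200, 300]  # three vectors of length 3
total = reduce_vectors(buf, vector_length=3, num_vectors=3)
```

Under this reading the result is a vector, not a scalar, which is the reason the name looks like a misnomer.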