Closed s-sajid-ali closed 1 year ago
This is really cool. Thank you. Give me a few days to test it.
Results look fantastic. I have no issues building on Apple M1, Tiger Lake (Ubuntu 22) or Orin (Ubuntu 20).
Thanks for the contribution. If you want to do stencil, it's probably about the same work as transpose or dgemm. I don't know enough about Rayon, but p2p should be feasible using either the task or hyperplane design shown in C1z (or Cxx11).
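To illustrate the hyperplane idea mentioned in this comment, here is a minimal sketch of one sweep of the p2p recurrence, visiting cells one anti-diagonal at a time. It is not taken from the C1z or Cxx11 sources: the function name `p2p_hyperplane_sweep`, the row-major layout, and the boundary values are assumptions made for illustration, and the inner loop is kept serial even though its cells are independent of each other.

```rust
/// One sweep of the recurrence
///   grid[i][j] = grid[i-1][j] + grid[i][j-1] - grid[i-1][j-1]
/// over an m x n row-major grid, processed one anti-diagonal at a time.
/// Hypothetical helper for illustration only.
fn p2p_hyperplane_sweep(m: usize, n: usize) -> Vec<f64> {
    assert!(m >= 1 && n >= 1, "grid must have at least one row and one column");
    let mut grid = vec![0.0_f64; m * n];

    // Assumed boundary values: first row holds the column index,
    // first column holds the row index.
    for j in 0..n {
        grid[j] = j as f64;
    }
    for i in 0..m {
        grid[i * n] = i as f64;
    }

    // Interior cells with i + j == d depend only on diagonals d-1 and d-2,
    // so every cell on one diagonal could be updated in parallel.
    for d in 2..=(m + n - 2) {
        let i_lo = d.saturating_sub(n - 1).max(1);
        let i_hi = (m - 1).min(d - 1);
        for i in i_lo..=i_hi {
            let j = d - i;
            grid[i * n + j] =
                grid[(i - 1) * n + j] + grid[i * n + j - 1] - grid[(i - 1) * n + j - 1];
        }
    }
    grid
}
```

With these boundary values, a single sweep leaves each cell equal to the sum of its row and column indices, which gives a cheap correctness check before adding any parallelism.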
New PRK implementation checklist
Which kernels are implemented?
Documentation and build examples
Added relevant dependencies to Cargo.toml files, which will prompt cargo to fetch and build relevant dependencies.
Do you certify that your contribution is made in good faith and does not attempt to introduce any negative behavior into this project?
Additional Changes
Fixed a minor issue with nstream-kokkos.cc to account for changes introduced as part of the 3.7.00 release.
Overview of performance from the new kernels on an M1 Max MacBook Pro:
nstream: All results obtained using 10 iterations over 64 million elements.
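For readers unfamiliar with the kernel, nstream is a triad update over three vectors. The following is a minimal sketch and not the code from this PR: it uses `std::thread::scope` in place of Rayon so that it needs no external crates, and the function name `nstream_sketch`, the `threads` parameter, and the initial values (A = 0, B = 2, C = 2) are assumptions made for illustration.

```rust
use std::thread;

/// Runs `iterations` passes of the triad update
///   a[i] += b[i] + scalar * c[i]
/// and returns the final `a`. Hypothetical helper for illustration only.
fn nstream_sketch(length: usize, iterations: usize, scalar: f64, threads: usize) -> Vec<f64> {
    // Assumed initial values.
    let mut a = vec![0.0_f64; length];
    let b = vec![2.0_f64; length];
    let c = vec![2.0_f64; length];

    // Split the vectors into contiguous chunks, roughly one per worker.
    let workers = threads.max(1);
    let chunk = ((length + workers - 1) / workers).max(1);

    for _ in 0..iterations {
        thread::scope(|s| {
            let chunks = a
                .chunks_mut(chunk)
                .zip(b.chunks(chunk))
                .zip(c.chunks(chunk));
            for ((ac, bc), cc) in chunks {
                // Each worker updates its own disjoint slice of `a`.
                s.spawn(move || {
                    for ((x, y), z) in ac.iter_mut().zip(bc).zip(cc) {
                        *x += *y + scalar * *z;
                    }
                });
            }
        });
    }
    a
}
```

A call such as `nstream_sketch(64_000_000, 10, 3.0, 8)` would roughly match the problem size quoted above; timing and bandwidth reporting are omitted from the sketch.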
transpose: All results obtained using 10 iterations over matrix order 16384, tile size of 32.
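To show what the tile size controls, here is a minimal serial sketch of a tiled transpose pass in which B accumulates the transpose of A and A is then incremented. It is not the Rayon code from this PR: the function name `tiled_transpose_sketch`, the row-major layout, and the initial values are assumptions made for illustration, and parallelism is left out to keep the blocking logic visible.

```rust
/// Performs `iterations` passes of B += A^T followed by A += 1, visiting
/// the matrices in `tile` x `tile` blocks. Both matrices are stored
/// row-major. Hypothetical helper for illustration only.
fn tiled_transpose_sketch(order: usize, tile: usize, iterations: usize) -> (Vec<f64>, Vec<f64>) {
    let tile = tile.max(1);
    // Assumed initial values: A holds its own linear index, B starts at zero.
    let mut a: Vec<f64> = (0..order * order).map(|k| k as f64).collect();
    let mut b = vec![0.0_f64; order * order];

    for _ in 0..iterations {
        // Walk the matrix one tile at a time so that the strided reads of A
        // stay within a small working set.
        for it in (0..order).step_by(tile) {
            for jt in (0..order).step_by(tile) {
                for i in it..(it + tile).min(order) {
                    for j in jt..(jt + tile).min(order) {
                        b[i * order + j] += a[j * order + i];
                        a[j * order + i] += 1.0;
                    }
                }
            }
        }
    }
    (a, b)
}
```

The configuration quoted above would correspond to `tiled_transpose_sketch(16384, 32, 10)`. The tile edges are clamped with `min(order)`, so the order does not need to be a multiple of the tile size.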
dgemm