Closed dmcdougall closed 1 year ago
This adds HIP ports for the strided kernels and the `parallel(Memcpy|Accumulate)` functions.
I have not checked the strided kernels for correctness. I simply ported their CUDA variants so that ga links successfully.
I need ga to link successfully in order to diagnose a separate p2p correctness bug I am currently chasing.
This PR assumes that #296 will be merged first. That's why you see commit bd2bf08 here.
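Since the strided kernels are unverified, one way to check them later is to compare device output against a plain host-side reference. The following is only a sketch under assumptions: the names `strided_accumulate_ref` and `buffers_match`, the 2-D layout, element-based strides, and the `double` element type are all hypothetical and are not taken from the ga sources, whose actual kernel signatures may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for a 2-D strided accumulate:
//   dst[r * dst_stride + c] += scale * src[r * src_stride + c]
// Strides are counted in elements (not bytes) and must be >= cols.
void strided_accumulate_ref(double* dst, std::size_t dst_stride,
                            const double* src, std::size_t src_stride,
                            std::size_t rows, std::size_t cols,
                            double scale) {
    assert(dst_stride >= cols && src_stride >= cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            dst[r * dst_stride + c] += scale * src[r * src_stride + c];
        }
    }
}

// Element-wise comparison with an absolute tolerance, intended for
// comparing a buffer copied back from the device with the reference.
bool buffers_match(const std::vector<double>& a,
                   const std::vector<double>& b,
                   double tol = 1e-12) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > tol) return false;
    }
    return true;
}
```

The idea would be to run the ported HIP kernel on the same inputs, copy the result back to the host, and pass both buffers to `buffers_match`. Padding elements between `cols` and the stride should come back untouched, which is a cheap way to catch indexing mistakes in a port.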