UoB-HPC / BabelStream

STREAM, for lots of devices written in many programming models

More STD variants #115

Closed tom91136 closed 2 years ago

tom91136 commented 2 years ago

This PR adds the index-oriented STD implementation (std-indices). It has been verified to work with NVHPC 21.9 and the usual C++ compilers with TBB.
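To make the "index-oriented" wording concrete, here is a minimal sketch contrasting it with a data-oriented formulation of the same element-wise add. It is not the code from this PR: the function names `add_data` and `add_indices` are hypothetical, the index range is materialised as a `std::vector` purely for simplicity (a counting iterator would avoid that storage), and the execution policy is left out so the sketch builds without TBB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Data-oriented: the algorithm walks the elements of a and b directly.
// (Hypothetical helper, not taken from the PR.)
std::vector<double> add_data(const std::vector<double>& a,
                             const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> c(a.size());
    std::transform(a.begin(), a.end(), b.begin(), c.begin(),
                   [](double x, double y) { return x + y; });
    return c;
}

// Index-oriented: the algorithm walks 0..n-1 and the lambda indexes the arrays.
// The explicit index vector is an assumption made to keep the sketch short.
std::vector<double> add_indices(const std::vector<double>& a,
                                const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<std::size_t> idx(a.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::vector<double> c(a.size());
    std::transform(idx.begin(), idx.end(), c.begin(),
                   [&](std::size_t i) { return a[i] + b[i]; });
    return c;
}
```

Both functions compute the same result. In a parallel build, an execution policy such as `std::execution::par_unseq` would presumably be passed as the first argument to `std::transform`.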

This PR also refactors the STD and STD20 implementations into std-data and std-ranges respectively.

Currently this is still pending validation with oneDPL but it's ready for high-level reviews; don't merge yet.

tom91136 commented 2 years ago

The validation with oneDPL turned out to be quite interesting.

oneDPL supports using std::vector with a custom SYCL USM allocator, so in theory we can just use that and everything else can stay the same. In practice, doing this appears to incur a full copy of the data at the lambda boundary on any SYCL backend (it copies for CPUs as well), making everything very slow. This is quite surprising and probably warrants a bug report for oneDPL upstream.
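As a plain C++ illustration of what a copy at a lambda boundary can look like, the sketch below uses a hypothetical `CountingAllocator` as a stand-in for a USM allocator and counts allocations when a lambda captures the vector by value versus capturing only its data pointer. This is an assumption-laden illustration only: it does not use SYCL or oneDPL and is not a diagnosis of where the copy observed above comes from.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Global allocation counter for the hypothetical stand-in allocator.
static std::size_t g_allocations = 0;

// Hypothetical stand-in for a USM allocator: forwards to std::allocator
// and counts every allocation request.
template <typename T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}
    T* allocate(std::size_t n) {
        ++g_allocations;
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>{}.deallocate(p, n);
    }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

using Vec = std::vector<double, CountingAllocator<double>>;

// Capturing the vector by value copy-constructs it inside the closure,
// which allocates and copies all elements.
std::size_t allocations_capture_by_value(const Vec& v) {
    const std::size_t before = g_allocations;
    auto f = [v](std::size_t i) { return v[i]; };
    (void)f(0);
    return g_allocations - before;
}

// Capturing only the raw data pointer copies a pointer, not the elements.
std::size_t allocations_capture_pointer(const Vec& v) {
    const std::size_t before = g_allocations;
    const double* p = v.data();
    auto f = [p](std::size_t i) { return p[i]; };
    (void)f(0);
    return g_allocations - before;
}
```

For a non-empty vector, the by-value capture reports at least one allocation, while the pointer capture reports none.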

The other CPU backends of oneDPL (TBB, OpenMP) don't have this issue and behave similarly to std::execution::par_unseq performance-wise.

The alternative is to switch back to raw arrays (T* x = new T[] / delete[] x), since USM returns raw pointers as well. I've got a working implementation of this on the std-use-raw-ptr branch. With pointers, the copy is no longer present and performance is comparable to the native SYCL implementation. Diff here.
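A minimal sketch of the raw-pointer approach, assuming a STREAM-style triad kernel. The names `triad` and `triad_first_element` are hypothetical and this is not the code on the std-use-raw-ptr branch. Memory comes from plain `new[]`/`delete[]` here; in a USM build the pointers would presumably come from a SYCL USM allocation instead, which is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Triad a[i] = b[i] + scalar * c[i], expressed with a standard algorithm
// over raw pointers. Raw pointers are valid random-access iterators.
void triad(double* a, const double* b, const double* c,
           double scalar, std::size_t n) {
    std::transform(b, b + n, c, a,
                   [scalar](double bi, double ci) { return bi + scalar * ci; });
}

// Allocates three arrays with new[], runs the kernel once, and frees them.
// Returns a[0] so the result can be checked by the caller.
double triad_first_element(std::size_t n) {
    assert(n > 0);
    double* a = new double[n];
    double* b = new double[n];
    double* c = new double[n];
    std::fill(a, a + n, 0.0);
    std::fill(b, b + n, 1.0);
    std::fill(c, c + n, 2.0);
    triad(a, b, c, 0.5, n);
    const double result = a[0];
    delete[] a;
    delete[] b;
    delete[] c;
    return result;
}
```

Because the lambda captures only the scalar and the algorithm receives pointers, nothing container-sized is copied when the closure is created.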

tom91136 commented 2 years ago

I've reverted the oneDPL commit for now (there's already a branch that uses pointers based on this branch, so force pushing is probably gonna break stuff).