Closed fwyzard closed 5 months ago
I am just curious about the purpose of this API. Is the main goal to avoid the heap allocation of `auto getWarpSizes(TDev const& dev) -> std::vector<std::size_t>`?

Because we could just change that API to either return e.g. a `boost::small_vector`, or cache the warp sizes in the device and return a `std::span`. The latter assumes that a device does not change its warp sizes during program execution.
@fwyzard if you want the PR merged, please mark the PR as Ready for review, thx!
Thanks for the review.
I've marked it as a draft because I first want to figure out how it interacts with caching the device information.
`alpaka::getPreferredWarpSize(dev)` returns one of the possible warp sizes supported by the device.

On devices that support a single warp size (CPU, CUDA GPU, ROCm GPU), `getPreferredWarpSize(dev)` avoids the overhead of wrapping that value in a `std::vector`.

On devices that support multiple warp sizes, the value returned by `getPreferredWarpSize(dev)` is unspecified. Currently it returns the largest supported value -- but this could change in a future version of alpaka.

Add a test for `alpaka::getPreferredWarpSize(dev)`.