alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:
https://alpaka.readthedocs.io
Mozilla Public License 2.0

Add `alpaka::getPreferredWarpSize(dev)` #2216

Closed · fwyzard closed this 5 months ago

fwyzard commented 5 months ago

alpaka::getPreferredWarpSize(dev) returns one of the possible warp sizes supported by the device.

On devices that support a single warp size (CPU, CUDA GPU, ROCm GPU), getPreferredWarpSize(dev) avoids the overhead of wrapping that value in a std::vector.

On devices that support multiple warp sizes, the value returned by getPreferredWarpSize(dev) is unspecified. Currently it returns the largest supported value, but this could change in a future version of alpaka.

Add a test for alpaka::getPreferredWarpSize(dev).

bernhardmgruber commented 5 months ago

I am just curious about the purpose of this API. Is the main goal to avoid the heap allocation in auto getWarpSizes(TDev const& dev) -> std::vector<std::size_t>?

Because we could instead change the API to return e.g. a boost::small_vector, or cache the warp sizes in the device and return a std::span. The latter assumes that a device does not change its warp sizes during program execution.

bernhardmgruber commented 5 months ago

@fwyzard if you want the PR merged, please mark the PR as Ready for review, thx!

fwyzard commented 5 months ago

Thanks for the review.

I've marked it as a draft because I want to figure out first how it interacts with caching the device information.