Add HIP Performance Guidelines

Melirius commented 2 months ago

BTW, it should be mentioned somewhere that to fully utilize all SIMD lanes/possible threads in the block, the x-block size should be a multiple of the warp size.

MKKnorr commented 2 months ago

> BTW, it should be mentioned somewhere that to fully utilize all SIMD lanes/possible threads in the block, the x-block size should be a multiple of the warp size.

To be pedantic: the total block size (x*y*z), not just the x dimension, has to be a multiple of the warp size for full utilization.
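To make the distinction concrete, here is a minimal host-side sketch of the arithmetic. `BlockDim`, `threads_per_block`, `warps_per_block` and `lane_utilization` are hypothetical helpers for illustration, not part of the HIP API. The warp size of 64 is an assumption matching AMD GPUs that run 64-wide wavefronts; in real code it would be queried from the device (e.g. the `warpSize` field of `hipDeviceProp_t`).

```cpp
// Hypothetical helpers (not part of HIP): how many SIMD lanes hold an
// active thread when a block of x*y*z threads is split into warps.
struct BlockDim { unsigned x, y, z; };

constexpr unsigned threads_per_block(BlockDim b) {
    return b.x * b.y * b.z;
}

// A partially filled warp still occupies a whole warp, so round up.
constexpr unsigned warps_per_block(BlockDim b, unsigned warp_size) {
    return (threads_per_block(b) + warp_size - 1) / warp_size;
}

// Fraction of allocated lanes that actually carry a thread.
constexpr double lane_utilization(BlockDim b, unsigned warp_size) {
    return static_cast<double>(threads_per_block(b)) /
           (warps_per_block(b, warp_size) * warp_size);
}

// Assumption: 64-wide warps (wavefronts); other hardware may use 32.
constexpr unsigned kAssumedWarpSize = 64;

// x = 16 is not a multiple of 64, but x*y*z = 64 is: lanes are full.
static_assert(lane_utilization(BlockDim{16, 4, 1}, kAssumedWarpSize) == 1.0,
              "16x4x1 fills exactly one warp");

// x*y*z = 48 leaves a quarter of the single warp idle.
static_assert(lane_utilization(BlockDim{48, 1, 1}, kAssumedWarpSize) == 0.75,
              "48 threads occupy 48 of 64 lanes");
```

With these assumptions, a 16x4x1 block is fully utilized even though its x size alone is not a multiple of the warp size, while a 10x10x1 block needs two warps for 100 threads and leaves 28 of 128 lanes idle.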

matyas-streamhpc commented 2 months ago

Most of these sections are very close to the performance guidelines of the CUDA programming guide, sometimes almost quoting it directly. I don't think that's good practice, especially as some parts don't apply to AMD's GPUs at all, and on top of that the CUDA programming guide does not have a permissive license from what I can tell.

A better place for inspiration might be GPUOpen, which already has some performance guides, e.g. for RDNA: https://gpuopen.com/learn/rdna-performance-guide/

I am not sure that is the best strategy either, but the concept was accepted as a first version. It does not quote the mentioned document directly, but there is overlap in the content. Personally, I would have appreciated every piece of recommendation, both in format and in content.

Nonetheless, we always have the opportunity to improve it and make the documentation better for the satisfaction of the developers.

neon60 commented 1 month ago

Most of this PR's changes have been merged in; the leftover is here: #3483

ROCm / HIP

Add HIP Performance Guidelines #3455