ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

Cherry-pick RDNA1 fix into 6.1 release #1916

Closed by GZGavinZhao 6 months ago

GZGavinZhao commented 6 months ago

This PR requests that #1897 be cherry-picked into the ROCm 6.1 release.

#1897 enables RDNA1 (gfx101*) users to run rocBLAS with the rocblas package from AMD's official repository. Since rocBLAS is almost ubiquitous in ML workflows, without this fix RDNA1 users are essentially unable to run any ML-related programs right now (ROCm/ROCm#2527).

Admittedly, RDNA1 GPUs are not officially supported, but this is a small change that would benefit all RDNA1 users, so I believe it is well worth it.

This is an NFC (no functional change) for all architectures besides gfx1010.

#1897 requires #1888 to function properly, but #1888 appears to have already been cherry-picked into the release/rocm-rel-6.1 branch in #1905, so cherry-picking #1897 would not break anything.

nakajee commented 6 months ago

The ROCm 6.1 release branch is already finalized, and there is no further release planned for 6.1. This will be included in ROCm 6.2.

GZGavinZhao commented 6 months ago

@nakajee Thank you for your quick response. Would you mind if I create a PR to release-staging/rocm-rel-6.2, or will someone from AMD cherry-pick this internally?

GZGavinZhao commented 6 months ago

Never mind, I see this is already included in release-staging/rocm-rel-6.2. Thank you!

cgmb commented 6 months ago

"The ROCm 6.1 release branch is already finalized, and there is no further release planned for 6.1."

To be clear, there is a ROCm 6.1.1 release planned. Though, perhaps there are no changes planned for Tensile specifically.

@GZGavinZhao, it is not a simple process to cherry-pick a change into a patch release. There is an internal change management process for cherry-picks. Unfortunately, it is mostly handled manually and it is very time-consuming. And even a small change in Tensile requires retesting a large suite of libraries and applications for performance and correctness.

This is a great PR and it highlights some of the difficulties with the existing cherry-pick process. I'm hopeful we can improve to the point that a PR like this could be accepted in the future.

GZGavinZhao commented 6 months ago

@cgmb I see. I understand that the cherry-picking process may require extensive testing, which is why this PR is more of a request to cherry-pick. I thought that some Tensile changes might be planned for the next patch release, so it would be great if this change could be considered together with them, but releasing it in ROCm 6.2 is totally fine as well. Anyway, thank you so much for your explanation!