ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

Merge CK fwd mha FP16 solver #3308

Closed. BrianHarrisonAMD closed this 1 month ago

BrianHarrisonAMD commented 1 month ago

This PR will merge the changes from #3304 & #3277 into develop.

BrianHarrisonAMD commented 1 month ago

Looking into the CI issues for external CI / Windows.

BrianHarrisonAMD commented 1 month ago

Windows issue was due to the wrong define guard being used.

bghimireamd commented 1 month ago

LGTM

BrianHarrisonAMD commented 1 month ago

Restarting CI after code review comments.

junliume commented 1 month ago

Thanks @CAHEK7 for the comments. I am trying to move this info to a post in discussion and mark it as part of "MIOpen Developers' Guide" :)

hipMemcpy should not be called directly; in fact, the non-async version must never be used.
hipMemcpyAsync with the proper current stream from the handle can be used, but it should not be called directly either.
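
A minimal sketch of the pattern described above, assuming the intent is that every copy goes through a handle-owned entry point and is queued on the handle's current stream. `FakeStream`, `FakeHandle`, `CopyAsync`, `Finish` and `PendingOps` are hypothetical stand-ins invented for illustration; they are not MIOpen or HIP APIs, and `std::memcpy` stands in for the device copy so the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stream: tasks are queued and only run on Synchronize().
struct FakeStream
{
    std::queue<std::function<void()>> pending;

    void Enqueue(std::function<void()> task) { pending.push(std::move(task)); }

    void Synchronize()
    {
        while(!pending.empty())
        {
            pending.front()();
            pending.pop();
        }
    }
};

// Hypothetical stand-in for a library handle that owns the current stream.
class FakeHandle
{
public:
    // The single copy entry point for callers. It always targets the handle's
    // own stream, so no caller can pick the wrong stream or block the host.
    // In a real HIP build, this body is where an async copy would be issued.
    void CopyAsync(void* dst, const void* src, std::size_t bytes)
    {
        stream_.Enqueue([dst, src, bytes] { std::memcpy(dst, src, bytes); });
    }

    // Wait for everything queued on the handle's stream.
    void Finish() { stream_.Synchronize(); }

    std::size_t PendingOps() const { return stream_.pending.size(); }

private:
    FakeStream stream_;
};
```

With this shape, a caller writes `handle.CopyAsync(dst, src, bytes)` and later `handle.Finish()`; the copy is ordered with the other work on the handle's stream, and the raw copy call appears in exactly one place.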