ROCm / rocBLAS

Next generation BLAS implementation for ROCm platform
https://rocm.docs.amd.com/projects/rocBLAS/en/latest/

Fix host-pointer-mode reductions for nonblocking streams #1369

Closed tbennun closed 9 months ago

tbennun commented 9 months ago

Reductions (e.g. asum) with host-pointer mode enabled used hipMemcpy to copy the result back to the host. When using a nonblocking stream, that copy could execute out of order with respect to the reduction itself, so the host could read the value before it was written. This PR fixes this by calling hipMemcpyAsync followed by a stream synchronization, which guarantees that the value is available on the host.

amcamd commented 9 months ago

Thank you @tbennun for this change. I am checking other places in rocBLAS with the same use of hipMemcpy that should be replaced with hipMemcpyAsync and hipStreamSynchronize.

tbennun commented 9 months ago

Thank you! I searched for “DeviceToHost” and found some more non-test-related instances, but I was not sure if each case needs to be modified.

amcamd commented 9 months ago

Thank you @tbennun for this contribution. Your PR is merged, and we are making another PR to change other occurrences of hipMemcpy.