tbennun closed this 9 months ago.
Thank you @tbennun for this change. I am checking other places in rocBLAS that use `hipMemcpy` the same way and should instead use `hipMemcpyAsync` followed by `hipStreamSynchronize`.
Thank you! I searched for “DeviceToHost” and found a few more instances outside the tests, but I was not sure whether each one needs to be changed.
Thank you @tbennun for this contribution. Your PR has been merged, and we are opening another PR to change the remaining occurrences of `hipMemcpy`.
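Reductions (e.g. `asum`) with host-pointer mode enabled used `hipMemcpy` to copy the result back to the host. `hipMemcpy` is issued on the null stream, and a stream created with `hipStreamNonBlocking` does not synchronize with the null stream, so when the handle used a non-blocking stream the copy could execute before the reduction kernel had finished. This PR fixes this by calling `hipMemcpyAsync` on the handle's stream, followed by a stream synchronization, which guarantees that the value is available on the host.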