ROCm / rocBLAS

Next generation BLAS implementation for ROCm platform
https://rocm.docs.amd.com/projects/rocBLAS/en/latest/

[Question]: rocBLAS determinism with GPU atomics #1459

Closed 123epsilon closed 1 month ago

123epsilon commented 1 month ago

Hi, I'm using rocBLAS as a backend for hipBLAS, and I wanted to know what determinism guarantees rocBLAS provides when GPU atomics are in use. Specifically, I am using it with PyTorch, and I notice that the cuBLAS documentation lists settings that can be used to get bit-wise deterministic behavior:

provide a separate workspace for each used stream using the cublasSetWorkspace() function, or

have one cuBLAS handle per stream, or

use cublasLtMatmul() instead of GEMM-family of functions and provide user owned workspace, or

set a debug environment variable CUBLAS_WORKSPACE_CONFIG to :16:8 (may limit overall performance) or :4096:8 (will increase library footprint in GPU memory by approximately 24MiB).

I see in the rocBLAS documentation that, from rocBLAS 4.0 onward, atomics are not used by default. Does that mean I can safely assume that rocBLAS < 4.0 is nondeterministic because it uses atomics, and that rocBLAS >= 4.0, left at its default settings, gives deterministic behavior? Are there any other settings, such as the workspace size, that factor into this?

rkamd commented 1 month ago

@123epsilon, thanks for bringing this to our attention. We updated our documentation to clearly list the conditions under which rocBLAS produces deterministic results (https://github.com/ROCm/rocBLAS/commit/42d65e162544b17157607cb643142b0682803e4f).

Atomic operations are enabled by default in current and previous releases of rocBLAS and functions using atomic operations may not provide deterministic results.
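As a rough illustration of why atomics can break bitwise reproducibility: floating-point addition is not associative, so when partial results are accumulated in whatever order the atomic updates happen to land, the final bits can differ between runs. The sketch below only simulates that effect on the CPU with plain Python floats. It does not call rocBLAS, and the `accumulate` helper and the sample values are made up for the example.

```python
import itertools


def accumulate(values):
    """Add the values one at a time, in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total


# Hypothetical partial sums that several GPU work-groups might
# contribute to one output element via atomic adds.
partials = [0.1, 0.2, 0.3]

# Try every arrival order and collect the distinct final results.
results = {accumulate(order) for order in itertools.permutations(partials)}

for r in sorted(results):
    print(repr(r))
print("distinct results:", len(results))
```

More than one distinct value is printed even though every ordering adds the same three numbers, which is the kind of run-to-run difference that a fixed accumulation order avoids.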

The documentation you are referring to is the changelog for rocBLAS 4.0. Our deprecation process involves notifying end users first and then removing the feature in the next major ROCm release. In this case, we issued a deprecation notice in ROCm 6.0, and the actual change could occur in the next major version.

The Bitwise Reproducibility section of the documentation lists the conditions under which rocBLAS guarantees deterministic results.

In ROCm 6.2 and above, users can set the ROCBLAS_DEFAULT_ATOMICS_MODE environment variable to change the default atomics mode. For prior releases, users must call the rocBLAS rocblas_set_atomics_mode() API to change the default. (Refer to the Atomic Operations section of the documentation.)
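For the PyTorch use case in the question, a minimal sketch of the environment-variable route on ROCm 6.2 and above might look like the following. It assumes that the value "0" selects the "atomics not allowed" mode and that the variable is read when the rocBLAS handle is created, so it has to be set before the framework initializes rocBLAS. Both points should be checked against the rocBLAS documentation for the installed release. The helper name `disable_rocblas_atomics_by_default` is invented for the example.

```python
import os

# Variable name taken from the reply above; the value "0" is an
# assumption (expected to mean "atomics not allowed") and should be
# verified against the rocBLAS documentation for your release.
ATOMICS_ENV_VAR = "ROCBLAS_DEFAULT_ATOMICS_MODE"
ATOMICS_NOT_ALLOWED = "0"


def disable_rocblas_atomics_by_default(env=os.environ):
    """Ask rocBLAS handles created later in this process to default to no atomics."""
    env[ATOMICS_ENV_VAR] = ATOMICS_NOT_ALLOWED
    return env[ATOMICS_ENV_VAR]


# Call this before importing the framework that loads rocBLAS,
# so the setting is in place when the handle is created.
disable_rocblas_atomics_by_default()

# import torch  # import only after the variable has been set
```

Setting the variable in the launching shell instead has the same effect and avoids any dependence on import order inside the script.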

123epsilon commented 1 month ago

I see, thank you!

rkamd commented 1 month ago

@123epsilon, If there are no additional questions, I will proceed to close this issue.

123epsilon commented 1 month ago

Yes that's all I needed to know - thank you!