jackzhou121 opened this issue 1 year ago
@jackzhou121 ,
I had the same issues and resolved them with the following patch. It casts the value to `double`, ensuring that the compiler always chooses double operands for the `=` operator.
--- a/src/fastertransformer/kernels/disentangled_attention_kernels.cu
+++ b/src/fastertransformer/kernels/disentangled_attention_kernels.cu
@@ -379,7 +379,7 @@ __global__ void disentangled_attention_kernel(TDataType* result,
#ifdef ENABLE_BF16
else if constexpr (std::is_same<TDataType, __nv_bfloat16>::value) {
// bf16
- res = __hadd(res0, __hadd(res1, T[threadIdx.x][ty + threadIdx.y]));
+ res = static_cast<double>(__hadd(res0, __hadd(res1, T[threadIdx.x][ty + threadIdx.y])));
}
#endif
Branch/Tag/Commit: v5.3
Docker Image Version: nvcr.io/nvidia/pytorch:22.12-py3
GPU name: T4
CUDA Driver: NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.8
Reproduced Steps