Closed jinz2014 closed 4 weeks ago
Can you share some more information, where are you seeing this error.
I hope this link will eventually let you add all missing functions related to conversions among float, half and bfloat16.
https://github.com/ROCm/HIPIFY/blob/amd-staging/docs/tables/CUDA_Device_API_supported_by_HIP.md
CUDA provided __float2bfloat162_rn
, however, hip doesn't have the corresponding definition. So when hipifying the code, it cannot compile.
error: use of undeclared identifier 'float2bfloat162_rn'; did you mean '__float22bfloat162_rn'? return float2bfloat162_rn(a);
// Following math functions included in ROCM6.2 SDK : // hmul: bfloat16 -> bfloat16 , hmul2: bfloat16 -> bfloat16, // floats2bfloat162_rn: (float,float) -> __hip_bfloat162, // float22bfloat162_rn: float2 -> hip_bfloat162, // __float2bfloat162_rn : float -> hip_bfloat162, // bfloat1622float2 : hip_bfloat162 -> float2
Here is an simple implementation of __floats2bfloat162_rn
:
__inline__ __device__ __nv_bfloat162 __floats2bfloat162_rn(const float a, const float b) {
__nv_bfloat162 val;
val = __nv_bfloat162(__float2bfloat16_rn(a), __float2bfloat16_rn(b));
return val;
}
it means accept two floats, and convert each of them to bf16 and pack it into 32bit number.
The signature '__float2bfloat162_rn' also implemented in hip, it accepts a single float:
__HOST_DEVICE__ inline __hip_bfloat162 __float2bfloat162_rn(const float a) {
return __hip_bfloat162{__float2bfloat16(a), __float2bfloat16(a)};
}
here is the implemenation:
/**
* \ingroup HIP_INTRINSIC_BFLOAT16_CONV
* \brief Converts float to bfloat16
*/
__HOST_DEVICE__ inline __hip_bfloat16 __float2bfloat16(float f) {
__hip_bfloat16 ret;
union {
float fp32;
unsigned int u32;
} u = {f};
if (~u.u32 & 0x7f800000) {
// When the exponent bits are not all 1s, then the value is zero, normal,
// or subnormal. We round the bfloat16 mantissa up by adding 0x7FFF, plus
// 1 if the least significant bit of the bfloat16 mantissa is 1 (odd).
// This causes the bfloat16's mantissa to be incremented by 1 if the 16
// least significant bits of the float mantissa are greater than 0x8000,
// or if they are equal to 0x8000 and the least significant bit of the
// bfloat16 mantissa is 1 (odd). This causes it to be rounded to even when
// the lower 16 bits are exactly 0x8000. If the bfloat16 mantissa already
// has the value 0x7f, then incrementing it causes it to become 0x00 and
// the exponent is incremented by one, which is the next higher FP value
// to the unrounded bfloat16 value. When the bfloat16 value is subnormal
// with an exponent of 0x00 and a mantissa of 0x7F, it may be rounded up
// to a normal value with an exponent of 0x01 and a mantissa of 0x00.
// When the bfloat16 value has an exponent of 0xFE and a mantissa of 0x7F,
// incrementing it causes it to become an exponent of 0xFF and a mantissa
// of 0x00, which is Inf, the next higher value to the unrounded value.
u.u32 += 0x7fff + ((u.u32 >> 16) & 1); // Round to nearest, round to even
} else if (u.u32 & 0xffff) {
// When all of the exponent bits are 1, the value is Inf or NaN.
// Inf is indicated by a zero mantissa. NaN is indicated by any nonzero
// mantissa bit. Quiet NaN is indicated by the most significant mantissa
// bit being 1. Signaling NaN is indicated by the most significant
// mantissa bit being 0 but some other bit(s) being 1. If any of the
// lower 16 bits of the mantissa are 1, we set the least significant bit
// of the bfloat16 mantissa, in order to preserve signaling NaN in case
// the bloat16's mantissa bits are all 0.
u.u32 |= 0x10000; // Preserve signaling NaN
}
ret.data = (u.u32 >> 16);
return ret;
}
@houseroad @jinz2014 please updated your SDK to rocm 6.2 using this script
Thanks for the answer.
error: use of undeclared identifier 'float2bfloat162_rn'; did you mean '__float22bfloat162_rn'? return float2bfloat162_rn(a);
Thanks