Problem Description
Starting with ROCm v5.7, the bfloat16 conversion functions introduced in the HIP header amd_hip_bf16.h cause One Definition Rule (ODR) violations when building projects. Some host functions in that header are not declared inline or static, so including it in multiple translation units produces linkage errors.
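To illustrate the failure mode, here is a minimal sketch of a header-style function definition. The name float_to_bf16_bits and the truncating conversion are hypothetical and are not taken from the ROCm sources. They only show why a function defined in a header needs inline (or static) when more than one .cpp file includes it.

```cpp
// bf16_sketch.hpp (hypothetical header, NOT the ROCm implementation)
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE-754 float");

// Written without `inline`, this definition would be emitted with external
// linkage in every translation unit that includes the header, and the linker
// would typically report a duplicate (multiply defined) symbol.
// With `inline`, identical definitions in several translation units are allowed.
inline std::uint16_t float_to_bf16_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);            // reinterpret the float's bit pattern
    return static_cast<std::uint16_t>(bits >> 16);  // keep the top 16 bits (truncation, no rounding)
}

// Small self-check that can be called from any translation unit.
inline void check_float_to_bf16_bits() {
    assert(float_to_bf16_bits(1.0f) == 0x3F80);
}
```

Dropping the inline keyword from float_to_bf16_bits and including this header from two source files that are linked together should reproduce the same class of error described in this issue.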
Temporary Workaround
A temporary workaround involves manually modifying the header file to add the inline keyword to the __HOST_DEVICE__ macro definition. Specifically, changing line 96 in /opt/rocm/include/hip/amd_detail/amd_hip_bf16.h to:
#define __HOST_DEVICE__ __host__ __device__ inline
This resolves the linkage issue but is not a sustainable solution.
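For projects that would rather not patch the installed header, the same idea can be sketched with a project-local macro. SKETCH_HOST_DEVICE and bf16_bits_to_float are hypothetical names used only for illustration, and the sketch has only been reasoned about as plain host C++. The __host__ __device__ branch is an assumption about how it would look under hipcc or nvcc.

```cpp
// Hypothetical project-local header, NOT part of ROCm.
#include <cassert>
#include <cstdint>
#include <cstring>

// Bake `inline` into the qualifier macro so every function declared with it
// can be defined in a header without violating the One Definition Rule.
#if defined(__HIPCC__) || defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__ inline
#else
#define SKETCH_HOST_DEVICE inline  // plain C++ fallback for host-only builds
#endif

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE-754 float");

// Expand 16 bfloat16 bits into a float by placing them in the upper half
// of a 32-bit word and zero-filling the lower half.
SKETCH_HOST_DEVICE float bf16_bits_to_float(std::uint16_t b) {
    std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Small self-check that can be called from any translation unit.
inline void check_bf16_bits_to_float() {
    assert(bf16_bits_to_float(0x3F80) == 1.0f);
}
```

Putting inline in the macro mirrors the one-line workaround above: every function declared with the macro picks up the specifier, so no individual definition has to be edited.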
Expected Behavior
The host functions should be defined with inline or static specifiers to prevent ODR violations and ensure that the header files can be safely included across multiple translation units without causing linkage errors.
Additional Context
It appears that there is an ongoing effort to fix this issue, as seen in the commit 86bd518981b364c138f9901b28a529899d8654f3. However, this fix does not seem to be included in any of the ROCm releases.
Users attempting to install vLLM on ROCm, specifically after vLLM-rocm is merged into the mainline vLLM, may encounter issues due to the aforementioned ODR violations. It would be beneficial for the community if such fixes were included in an official ROCm release to avoid the need for manual intervention and to ensure clean and maintainable codebases.
Operating System
Ubuntu 22.04.3
CPU
AMD EPYC 7763 64-Core Processor
GPU
AMD Instinct MI250, AMD Instinct MI210
ROCm Version
ROCm 6.0.0, ROCm 5.7.1
ROCm Component
No response
Steps to Reproduce
Build a project on ROCm that includes the bfloat16 conversion functions from the header file /opt/rocm/include/hip/amd_detail/amd_hip_bf16.h.
Observe the ODR violations during the build when linking multiple translation units that include this header.
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response