ROCm / clr


[Issue]: ODR Violations Due to Missing inline Specifiers in bfloat16 Conversion Functions #43

Open tanpinsiang opened 5 months ago

tanpinsiang commented 5 months ago

Problem Description

Starting with ROCm v5.7, certain bfloat16 conversion functions introduced in the header files cause One Definition Rule (ODR) violations when building projects. Some host functions are defined in the header without the inline or static specifier, so every translation unit that includes the header emits its own definition, and linking several such translation units together fails with linkage errors.

Temporary Workaround

A temporary workaround involves manually modifying the header file to add the inline keyword to the __HOST_DEVICE__ macro definition. Specifically, changing line 96 in /opt/rocm/include/hip/amd_detail/amd_hip_bf16.h to:

#define __HOST_DEVICE__ __host__ __device__ inline

This resolves the linkage issue but is not a sustainable solution.

Expected Behavior

The host functions should be defined with inline or static specifiers to prevent ODR violations and ensure that the header files can be safely included across multiple translation units without causing linkage errors.
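As an illustration of the expected pattern, here is a minimal sketch of a header-only conversion function marked inline. It is not the actual contents of amd_hip_bf16.h: the type bf16_sketch and the functions float_to_bf16_sketch and bf16_to_float_sketch are hypothetical names, and the conversion simply truncates the low 16 bits, which may differ from the rounding used by the real header.

```cpp
// bf16_sketch.h: hypothetical header, NOT the real amd_hip_bf16.h.
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a bfloat16 storage type.
struct bf16_sketch {
    std::uint16_t bits;
};

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Without `inline`, each translation unit that includes this header would
// emit its own external definition and the link step would see duplicates.
// `inline` allows the same definition to appear in multiple translation units.
inline bf16_sketch float_to_bf16_sketch(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));  // copy the bit pattern of the float
    // Keep the upper 16 bits (sign, exponent, top 7 mantissa bits).
    // This truncates; it does not round to nearest.
    return bf16_sketch{static_cast<std::uint16_t>(u >> 16)};
}

inline float bf16_to_float_sketch(bf16_sketch h) {
    std::uint32_t u = static_cast<std::uint32_t>(h.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));  // reinterpret the widened bits as float
    return f;
}
```

Declaring the functions static instead would also avoid the duplicate definitions, at the cost of a separate internal-linkage copy in every translation unit that uses them.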

Additional Context

There appears to be an ongoing effort to fix this issue, as seen in commit 86bd518981b364c138f9901b28a529899d8654f3. However, this fix does not seem to be included in any ROCm release. Users attempting to install vLLM on ROCm, specifically after vLLM-rocm is merged into mainline vLLM, may run into the ODR violations described above. Including the fix in an official ROCm release would benefit the community by removing the need to patch the installed header manually.

Operating System

Ubuntu 22.04.3

CPU

AMD EPYC 7763 64-Core Processor

GPU

AMD Instinct MI250, AMD Instinct MI210

ROCm Version

ROCm 6.0.0, ROCm 5.7.1

ROCm Component

No response

Steps to Reproduce

  1. Build a project on ROCm that includes the bfloat16 conversion functions from the header file /opt/rocm/include/hip/amd_detail/amd_hip_bf16.h.
  2. Observe the ODR violations during the build when linking multiple translation units that include this header.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

cjatin commented 5 months ago

It's already added:

https://github.com/ROCm/clr/blob/c398c75512f099227244a3d13e32d79be18fdce3/hipamd/include/hip/amd_detail/amd_hip_bf16.h#L102

It should be reflected in the upcoming release.

tanpinsiang commented 5 months ago

Thanks for the update. Is there a possibility of having a maintenance release for ROCm-5.7 like ROCm-5.7.x?