Hi @fileaccent, thanks for reaching out!
Since you are attempting to integrate rocWMMA into another infrastructure, this would require some investigation into how PyTorch sets up the compiler and development environment. PyTorch appears to be ultimately responsible for setting target flags and integrating other code, so modifying these settings would happen in the configuration of your PyTorch build.
That being said - the first line of the HIPCC call contains the following flags:
-D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1
This is likely related to the compilation issue you are seeing about the conversion between __half and _Float16 types. Under the hood we support incoming __half data, but we require the ability to convert it to _Float16, which is native to AMD cards. If this conversion is done in another library (which is entirely possible), then you need to expose those conversions.
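To illustrate (a minimal sketch of my own, not rocWMMA's actual code): with those macros set to 1, hip_fp16.h strips __half of its conversion operators, so a cast like the one rocWMMA performs internally no longer compiles:

#include <hip/hip_fp16.h>

// Illustrative only: with -D__HIP_NO_HALF_CONVERSIONS__=1 defined,
// __half loses its conversion operators and this static_cast fails
// to compile, mirroring the rocWMMA error shown later in this thread.
inline _Float16 to_native(const __half& h)
{
    return static_cast<_Float16>(h);
}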
What I would recommend is checking the PyTorch documentation to see whether you can adjust your PyTorch configuration to build only for your intended targets, and to set the above flags to 0. I'm sure the PyTorch devs would also be accommodating to any questions you may have.
Just FYI - it looks like most of this configuration is done in a 'gloo' cmake file within pytorch:
pytorch/tools/amd_build/build_amd.py
gloo/cmake/Dependencies.cmake
gloo/cmake/Hip.cmake
Cheers,
--Chris
I solved this problem later and am recording the solution here. (The method is not particularly formal; if there are other solutions, please share them.)
In HIP we can add the "--no-offload-arch" option to prevent compilation for certain architectures. Here I block all architectures except gfx90a. Below is a setup.py for reference.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension
from pathlib import Path
import os

workspace_dir = Path(os.path.dirname(os.path.abspath(__file__)))

setup(
    name="fused_attn",
    ext_modules=[
        CUDAExtension(
            name="fused_attn",
            sources=[str(workspace_dir / "src" / "fused_attn_extention.cu")],
            include_dirs=[str(workspace_dir / "include")],
            extra_compile_args=[
                "-O3",
                "-std=c++20",
                "-I/opt/rocm/include",
                "-I/opt/rocm/hip/include",
                # Block every offload target that rocWMMA does not
                # support, leaving only gfx90a.
                "--no-offload-arch=gfx1030",
                "--no-offload-arch=gfx900",
                "--no-offload-arch=gfx906",
                "--no-offload-arch=gfx908",
            ],
        )
    ],
    cmdclass={
        "build_ext": BuildExtension
    },
)
This way, errors are no longer reported due to compiling for unsupported architectures.
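(As an aside, and an assumption on my part rather than something I have verified: torch.utils.cpp_extension also appears to honor the PYTORCH_ROCM_ARCH environment variable when choosing offload targets, so running PYTORCH_ROCM_ARCH=gfx90a python setup.py install might achieve the same effect without listing every blocked architecture.)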
You may also encounter the following error:
In file included from /data/zhaorong/code/fused-attention/src/fused_attn_extention.hip:4:
In file included from /data/zhaorong/code/fused-attention/include/fused_attn_hip.cuh:6:
In file included from /opt/rocm-5.4.0/include/rocwmma/rocwmma.hpp:31:
In file included from /opt/rocm-5.4.0/include/rocwmma/internal/io_config.hpp:29:
In file included from /opt/rocm-5.4.0/include/rocwmma/internal/broadcast.hpp:29:
In file included from /opt/rocm-5.4.0/include/rocwmma/internal/types.hpp:339:
/opt/rocm-5.4.0/include/rocwmma/internal/types_ext.hpp:328:40: error: no matching conversion for static_cast from 'const rocwmma::hfloat16_t' (aka 'const __half') to 'rocwmma::float16_t' (aka '_Float16')
return static_cast<hfloat16_t>(static_cast<float16_t>(x) * static_cast<float16_t>(y));
To solve this problem, you need to modify the rocWMMA source. Change the three host-side operators in types_ext.hpp that reported errors to the following form.
// Reinterpret the __half bit pattern as _Float16, do the arithmetic
// natively, then reinterpret the result back as __half.
__host__ inline hfloat16_t operator*(const hfloat16_t& x, const hfloat16_t& y)
{
    float16_t mid1 = *(float16_t*)(void*)(&x);
    float16_t mid2 = *(float16_t*)(void*)(&y);
    mid1 = mid1 * mid2;
    return *(hfloat16_t*)(void*)&mid1;
}
__host__ inline hfloat16_t operator+(const hfloat16_t& x, const hfloat16_t& y)
{
    float16_t mid1 = *(float16_t*)(void*)(&x);
    float16_t mid2 = *(float16_t*)(void*)(&y);
    mid1 = mid1 + mid2;
    return *(hfloat16_t*)(void*)&mid1;
}
__host__ inline hfloat16_t& operator+=(hfloat16_t& x, const hfloat16_t& y)
{
    float16_t mid1 = *(float16_t*)(void*)(&x);
    float16_t mid2 = *(float16_t*)(void*)(&y);
    mid1 = mid1 + mid2;
    return x = *(hfloat16_t*)(void*)&mid1;
}
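To check that the patch behaves as expected, here is a small host-side smoke test I would suggest (hypothetical code of my own, not part of rocWMMA; it builds the half values through their bit patterns the same way the patch does):

#include <rocwmma/rocwmma.hpp>
#include <cstdio>
#include <cstring>

using namespace rocwmma;

// Build an hfloat16_t from a float via its bit pattern, as in the patch.
static hfloat16_t make_h(float f)
{
    _Float16 n = static_cast<_Float16>(f);
    hfloat16_t h;
    std::memcpy(&h, &n, sizeof(h));
    return h;
}

int main()
{
    hfloat16_t a = make_h(1.5f);
    hfloat16_t b = make_h(2.0f);
    hfloat16_t c = a * b; // patched operator*
    c += b;               // patched operator+=
    _Float16 out;
    std::memcpy(&out, &c, sizeof(out));
    printf("result = %f\n", static_cast<float>(out)); // expect 5.000000
    return 0;
}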
Now you should be able to install this PyTorch extension. The above method is not formal; if there is a better one, please tell me. Thank you so much.
Hi @fileaccent, happy to see that you've achieved a solution.
I will take this experience and see what we can do to make rocWMMA aware of the flags -D__HIP_NO_HALF_OPERATORS__=1 and -D__HIP_NO_HALF_CONVERSIONS__=1.
That way, the source modification shouldn't be necessary in the future.
Thank you very much for your feedback.
I want to convert CUDA code containing WMMA into HIP. I have unit tests done and they pass. I hope to integrate this code into PyTorch. When I executed "python setup.py install", I found that all architectures were added when the code was compiled, so execution reported an error, because rocWMMA does not support gfx1030. What should I do to avoid this error? Can I compile for just one architecture?
This is the content of the setup.py file:
The following is part of the error report. (I specified the architecture, but PyTorch still added all architectures.)
Environment:
rocm: 5.4
ubuntu: 20.04
python: 3.8
pytorch: 1.12.1
GPU: MI210
rocwmma-dev: 0.7.0.50400-72~20.04