Can you share your g++ --version and g++ -dumpmachine?
Version:
g++ (Debian 13.2.0-23) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Machine:
x86_64-linux-gnu
This supports __has_attribute (IIRC this was first introduced in GCC 5), but does not support the ext_vector_type attribute, as is easily verified by:
printf '#if defined(__has_attribute)\n#pragma message("has __has_attribute")\n#if __has_attribute(ext_vector_type)\n#pragma message("has ext_vector_type attribute")\n#else\n#pragma message("NO ext_vector_type_attribute")\n#endif\n#endif\n' | g++ -x c++ -c - -o /dev/null
A possible alternative to the #ifdef solution I mentioned would be to use something like __attribute__((vector_size(sizeof(T)*N))) for GCC.
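Purely as an illustration (the type name v4f and the function mul are mine, not taken from the HIP headers), the vector_size form is accepted by both GCC and Clang and supports the built-in componentwise operators:

typedef float v4f __attribute__((vector_size(4 * sizeof(float)))); // 16-byte, 4-lane float vector

v4f mul(v4f a, v4f b)
{
    return a * b; // componentwise multiply; both g++ and clang++ typically lower this to SIMD instructions on x86-64
}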
Hi @Oblomov, an internal ticket has been created to investigate your issue. Thanks!
Hi @Oblomov, thank you for reaching out! Is there a specific use case you have in mind that would require g++ instead of clang? The solution you provided would work, but it would treat the vectors as simple vectors instead of SIMD vectors, which can result in significant performance gaps. Hence, it is hard for us to justify supporting g++ at all.
Hello @tcgu-amd, thanks for getting back to me on this issue.
I first came across the issue while adding support for AMD HIP to GPUSPH (you can find some of the preliminary work in that direction in the spheric2022 branch; a lot of work has been done since, but that branch hasn't been published yet; I can give you access to the private repository where development is ongoing if necessary). At the moment we're forced to ship a patched version of the vector math header because of these and other issues.
GPUSPH is a very complex project with several optional dependencies that aren't always well-supported by device compilers, so the host and device sides of the software are normally built with separate compilers: the default host compiler (usually g++) is used for the host part, and nvcc or hipcc for the device part, although both of these choices can be overridden.
Some vector types are used on the host too (even if not for computationally intensive operations), so the header needs to compile successfully regardless of the compiler in use. I would argue that this should be the case independently of my specific use case, and even if it leads to suboptimal performance on the host. I would like to stress that I'm referring here specifically to the host side, since the device side would still be compiled with the clang-based hipcc and would thus still use the ext_vector_type definition.
That being said, I have just run some simple tests and gcc does autovectorize the loops, thereby treating the vector types as SIMD types. This is probably because the loop extrema are constexpr and the underlying array data is correctly aligned.
A trivial example is this:
#include <hip/hip_vector_types.h>
#include <cstdlib>   // std::atof
#include <iostream>

__attribute__((noinline))
float4 doit(float4 const& v1, float4 const& v2)
{
    return v1*v2; // componentwise multiply through the header's operator*
}

int main(int argc, char* argv[]) {
    // take the components from the command line so the compiler can't constant-fold them away
    const float c1 = argc > 1 ? std::atof(argv[1]) : 1.0f;
    const float c2 = argc > 2 ? std::atof(argv[2]) : 2.0f;
    const float c3 = argc > 3 ? std::atof(argv[3]) : 3.0f;
    const float c4 = argc > 4 ? std::atof(argv[4]) : 4.0f;
    const float4 v1{c1, c2, c3, c4};
    const float4 v2{c4, c3, c2, c1};
    float4 mul = doit(v1, v2);
    std::cout << mul.x << " " << mul.y << " " << mul.z << " " << mul.w << std::endl;
}
with the header patched as suggested in the issue description. Looking at the disassembly of the doit function, it is something like:
00000000000013e0 <_Z4doitRK15HIP_vector_typeIfLj4EES2_>:
13e0: 66 0f 6f 07 movdqa (%rdi),%xmm0
13e4: 0f 28 16 movaps (%rsi),%xmm2
13e7: 0f 59 d0 mulps %xmm0,%xmm2
13ea: 0f 29 54 24 e8 movaps %xmm2,-0x18(%rsp)
13ef: f3 0f 7e 4c 24 f0 movq -0x10(%rsp),%xmm1
13f5: f3 0f 7e 44 24 e8 movq -0x18(%rsp),%xmm0
13fb: c3 ret
13fc: 0f 1f 40 00 nopl 0x0(%rax)
@Oblomov Thanks for adding the context! I was not aware that gcc autovectorizes loops; this is neat. I will create an internal ticket to investigate adding this patch to the source code.
@Oblomov Good news! It seems the functionality you suggested is already implemented! It was part of this commit https://github.com/ROCm/clr/commit/d4799b2a3f07d61d05ca7c4267a92b83b4421b06 and has been present since rocm 6.1.x https://github.com/ROCm/clr/blob/rocm-6.1.x/hipamd/include/hip/amd_detail/amd_hip_vector_types.h#L553. I just ran your reproducer on 6.2 and it was working. Would you mind trying 6.2 (or 6.1) and letting us know the result? Thanks!
Hello @tcgu-amd, thank you very much. I have tested 6.1.0 and 6.2.1 and the issue does indeed seem to be fixed. I'm closing the issue since I consider it solved for me.
Problem Description
The standard math operators don't work on vector types as defined by amd_hip_vector_types.h when using g++ as the host compiler.
Operating System
Debian trixie/sid
CPU
AMD Ryzen 7 PRO 6850H with Radeon Graphics
GPU
AMD Instinct MI300X, AMD Instinct MI300A, AMD Instinct MI250X, AMD Instinct MI250, AMD Instinct MI210, AMD Instinct MI100, AMD Radeon Pro W7900, AMD Radeon Pro W6800, AMD Radeon Pro V620, AMD Radeon Pro VII, AMD Radeon RX 7900 XTX, AMD Radeon RX 7900 XT, AMD Radeon VII
ROCm Version
ROCm 6.0.0
ROCm Component
HIP
Steps to Reproduce
A simple test case reproducing the issue is:
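A hedged sketch of a minimal test case of the kind described here (not the original code; the compile command g++ -D__HIP_PLATFORM_AMD__ -I/opt/rocm/include repro.cpp is likewise an assumption):

// Sketch only: assumed minimal reproducer, not the original code block.
// Exercises the componentwise operator* declared by the HIP vector header.
#include <hip/hip_vector_types.h>

int main() {
    float4 a{1.0f, 2.0f, 3.0f, 4.0f};
    float4 b{4.0f, 3.0f, 2.0f, 1.0f};
    float4 c = a * b;             // the operator use that fails to build with g++ on ROCm 6.0.0
    return static_cast<int>(c.x); // use the result so it isn't optimized away
}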
This builds correctly with hipcc or clang++, but not with g++, where the following error is shown instead:
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
This is most likely due to the way __NATIVE_VECTOR__ gets defined in this case (i.e. when g++ is the host compiler). The operators need to be redefined to something like the form sketched below, and similarly for all other operators.
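A hedged sketch of the kind of redefinition meant here; the actual __NATIVE_VECTOR__ macro, type and operator definitions in amd_hip_vector_types.h differ, and the names native_float4 and wrapped_float4 are stand-ins:

// Illustration only, not the actual amd_hip_vector_types.h code.
// Both GCC >= 5 and Clang provide __has_attribute.
#ifdef __has_attribute
#  if __has_attribute(ext_vector_type)
typedef float native_float4 __attribute__((ext_vector_type(4)));             // Clang only
#  else
typedef float native_float4 __attribute__((vector_size(4 * sizeof(float)))); // GCC (and Clang)
#  endif
#endif

struct wrapped_float4 { native_float4 data; };

// An operator written against the wrapped native vector compiles with either definition:
inline wrapped_float4 operator*(wrapped_float4 a, wrapped_float4 b)
{
    a.data = a.data * b.data; // componentwise multiply via the compiler's vector extension
    return a;
}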
Some notes:
- this issue has been present since at least HIP version 4.2 and is still present in 6.0.3;
- I had previously submitted a patch to fix this and a number of other issues to the old ROCm/hipamd repository;
- this is independent of the specific GPU: it's a header-level issue.