ROCm / hcc

HCC is an Open Source, Optimizing C++ Compiler for Heterogeneous Compute currently for the ROCm GPU Computing Platform
https://github.com/RadeonOpenCompute/hcc/wiki

SSE2 Intrinsics cause compiler error #976

Open dragontamer opened 5 years ago

dragontamer commented 5 years ago

I've got some simple SSE2 code here.

 #include <immintrin.h>

 int main(){
 // Uncomment the "#ifdef" to fix the issue
 //#ifdef __HCC_CPU__
     __m128i arb = _mm_undefined_si128();
     __m128i zero = _mm_xor_si128(arb, arb);

     return _mm_extract_epi64(zero, 0);
 //#endif
 }

I compile as:

hcc `hcc-config --cxxflags --ldflags` -msse2 example.cpp -o example

And it outputs the error:

example.cpp:4:19: error: always_inline function '_mm_undefined_si128' requires
      target feature 'sse2', but would be inlined into function 'main' that is
      compiled without support for 'sse2'
    __m128i arb = _mm_undefined_si128();
                  ^
example.cpp:5:20: error: always_inline function '_mm_xor_si128' requires target
      feature 'sse2', but would be inlined into function 'main' that is compiled
      without support for 'sse2'
    __m128i zero = _mm_xor_si128(arb, arb);
                   ^
example.cpp:7:12: error: '__builtin_ia32_vec_ext_v2di' needs target feature sse2
    return _mm_extract_epi64(zero, 0);
           ^
/opt/rocm/hcc/lib/clang/8.0.0/include/smmintrin.h:1097:14: note: expanded from
      macro '_mm_extract_epi64'
  (long long)__builtin_ia32_vec_ext_v2di((__v2di)(__m128i)(X), (int)(N))

I'm able to get the compiler to run with #ifdef __HCC_CPU__, but I doubt that is what is intended. My hcc version is as follows:

hcc --version
HCC clang version 8.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang 6ec3c61e09fbb60373eaf5a40021eb862363ba2c) (ssh://gerritgit/lightning/ec/llvm ab3b88ffc2ae50f55361a49aec89f6e95d9d0ec4) (based on HCC 1.3.18482-757fb49-6ec3c61-ab3b88f )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/bin

david-salinas commented 5 years ago

This is actually expected behaviour. Those builtins are only available for Intel targets. HCC is a single-source CPU/GPU compiler, so there are two passes: one for the CPU (host) target and one for the GPU (device) target. On the second pass, the target will be amdgcn, where these builtins are not supported.
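As a rough illustration of how the two passes interact with the `__HCC_CPU__` guard from the report, here is a minimal sketch that has not been tested against hcc. It assumes `__HCC_CPU__` is defined only during the host pass (as the reported workaround suggests) and that hcc defines `__HCC__` in both passes. The macro `EXAMPLE_USE_SSE2` and the function `add4` are hypothetical names made up for this example. The idea is that the device pass only ever sees the scalar fallback, while the host pass (or a plain x86 compiler with SSE2 enabled) sees the intrinsics.

```cpp
#include <cstdint>

// Assumption: __HCC_CPU__ is defined only in hcc's host (CPU) pass, and
// __HCC__ is defined whenever hcc is the compiler. __SSE2__ is the usual
// predefined macro for "SSE2 is enabled" on x86 compilers, so the same
// file still builds with a plain host compiler.
#if defined(__HCC_CPU__) || (!defined(__HCC__) && defined(__SSE2__))
#define EXAMPLE_USE_SSE2 1  // hypothetical macro name
#include <emmintrin.h>      // SSE2 intrinsics
#else
#define EXAMPLE_USE_SSE2 0
#endif

// Adds four 32-bit integers element-wise: out[i] = a[i] + b[i].
inline void add4(const std::int32_t* a, const std::int32_t* b,
                 std::int32_t* out) {
#if EXAMPLE_USE_SSE2
    // Host-only path: unaligned loads, one packed 32-bit add, unaligned store.
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_add_epi32(va, vb));
#else
    // Fallback seen by any pass where the SSE2 path is disabled.
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Calling `add4(a, b, out)` on two four-element arrays gives the same result on either path, so callers do not need to know which one was compiled in. Whether this is preferable to wrapping whole functions in `#ifdef __HCC_CPU__` depends on how much host-only code there is.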

dragontamer commented 5 years ago

> This is actually expected behaviour. Those builtins are only available for Intel targets. HCC is a single-source CPU/GPU compiler, so there are two passes: one for the CPU (host) target and one for the GPU (device) target. On the second pass, the target will be amdgcn, where these builtins are not supported.

Thanks for getting back to me on this.

What I expected instead was for only [[hc]]-labeled functions to be compiled in the second pass. I would expect only a minority of the code to be for GPUs (the minority that is called by other [[hc]] functions).

When I use HCC intrinsics or inline assembly in an [[hc]] function, there's no compiler error, so I was hoping that x86 intrinsics could be used in a similar manner.

rrawther commented 4 years ago

I am getting this error too when compiling SSE2 intrinsics. What is the solution when using hcc?