Quuxplusone / LLVMBugzillaTest

backend(OpenMP target NVPTX): Cannot select: 0x6c60f50: i16,ch = AtomicCmpSwap #43189

Open Quuxplusone opened 4 years ago

Quuxplusone commented 4 years ago
Bugzilla Link PR44219
Status NEW
Importance P normal
Reported by Jeffrey Kelling (gcc.j.kelling@hzdr.de)
Reported on 2019-12-04 08:11:58 -0800
Last modified on 2021-10-06 10:46:45 -0700
Version 9.0
Hardware PC Linux
CC hfinkel@anl.gov, jdoerfert@anl.gov, jonathanchesterfield@gmail.com, llvm-bugs@lists.llvm.org, tra@google.com
Fixed by commit(s)
Attachments temps.txz (327632 bytes, application/x-xz-compressed-tar)
Blocks
Blocked by
See also
Created attachment 22889
Temporary files from clang++ (host: preprocessed source .ii, assembly .s;
nvptx: preprocessed source .ii)

Internal compiler error when compiling C++11 code with OpenMP target offload for
the NVPTX backend, using #pragma omp atomic directives.

Command-line:
clang++ -DALPAKA_ACC_CPU_BT_OMP4_ENABLED -DALPAKA_DEBUG=0 -DBOOST_ALL_NO_LIB
-DCATCH_CONFIG_FAST_COMPILE
-I/home/yn622878/checkout/alpaka/test/common/include
-I/home/yn622878/checkout/alpaka/include
-isystem /home/yn622878/checkout/spack/opt/spack/linux-centos7-broadwell/gcc-9.2.0/boost-1.70.0-dkeb472u6a3pgudl5j4fqw7huosyngp4/include
-isystem /home/yn622878/checkout/alpaka/test/../thirdParty/catch2/include
-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -O2 -ftemplate-backtrace-limit=0
--save-temps -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic
-Wno-disabled-macro-expansion -Wno-global-constructors -Wno-padded
-Wno-extra-semi-stmt -fopenmp=libomp -std=gnu++11
-o CMakeFiles/atomic.dir/src/AtomicTest.cpp.o
-c /home/yn622878/checkout/alpaka/test/unit/atomic/src/AtomicTest.cpp

Relevant openmp flags are:
-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda

The code compiles without error when these are replaced by
-fopenmp -fopenmp=libomp -fopenmp-targets=x86_64-pc-linux-gnu
(offloading to x86 host).
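
For orientation, the kind of code involved can be sketched as follows. This is a hypothetical reduction, not taken from the attached AtomicTest.cpp, and it has not been verified to reproduce the crash; the function name `atomic_add_u16` and the choice of `std::uint16_t` are assumptions made for illustration. It performs an `#pragma omp atomic` update on a 16-bit integer inside a target region.

```cpp
#include <cstdint>

// Hypothetical reduced case (an assumption, not the reporter's alpaka code):
// an atomic update on a 16-bit integer inside an OpenMP target region.
// Without OpenMP enabled the pragmas are ignored and the loop runs serially
// on the host, so the function still computes the same result.
std::uint16_t atomic_add_u16(std::uint16_t init, std::uint16_t inc, int n) {
    std::uint16_t value = init;
    #pragma omp target teams distribute parallel for map(tofrom: value)
    for (int i = 0; i < n; ++i) {
        // The narrow-typed atomic read-modify-write is the construct the
        // report is about.
        #pragma omp atomic update
        value += inc;
    }
    return value;
}
```

Compiled with the flags above (-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda), a call such as atomic_add_u16(0, 1, 100) would exercise the device code path; with the x86 host target it should simply return 100.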

Error message from clang:
fatal error: error in backend: Cannot select: 0x6c60f50: i16,ch =
AtomicCmpSwap<(load store monotonic monotonic 1 on %ir.135)> 0x6314438,
0x6c60ba8, 0x70cd828, 0x6c60ad8
  0x6c60ba8: i64,ch = CopyFromReg 0x6314438, Register:i64 %36
    0x70cdea8: i64 = Register %36
  0x70cd828: i16,ch = CopyFromReg 0x6314438, Register:i16 %54
    0x6a23440: i16 = Register %54
  0x6c60ad8: i16 = add 0x70cd828, 0x768ec50
    0x70cd828: i16,ch = CopyFromReg 0x6314438, Register:i16 %54
      0x6a23440: i16 = Register %54
    0x768ec50: i16,ch = CopyFromReg 0x6314438, Register:i16 %52
      0x711ec68: i16 = Register %52
In function:
_Z13testAtomicAddIN6alpaka3acc10AccCpuOmp4INSt3__117integral_constantImLm1EEEmEEhEvRKT_PbT0_
clang-9: error: clang frontend command failed with exit code 70 (use -v to see
invocation)
clang version 9.0.0 (tags/RELEASE_900/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /rwthfs/rz/SW/UTIL/clang/9.0.0/bin
clang-9: note: diagnostic msg: PLEASE submit a bug report to
https://bugs.llvm.org/ and include the crash backtrace, preprocessed source,
and associated run script.
clang-9: note: diagnostic msg: Error generating preprocessed source(s)

Quuxplusone commented 4 years ago

Attached temps.txz (327632 bytes, application/x-xz-compressed-tar): Temporary files from clang++ (host: preprocessed source .ii, assembly .s; nvptx: preprocessed source .ii)

Quuxplusone commented 4 years ago

"Cannot select" means a pattern is missing in the DAG-to-DAG instruction selector. Provided NVPTX can do an atomic cmpswap on i16, this should be a simple fix. I don't know the ISA, so I am reluctant to volunteer to fix it.

Quuxplusone commented 3 years ago
There are two sides to this issue:

1. CUDA 11 supports CAS on int16:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomiccas

2. No other atomic operation is supported for int16, and no atomics are
supported on int8. This same ICE may well be hit in any of those cases too.
There should be an error message stating that the atomic in question is not
supported on this type by the backend.
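
As background for point 2, a common general technique for narrow atomics on hardware that only offers wider ones is to run a compare-and-swap loop on the containing 32-bit word. The sketch below illustrates that idea in portable C++; it is not a description of what the NVPTX backend does, and the helper name `fetch_add_u16_via_cas32` and its `lane` parameter are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically adds `inc` to one 16-bit lane of a 32-bit
// word (lane 0 = low half, lane 1 = high half) using only a 32-bit
// compare-and-swap. Returns the lane's previous value.
std::uint16_t fetch_add_u16_via_cas32(std::atomic<std::uint32_t>& word,
                                      unsigned lane, std::uint16_t inc) {
    const unsigned shift = (lane & 1u) * 16u;
    const std::uint32_t mask = std::uint32_t{0xFFFFu} << shift;
    std::uint32_t old_word = word.load(std::memory_order_relaxed);
    for (;;) {
        // Extract the lane, add with 16-bit wraparound, and splice it back
        // without touching the neighbouring lane.
        const std::uint16_t old_lane =
            static_cast<std::uint16_t>((old_word & mask) >> shift);
        const std::uint16_t new_lane =
            static_cast<std::uint16_t>(old_lane + inc);
        const std::uint32_t new_word =
            (old_word & ~mask) | (std::uint32_t{new_lane} << shift);
        // On failure, compare_exchange_weak reloads old_word and we retry.
        if (word.compare_exchange_weak(old_word, new_word,
                                       std::memory_order_relaxed)) {
            return old_lane;
        }
    }
}
```

The relaxed ordering mirrors the "monotonic" ordering shown in the failing DAG node; a real lowering would also have to handle alignment and address computation for the containing word.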

Quuxplusone commented 3 years ago

@Artem, is this known to be missing in our backend or is there more to it?

Quuxplusone commented 3 years ago

> @Artem, is this known to be missing in our backend or is there more to it?

Most likely. There wasn't much work done to bring in support for the new instructions (or variants of the existing ones) added by recent CUDA versions. We've recently added mbarrier and cp_async, but that's about it.

Support for atomics in NVPTX back-end in general is known to have holes. I believe there's a bug open for it already, but I can't find it at the moment.