LLNL / RAJA

RAJA Performance Portability Layer (C++)
BSD 3-Clause "New" or "Revised" License

RAJA build with xlC compiler and CUDA 11 fails #865

Open pelesh opened 4 years ago

pelesh commented 4 years ago

The RAJA build on a Power9/V100 system fails when using the xlC compiler, with the following error message:

/.../src/RAJA/include/RAJA/util/OffsetLayout.hpp:154:13: error: dependent using declaration
      resolved to type without 'typename'
using Base::OffsetLayout;
            ^
/.../src/RAJA/include/RAJA/util/View.hpp:60:19: note: in instantiation of template class
      'RAJA::TypedOffsetLayout<TIL, camp::tuple<TIX> >' requested here
const layout_type layout; 
                  ^
/.../src/RAJA/test/unit/test-view.cpp:83:82: note: in instantiation of template class
      'RAJA::View<int, RAJA::TypedOffsetLayout<TIL, camp::tuple<TIX> >, int *>' requested here
RAJA::View< int, RAJA::TypedOffsetLayout< TIL, camp::tuple< TIX> > >  Dshift = D.shift({{N}}); 
                                                                                 ^
/.../src/RAJA/include/RAJA/util/OffsetLayout.hpp:125:8: note: target of using declaration
struct OffsetLayout : public internal::template OffsetLayout_impl< camp::make_idx_seq_t< n_dims> , IdxLin>  { 
       ^
/.../src/RAJA/include/RAJA/util/View.hpp:98:42: error: no matching constructor for
      initialization of 'typename add_offset<layout_type>::type' (aka 'TypedOffsetLayout<TIL, camp::tuple<TIX> >')
typename add_offset< layout_type> ::type shift_layout(layout); 
                                         ^            ~~~~~~
/.../src/RAJA/test/unit/test-view.cpp:83:82: note: in instantiation of function template
      specialization 'RAJA::View<int, RAJA::TypedLayout<TIL, camp::tuple<TIX>, -1>, int *>::shift<1, long>' requested
      here
RAJA::View< int, RAJA::TypedOffsetLayout< TIL, camp::tuple< TIX> > >  Dshift = D.shift({{N}}); 
                                                                                 ^
/.../src/RAJA/include/RAJA/util/OffsetLayout.hpp:146:8: note: candidate constructor
      (the implicit copy constructor) not viable: no known conversion from 'const layout_type' (aka 'const
      RAJA::TypedLayout<TIL, camp::tuple<TIX>, -1>') to 'const RAJA::TypedOffsetLayout<TIL, camp::tuple<TIX> >' for 1st
      argument
struct TypedOffsetLayout< IdxLin, camp::template tuple< DimTypes...> >  : public OffsetLayout< sizeof...(DimTyp...
       ^
/.../src/RAJA/include/RAJA/util/OffsetLayout.hpp:146:8: note: candidate constructor
      (the implicit move constructor) not viable: no known conversion from 'const layout_type' (aka 'const
      RAJA::TypedLayout<TIL, camp::tuple<TIX>, -1>') to 'RAJA::TypedOffsetLayout<TIL, camp::tuple<TIX> >' for 1st
      argument
struct TypedOffsetLayout< IdxLin, camp::template tuple< DimTypes...> >  : public OffsetLayout< sizeof...(DimTyp...
       ^
/.../src/RAJA/include/RAJA/util/OffsetLayout.hpp:146:8: note: candidate constructor
      (the implicit default constructor) not viable: requires 0 arguments, but 1 was provided
4 warnings and 2 errors generated.
Error while processing /tmp/tmpxft_0000c3b9_00000000-5_test-view.cudafe1.cpp.
make[2]: *** [test/unit/CMakeFiles/test-view.exe.dir/test-view.cpp.o] Error 1
make[1]: *** [test/unit/CMakeFiles/test-view.exe.dir/all] Error 2
make: *** [all] Error 2
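The following minimal sketch shows what the two errors are about. It uses hypothetical, heavily simplified stand-ins (SimpleLayout, SimpleOffsetLayout, TypedSimpleOffsetLayout, shift_layout) rather than RAJA's real classes. In the log, TypedOffsetLayout inherits its constructors from a base class that depends on template parameters, through the declaration using Base::OffsetLayout; (OffsetLayout.hpp:154). The xlC front end rejects that spelling. As a result, the converting constructor apparently never gets inherited, and View::shift() is left with only the implicit copy, move and default constructors. That matches the second error. Repeating the alias name (using Base::Base;) is the standard-conforming way to inherit constructors from a dependent base. This thread does not establish whether that spelling satisfies xlC 16.1.1, or whether RAJA fixed the problem this way.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, heavily simplified stand-ins for RAJA's layout class
// templates; these are NOT RAJA's real definitions.
template <std::size_t NDims>
struct SimpleLayout {
  long sizes[NDims] = {};
  SimpleLayout() = default;
  explicit SimpleLayout(long n) {
    for (std::size_t i = 0; i < NDims; ++i) sizes[i] = n;
  }
};

template <std::size_t NDims>
struct SimpleOffsetLayout : public SimpleLayout<NDims> {
  long offsets[NDims] = {};
  SimpleOffsetLayout() = default;
  // Converting constructor: build an offset layout from a plain layout,
  // roughly what View::shift() needs in the log above.
  explicit SimpleOffsetLayout(const SimpleLayout<NDims>& base)
      : SimpleLayout<NDims>(base) {}
};

// Derived template whose base class depends on a template parameter.
template <typename IdxT, std::size_t NDims>
struct TypedSimpleOffsetLayout : public SimpleOffsetLayout<NDims> {
  using Base = SimpleOffsetLayout<NDims>;
  // Naming the base's own class, as in
  //   using Base::SimpleOffsetLayout;
  // mirrors the form xlC rejected with "dependent using declaration
  // resolved to type without 'typename'". Repeating the alias name is the
  // standard spelling for inheriting the base's constructors:
  using Base::Base;
};

// Mirrors the failing line in View::shift(): construct the typed offset
// layout directly from a plain layout (uses the inherited constructor).
template <typename IdxT, std::size_t NDims>
TypedSimpleOffsetLayout<IdxT, NDims> shift_layout(const SimpleLayout<NDims>& layout) {
  TypedSimpleOffsetLayout<IdxT, NDims> shifted(layout);
  return shifted;
}
```

If the inherited constructor were missing, the direct-initialization inside shift_layout would fail with the same "no matching constructor" diagnostic that appears in the log.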

The build was configured with:

$ CC=xlc CXX=xlC cmake -DENABLE_OPENMP=On -DENABLE_CUDA=On ../RAJA/

Using the following modules:

Currently Loaded Modules:
  1) xl/16.1.1-7                      3) lsf-tools/2.0   5) cmake/3.17.3
  2) spectrum-mpi/10.3.1.2-20200121   4) DefApps         6) cuda/11.0.2

Any suggestions on how to get past this would be most welcome.

pelesh commented 4 years ago

By the way, the same configuration but with GCC 7.4 builds fine and passes all tests.

davidbeckingsale commented 4 years ago

Do you have any newer XL versions?

pelesh commented 4 years ago

Unfortunately, this is the only version available on my machine :(

davidbeckingsale commented 4 years ago

We can reproduce this error and will work on fixing it. Can you please try using CUDA 10.2? @rhornung67 verified that CUDA 10.2 works.

pelesh commented 4 years ago

The options I have are CUDA 10.1 and 11.0. I'll try with 10.1 and let you know.

rhornung67 commented 4 years ago

It works with CUDA 10.1.

pelesh commented 4 years ago

This time (same configuration as above, but with CUDA 10.1) the build fails at link time with what looks like a CMake error:

[ 12%] Linking CXX executable ../test-integral-limits.exe
/usr/bin/ld: cannot find -lcudadevrt
/usr/bin/ld: cannot find -lcudart_static
make[2]: *** [test/test-integral-limits.exe] Error 1
make[1]: *** [test/unit/CMakeFiles/test-integral-limits.exe.dir/all] Error 2
make: *** [all] Error 2

The libraries in question do seem to be present in the library directory:

$ ls /.../lib64/ | grep cuda
libcudadevrt.a
libcudart.so
libcudart.so.10.1
libcudart.so.10.1.243
libcudart_static.a

davidbeckingsale commented 4 years ago

Can you make sure you are doing this in a fresh build directory and send us the output of make VERBOSE=1?

pelesh commented 4 years ago

Yes, I did that in a clean directory. The sequence of commands was (run from within the build directory):

$ rm -Rf ./*
$ CC=xlc CXX=xlC cmake -DENABLE_OPENMP=On -DENABLE_CUDA=On ../RAJA/
  ...
$ make VERBOSE=1
  ...
[ 12%] Linking CXX executable ../test-integral-limits.exe
cd /.../src/raja/build-xl/test/unit && /.../spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/cmake-3.17.3-ranbt2pk3wzzvd2i7j3ekexaqya3m4f2/bin/cmake -E cmake_link_script CMakeFiles/test-integral-limits.exe.dir/link.txt --verbose=1
/.../xl/16.1.1-7/xlC/16.1.1/bin/xlC   -std=c++14     -O  -qsmp=omp CMakeFiles/test-integral-limits.exe.dir/test-integral-limits.cpp.o  -o ../test-integral-limits.exe  ../../lib/libgtest_main.a ../../lib/libgtest.a ../../lib/libRAJA.a /.../cuda/10.1.243/lib64/libcudart_static.a -ldl /usr/lib64/librt.so -lcudadevrt -lcudart_static 
/usr/bin/ld: cannot find -lcudadevrt
/usr/bin/ld: cannot find -lcudart_static
make[2]: *** [test/test-integral-limits.exe] Error 1
make[2]: Leaving directory `/.../src/raja/build-xl'
make[1]: *** [test/unit/CMakeFiles/test-integral-limits.exe.dir/all] Error 2
make[1]: Leaving directory `/.../src/raja/build-xl'
make: *** [all] Error 2

Some output is truncated.

It looks like a CMake bug. libcudart_static.a is linked once by its full path and then again with a -l flag, but the library path is never specified (there is no -L option on the link line). This should probably be a separate issue.
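The diagnosis above can be illustrated with a toy C++ model of how GNU ld resolves -l<name>. This is an assumption-labeled sketch, not ld's actual code. The model searches each library directory in order and, on ELF systems, prefers lib<name>.so over lib<name>.a within a directory. A library given by full path on the command line does not add its own directory to that search list. The in-memory "file system" and the /hypothetical/cuda/lib64 path are illustrative only; the real path in the log is elided.

```cpp
#include <cassert>
#include <initializer_list>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Toy "file system" for illustration: the set of absolute paths that exist.
using FakeFs = std::set<std::string>;

// Model of "-l<name>" lookup: walk the search directories (-L dirs, then
// defaults) in order; within each directory try lib<name>.so, then
// lib<name>.a. Full-path libraries elsewhere on the link line play no part.
std::optional<std::string> resolve_dash_l(const std::string& name,
                                          const std::vector<std::string>& search_dirs,
                                          const FakeFs& fs) {
  for (const std::string& dir : search_dirs) {
    for (const char* suffix : {".so", ".a"}) {
      std::string candidate = dir + "/lib" + name + suffix;
      if (fs.count(candidate) != 0) return candidate;
    }
  }
  return std::nullopt;  // ld would report "cannot find -l<name>"
}
```

If this reading is right, making the CUDA library directory visible to the linker with an -L option (for example through CMake's CMAKE_EXE_LINKER_FLAGS) may get past the error, though that workaround has not been tried in this thread.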

davidbeckingsale commented 4 years ago

That's strange. It might just have to do with how everything is set up on your machine. Can you send the CMake output? (Email is fine if you don't want to paste it all here.)