Closed BenjaminW3 closed 2 years ago
How would one take care of that syntax in alpaka?
It is hard to unify the syntax requirements for OpenMP 4 and CUDA. A possible accelerator function with template that could be called on a CUDA device, an OpenMP 4 device and on the host would look like:
#pragma omp declare target
template<
typename TAcc>
__host__ __device__ auto doSomething(
TAcc const & acc)
-> void
{
// ...
}
#pragma omp end declare target
So there are 3 places where a macro would be required. It is even questionable if both OpenMP 4 and CUDA can be used together (this is a question of compiler support).
talking about intel, do we have plans for a TBB backend (open an independent issue?)
At least not in the near future ;-) Please open a seperate issue for it, because TBB is compiler independent.
I think in the near future offloading will still be required, so that this is at least annoying. As for TBB - does it provide the performance on Phi?
Afaik, TBB is the recommendation from Intel when parallelizing with C++ template meta programming. They even recommend it over OpenMP (for C++ projects).
moved to #65
For now, the OpenMP 4 back-end is disabled by default and has to be enabled explicitly to allow compilation with ICC.
@jkelling Do you know whether this is still a problem?
@jkelling Do you know whether this is still a problem?
No idea if the new OpenMP 5 backend works with ICC. I never tested that.
However: The initial issue would be a compiler bug at least since OpenMP 4.5 as these functions are inline. Separable compilation is not supported by the new backend, which kinda addresses a potential related issue as "resolved---won't fix".
It appears we would need to set the appropriate compiler flags for ICPC: https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/optimization-and-programming-guide/openmp-support/adding-openmp-support-to-your-application.html
But it mentions that it has (partial) support for OpenMP 5.
It appears we would need to set the appropriate compiler flags [...]
If it is only about compiler flags, then this is technically not an issue: Alpaka's cmake configuration does ATM not set the compiler flags for OpenMP5/OpenACC for any compiler completely (it does add -fopenmp
for GCC/Clang, which should also be part of with Intel). README_OMP5.md has some information about the required flags for GCC, Clang and some others. Currently, these need to be added manually by the user. This also applies to Intel, i.e. if this is regarded as an issue, the fix would be to add the Intel flags to that file.
Adding complete cmake support for offloading, which entails an additional option to select the target, would be a different issue. Is there any work being done on adding support for this directly to cmake?
Is there any work being done on adding support for this directly to cmake?
I'm not aware of any efforts in this direction. This would likely also depend on the compiler because I imagine that each compiler will support offloading for different hardware targets.
I played around a bit with the current ICPC:
$ icpc --version
icpc (ICC) 2021.2.0 20210228
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.
The relevant compiler flags are -qopenmp -qopenmp-offload
(or -qopenmp-offload=host
for disabling the offload). Leaving out -qopenmp-offload
will make ICPC complain about unknown OpenMP pragmas (everything related to #pragma omp target
).
However, even with -qopenmp-offload
it refuses to compile/link the current alpaka HEAD
. Configuration:
$ cmake -DOpenMP_CXX_VERSION=5 \
-DALPAKA_ACC_ANY_BT_OMP5_ENABLE=ON \
-DBUILD_TESTING=ON \
-Dalpaka_BUILD_EXAMPLES=ON \
-DCMAKE_CXX_FLAGS="-shared-intel -qopenmp -qopenmp-offload" \
-DCMAKE_CXX_COMPILER=icpc ..
-- The CXX compiler identification is Intel 2021.2.0.20210228
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/intel/oneapi/compiler/2021.2.0/linux/bin/intel64/icpc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found Boost: /usr/lib64/cmake/Boost-1.76.0/BoostConfig.cmake (found suitable version "1.76.0", minimum required is "1.65.1") found components: fiber
-- Found OpenMP_CXX: -qopenmp (found version "5.0")
-- Found OpenMP: TRUE (found version "5.0")
-- ALPAKA_ACC_ANY_BT_OMP5_ENABLED
-- The C compiler identification is GNU 11.1.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Catch2: Using INTERNAL version 2.13.3
-- ALPAKA_DEV_COMPILE_OPTIONS: -Wall
-- Configuring done
-- Generating done
-- Build files have been written to: /home/stepha27/workspace/caravan/test/alpaka/build
Build:
$ cmake --build . --target vectorAdd
Building CXX object example/vectorAdd/CMakeFiles/vectorAdd.dir/src/vectorAdd.cpp.o
/home/stepha27/workspace/caravan/test/alpaka/include/alpaka/kernel/TaskKernelOmp5.hpp(185): warning #2570: function has not been declared with compatible "target" attribute
{ kernelFnObj(acc, args...); },
^
detected during:
instantiation of "auto alpaka::TaskKernelOmp5<TDim, TIdx, TKernelFnObj, TArgs...>::operator()(const alpaka::DevOmp5 &) const->void [with TDim=alpaka::DimInt<1UL>, TIdx=std::size_t={unsigned long}, TKernelFnObj=VectorAddKernel, TArgs=<alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>, std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned
int}}>={unsigned int}} *, alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>, std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *, alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>,
std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *, const std::size_t={unsigned long} &>]" at line 251
instantiation of "auto alpaka::traits::Enqueue<alpaka::QueueOmp5Blocking, alpaka::TaskKernelOmp5<TDim, TIdx, TKernelFnObj, TArgs...>, void>::enqueue(alpaka::QueueOmp5Blocking &, const alpaka::TaskKernelOmp5<TDim, TIdx, TKernelFnObj, TArgs...> &)->void [with TDim=alpaka::DimInt<1UL>, TIdx=std::size_t={unsigned long}, TKernelFnObj=VectorAddKernel, TArgs=<alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned
int}}, alpaka::DimInt<1UL>, std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *, alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>, std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *,
alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>, std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *, const std::size_t={unsigned long} &>]" at line 48 of "/home/stepha27/workspace/caravan/test/alpaka/include/alpaka/queue/Traits.hpp"
instantiation of "auto alpaka::enqueue(TQueue &, TTask &&)->void [with TQueue=alpaka::Queue<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, alpaka::property::Blocking>, TTask=const alpaka::TaskKernelOmp5<alpaka::DimInt<1UL>, std::size_t={unsigned long}, VectorAddKernel, alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>, std::size_t={unsigned
long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *, alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>, std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *, alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>,
uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>, std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *, const std::size_t={unsigned long} &> &]" at line 177 of "/home/stepha27/workspace/caravan/test/alpaka/example/vectorAdd/src/vectorAdd.cpp"
/home/stepha27/workspace/caravan/test/alpaka/include/alpaka/kernel/TaskKernelOmp5.hpp(185): warning #2570: *MIC* function has not been declared with compatible "target" attribute
{ kernelFnObj(acc, args...); },
^
detected during:
instantiation of "auto alpaka::TaskKernelOmp5<TDim, TIdx, TKernelFnObj, TArgs...>::operator()(const alpaka::DevOmp5 &) const->void [with TDim=alpaka::DimInt<1UL>, TIdx=std::size_t={unsigned long}, TKernelFnObj=VectorAddKernel, TArgs=<alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>, std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned
int}}>={unsigned int}} *, alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>, std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *, alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>,
std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *, const std::size_t={unsigned long} &>]" at line 251
instantiation of "auto alpaka::traits::Enqueue<alpaka::QueueOmp5Blocking, alpaka::TaskKernelOmp5<TDim, TIdx, TKernelFnObj, TArgs...>, void>::enqueue(alpaka::QueueOmp5Blocking &, const alpaka::TaskKernelOmp5<TDim, TIdx, TKernelFnObj, TArgs...> &)->void [with TDim=alpaka::DimInt<1UL>, TIdx=std::size_t={unsigned long}, TKernelFnObj=VectorAddKernel, TArgs=<alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned
int}}, alpaka::DimInt<1UL>, std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *, alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>, std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *,
alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>, std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *, const std::size_t={unsigned long} &>]" at line 48 of "/home/stepha27/workspace/caravan/test/alpaka/include/alpaka/queue/Traits.hpp"
instantiation of "auto alpaka::enqueue(TQueue &, TTask &&)->void [with TQueue=alpaka::Queue<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, alpaka::property::Blocking>, TTask=const alpaka::TaskKernelOmp5<alpaka::DimInt<1UL>, std::size_t={unsigned long}, VectorAddKernel, alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>, std::size_t={unsigned
long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *, alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>, uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>, std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *, alpaka::Elem<alpaka::Buf<alpaka::ExampleDefaultAcc<alpaka::DimInt<1UL>, std::size_t={unsigned long}>,
uint32_t={__uint32_t={unsigned int}}, alpaka::DimInt<1UL>, std::size_t={unsigned long}>>={std::remove_volatile_t<uint32_t={__uint32_t={unsigned int}}>={unsigned int}} *, const std::size_t={unsigned long} &> &]" at line 177 of "/home/stepha27/workspace/caravan/test/alpaka/example/vectorAdd/src/vectorAdd.cpp"
Linking CXX executable vectorAdd
ld: cannot find /opt/intel/oneapi/compiler/2021.2.0/linux/bin/intel64/../../compiler/lib/intel64_lin/i_ofldbegin_target.o: No such file or directory
make[3]: *** [example/vectorAdd/CMakeFiles/vectorAdd.dir/build.make:98: example/vectorAdd/vectorAdd] Error 1
make[2]: *** [CMakeFiles/Makefile2:1884: example/vectorAdd/CMakeFiles/vectorAdd.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:1891: example/vectorAdd/CMakeFiles/vectorAdd.dir/rule] Error 2
make: *** [Makefile:660: vectorAdd] Error 2
Adding __attribute__((target("avx2")))
to the kernel does not change the output. I guess there is a missing pragma somewhere or something...
Thanks for testing.
Is that the new one, based on clang? (I suggest to ignore the old one)
The warnings can be ignored: The standard states, that the declare target
is optional for functions defined in the same compilation unit as the target
code calling them.
The wrror:
ld: cannot find /opt/intel/oneapi/compiler/2021.2.0/linux/bin/intel64/../../compiler/lib/intel64_lin/i_ofldbegin_target.o: No such file or directory
Does not look like something that could be caused by anything in the code. it looks more like a broken installation.
Is that the new one, based on clang? (I suggest to ignore the old one)
Nope, that is the "classic" Intel compiler. The other one is called dpcpp
or icpx
(seem to be synonyms; haven't tested that one yet).
Does not look like something that could be caused by anything in the code. it looks more like a broken installation.
From what I could gather through a quick search this seems to be related to MIC (or the lack thereof on my system). My assumption is that the compiler tries to offload to a MIC target (see the warnings) and fails to find the precompiled file needed to do so.
Using dpcpp
/ icpx
works out of the box.
Configuration:
$ cmake -DOpenMP_CXX_VERSION=5 \
-DALPAKA_ACC_ANY_BT_OMP5_ENABLE=ON \
-DBUILD_TESTING=ON \
-Dalpaka_BUILD_EXAMPLES=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_C_FLAGS="-shared-intel -fiopenmp -fopenmp-targets=spir64" \
-DCMAKE_CXX_COMPILER=icpx \
-DCMAKE_CXX_FLAGS="-shared-intel -fiopenmp -fopenmp-targets=spir64" \
-DCMAKE_BUILD_TYPE=Release \
..
-- The CXX compiler identification is IntelLLVM 2021.2.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/intel/oneapi/compiler/2021.2.0/linux/bin/icpx - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Boost toolset is unknown (compiler IntelLLVM 2021.2.0)
-- Boost toolset is unknown (compiler IntelLLVM 2021.2.0)
-- Boost toolset is unknown (compiler IntelLLVM 2021.2.0)
-- Found Boost: /usr/lib64/cmake/Boost-1.76.0/BoostConfig.cmake (found suitable version "1.76.0", minimum required is "1.65.1") found components: fiber
-- Found OpenMP_CXX: -fiopenmp (found version "5.0")
-- Found OpenMP: TRUE (found version "5.0")
-- ALPAKA_ACC_ANY_BT_OMP5_ENABLED
-- The C compiler identification is IntelLLVM 2021.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/intel/oneapi/compiler/2021.2.0/linux/bin/icx - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Catch2: Using INTERNAL version 2.13.3
-- ALPAKA_DEV_COMPILE_OPTIONS:
-- Configuring done
-- Generating done
-- Build files have been written to: /home/stepha27/workspace/caravan/test/alpaka/build
Build (the "unused argument" is required for linking, not compiling):
$ cmake --build . --target vectorAdd
Building CXX object example/vectorAdd/CMakeFiles/vectorAdd.dir/src/vectorAdd.cpp.o
clang++: warning: argument unused during compilation: '-shared-intel' [-Wunused-command-line-argument]
Linking CXX executable vectorAdd
Built target vectorAdd
Execution:
$ ./vectorAdd
Using alpaka accelerator: AccOmp5<1,m>
Time for kernel execution: 0.837207s
Time for HtoD copy: 3.9639e-05s
Execution results correct!
:tada:
The
ExecCpuOmp4
/AccCpuOmp4
does not compile with ICC because:... called in "omp target" region must appear in prior "omp declare target" pragma
. This affects at leastcore::mapIdx
,block::sync::syncBlockThreads
and all the accelerator functions that can be called within a kernel. GCC compiles this correctly and does not even issue a warning (vioalting the standard). This is reasonable because it currently only supports CPU targets which does not require to build for additional back-ends.The problem is, that the
omp declare target
syntax is so verbose that it can not be represented by a simple annotation in front of the functions but requires#pragma omp declare target
and#pragma omp end declare target
after the functions.