alpaka-group / cupla

C++ User interface for the Platform independent Library Alpaka :arrows_clockwise:
Other
37 stars 18 forks source link

Problems with cuda_fp16 and Eigen Library #131

Closed waredjeb closed 5 years ago

waredjeb commented 5 years ago

Hello, I was porting a code from cuda using cupla, and I noticed that some errors are generated when <cupla/config/GpuCudaRt.hpp> is included before the #includes of the Eigen library. In the detail: Errors are generated when I have a structure like this:

#include <cuda/config/GpuCudaRt.hpp>
#include <Eigen/Core>
#include <Eigen/Eigenvalues>

While in the following way, I get no errors:

#include <Eigen/Core>
#include <Eigen/Eigenvalues>
#include <cuda/config/GpuCudaRt.hpp>

In particular the errors are the following:

/usr/local/cuda/include/cuda_fp16.h(3002): error: incomplete type is not allowed
/usr/local/cuda/include/cuda_fp16.h(3002): error: expected an identifier
/usr/local/cuda/include/cuda_fp16.h(3002): error: inline specifier allowed on function declarations only
/usr/local/cuda/include/cuda_fp16.h(3002): error: expected a ";"
/usr/local/cuda/include/cuda_fp16.h(3008): error: incomplete type is not allowed
/usr/local/cuda/include/cuda_fp16.h(3008): error: expected an identifier
/usr/local/cuda/include/cuda_fp16.h(3008): error: inline specifier allowed on function declarations only
/usr/local/cuda/include/cuda_fp16.h(3008): error: expected a ";"
/usr/local/cuda/include/cuda_fp16.hpp(2018): error: name followed by "::" must be a class or namespace name
/usr/local/cuda/include/cuda_fp16.hpp(2018): error: expected an identifier
/usr/local/cuda/include/cuda_fp16.hpp(2018): error: inline specifier allowed on function declarations only
/usr/local/cuda/include/cuda_fp16.hpp(2018): error: expected a ";"

Is there a way to solve this problem, where for some reasons I would need to include the GpuCudaRt.hpp before the Eigen library?

I'm on CentOS7 with cuda-10.1 and gcc 8.3.1 In the following the flags for the compilation:


export CXXFLAGS="-m64 -std=c++14 -g -O2 -DALPAKA_DEBUG=0 -DALPAKA_CUDA_ARCH=60:70:75 -I$CUDA_ROOT/include -L$CUDA_ROOT/lib64 -lcudart -lcuda -I$ALPAKA_ROOT/include -I$CUPLA_ROOT/include -I$SLC7_BASE/include -I$EIGEN_BASE -I$TBB_BASE/include -L$TBB_BASE/lib -ltbb" 
HOST_FLAGS="-fopenmp -pthread -fPIC -ftemplate-depth-512 -Wall -Wextra -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-local-typedefs -Wno-attributes -Wno-reorder -Wno-sign-compare"

NVCC="$CUDA_ROOT/bin/nvcc"
NVCC_FLAGS="-ccbin $CXX -w -lineinfo --expt-extended-lambda --expt-relaxed-constexpr --use_fast_math --ftz=false --cudart shared"```
sbastrakov commented 5 years ago

Thanks for a detailed description. I will try to reproduce tomorrow. Our general idea is to include cupla before the CUDA stuff, which is however the way that does not work for you. The issues are probably caused by Eigen including CUDA headers directly and so the translation unit has both cupla and CUDA names in. But it is unclear why the other order works then.

sbastrakov commented 5 years ago

Needs to be checked, but we probably need to add defines for CUDA __half and __half2 types. Kinda like what #118 did for int3 and float3.

psychocoderHPC commented 5 years ago

@waredjeb Is the Eigen lib also using cuda internally? Cupla ships internally a few defines to rename functions cuda* into cupla*. This has the side effect that you must include cupla always after all external libs which need direct access to cuda. @sbastrakov thats the reason why include cupla late is working. I like to remove this and allow the user to chose the renaming defines only if he need it.

sbastrakov commented 5 years ago

Yes, Eigen includes CUDA stuff. Then we need to update our readme, that explicitly tells to include cupla before CUDA headers.

waredjeb commented 5 years ago

@psychocoderHPC yes Eigen includes CUDA. Then, we tried to manage this problem. Firstly we added the definition of the headers required:

#define __CUDA_FP16_H__
#define __CUDA_FP16_HPP__

In this way we moved the problem on the Half.h header, in the following some of the errors generated:

/data/cmssw/slc7_amd64_gcc820/external/eigen/e4c107b451c52c9ab2d7b7fa4194ee35332916ec-nmpfii/include/eigen3/Eigen/src/Core/arch/GPU/Half.h(93): error: identifier "__half_raw" is undefined

/data/cmssw/slc7_amd64_gcc820/external/eigen/e4c107b451c52c9ab2d7b7fa4194ee35332916ec-nmpfii/include/eigen3/Eigen/src/Core/arch/GPU/Half.h(94): error: identifier "__half_raw" is undefined

/data/cmssw/slc7_amd64_gcc820/external/eigen/e4c107b451c52c9ab2d7b7fa4194ee35332916ec-nmpfii/include/eigen3/Eigen/src/Core/arch/GPU/Half.h(95): error: identifier "__half_raw" is undefined

/data/cmssw/slc7_amd64_gcc820/external/eigen/e4c107b451c52c9ab2d7b7fa4194ee35332916ec-nmpfii/include/eigen3/Eigen/src/Core/arch/GPU/Half.h(97): error: not a class or struct name

/data/cmssw/slc7_amd64_gcc820/external/eigen/e4c107b451c52c9ab2d7b7fa4194ee35332916ec-nmpfii/include/eigen3/Eigen/src/Core/arch/GPU/Half.h(100): error: identifier "__half_raw" is undefined

/data/cmssw/slc7_amd64_gcc820/external/eigen/e4c107b451c52c9ab2d7b7fa4194ee35332916ec-nmpfii/include/eigen3/Eigen/src/Core/arch/GPU/Half.h(111): error: identifier "__half" is undefined

/data/cmssw/slc7_amd64_gcc820/external/eigen/e4c107b451c52c9ab2d7b7fa4194ee35332916ec-nmpfii/include/eigen3/Eigen/src/Core/arch/GPU/Half.h(111): error: invalid redeclaration of member function "Eigen::half_impl::half_base::half_base(const <error-type> &)"
(100): here

We tried, adding:

typdef __half __half_raw;

And we got errors with hceil and hfloor in Half.h:

...
/data/cmssw/slc7_amd64_gcc820/external/eigen/e4c107b451c52c9ab2d7b7fa4194ee35332916ec-nmpfii/include/eigen3/Eigen/src/Core/arch/GPU/Half.h(575): error: identifier "hfloor" is undefined

/data/cmssw/slc7_amd64_gcc820/external/eigen/e4c107b451c52c9ab2d7b7fa4194ee35332916ec-nmpfii/include/eigen3/Eigen/src/Core/arch/GPU/Half.h(583): error: identifier "hceil" is undefined
...

Finally, we managed to compile modifying Eigen internally.In particular we touched the following part of Half.h.

...
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half floor(const half& a) {
#if (EIGEN_CUDACC_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 300) || \
  defined(EIGEN_HIP_DEVICE_COMPILE)
  return half(hfloor(a));
#else
  return half(::floorf(float(a)));
#endif
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half ceil(const half& a) {
#if (EIGEN_CUDACC_VER >= 80000 && defined EIGEN_CUDA_ARCH && EIGEN_CUDA_ARCH >= 300) || \
  defined(EIGEN_HIP_DEVICE_COMPILE)
  return half(hceil(a));
#else
  return half(::ceilf(float(a)));
#endif
}

To finally have that:

...
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half floor(const half& a) {
  return half(::floorf(float(a)));
}
EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC half ceil(const half& a) {
  return half(::ceilf(float(a)));
}

In such a way everything is working fine for our case!

sbastrakov commented 5 years ago

Glad that you found a workaround.

Does your application actually make use of half-precision types? Otherwise, another possible solution is to not define EIGEN_HAS_CUDA_FP16 and the corresponding CUDA headers should not be included. Unfortunately, this also requires to modify Eigen.

waredjeb commented 5 years ago

@sbastrakov @psychocoderHPC thanks for the quick replies!

sbastrakov commented 5 years ago

@waredjeb should this issue be closed? Not sure if you are interested in trying out this suggestion.

waredjeb commented 5 years ago

@sbastrakov Sorry I forgot to reply to the question, I will try your suggestion!

sbastrakov commented 5 years ago

After offline discussion with @psychocoderHPC , our documentation recommends the wrong include order. I will provide a PR which fixes it and further explains in terms of other includes pulling CUDA headers internally.

sbastrakov commented 5 years ago

@waredjeb should I close this issue? Not sure if you found time to try my later suggestion.

waredjeb commented 5 years ago

@sbastrakov sorry for the late answer, I tried your suggestion. But in this way I got some errors: .../Eigen/src/Core/arch/GPU/Half.h(583): error: identifier "hceil" is undefined .../Eigen/src/Core/arch/GPU/Half.h(575): error: identifier "hfloor" is undefined

sbastrakov commented 5 years ago

Thanks for trying, sorry it did not work out. Closing the issue.