BlueBrain / HighFive

HighFive - Header-only C++ HDF5 interface
https://bluebrain.github.io/HighFive/
Boost Software License 1.0

CUDA "specialization of ‘HighFive::SliceTraits<HighFive::Selection>’ after instantiation" #180

Closed hschwane closed 1 year ago

hschwane commented 5 years ago

Hey, I want to use HighFive in a CUDA project. However, when I include <highfive/H5DataSet.hpp> in a .cu file, I get the error:

HighFive/include/highfive/bits/../H5Selection.hpp:56:40: error: specialization of ‘HighFive::SliceTraits<HighFive::Selection>’ after instantiation

I can't figure out where it is specialized and where it is instantiated. Any ideas?

Edit: The issue seems to be with

    template <typename T>
    friend class ::HighFive::SliceTraits;
    // absolute namespace naming due to GCC bug 52625

in file H5Selection.hpp, line 57 onward. The CUDA compiler removes the absolute namespace qualification before passing the code to gcc...
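For readers unfamiliar with the construct, here is a minimal sketch of the pattern being discussed, not the actual HighFive source. The namespace `demo`, the simplified classes, and the `select(count)` signature are all assumptions made for illustration. It shows a CRTP trait that is instantiated as a base class before the derived class befriends every specialization of that trait. It compiles as plain C++ and is not claimed to reproduce the nvcc error.

```cpp
#include <cassert>
#include <cstddef>

namespace demo {

class Selection;  // forward declaration so the trait can name its return type

// Hypothetical stand-in for HighFive::SliceTraits (a CRTP mixin).
template <typename Derivate>
class SliceTraits {
  public:
    // Builds a Selection; needs access to Selection's private constructor.
    Selection select(std::size_t count) const;
};

// Hypothetical stand-in for HighFive::Selection.
// Using SliceTraits<Selection> as a base class instantiates it right here.
class Selection : public SliceTraits<Selection> {
  public:
    std::size_t count() const { return count_; }

  private:
    explicit Selection(std::size_t count) : count_(count) {}
    std::size_t count_;

    // Befriends every specialization of the class template. This is a friend
    // declaration, not a specialization. The leading "::" is the absolute
    // qualification that the post above says nvcc strips.
    template <typename T>
    friend class ::demo::SliceTraits;
};

// Hypothetical stand-in for HighFive::DataSet.
class DataSet : public SliceTraits<DataSet> {};

template <typename Derivate>
Selection SliceTraits<Derivate>::select(std::size_t count) const {
    return Selection(count);  // allowed only because of the friend declaration
}

}  // namespace demo

// Usage: the trait can construct a Selection although the constructor is private.
inline void friend_template_example() {
    demo::DataSet dataset;
    assert(dataset.select(4).count() == 4);
}
```

Note that the friend declaration appears after `SliceTraits<Selection>` has already been instantiated as a base class, which may be why a compiler that misreads the declaration as a specialization complains about "specialization after instantiation".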

zhihaoy commented 4 years ago

Which nvcc and GCC versions are you using?

keichi commented 4 years ago

I just bumped into the same problem here. I'm using nvcc 10.2.89 and gcc 8.3.1. Is there any workaround?

keichi commented 4 years ago

It does compile if I use nvcc and clang 7.0.1 instead.

hschwane commented 4 years ago

Sorry for not posting. I am using CUDA 9 with gcc 6.4. The workaround I used was to remove the friend declaration in file H5Selection.hpp and make the members of HighFive::SliceTraits public.
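A rough sketch of what this kind of workaround amounts to, under the assumption that the friend declaration exists so the trait can reach otherwise private parts of `Selection`. The namespace `demo` and the simplified classes are hypothetical and do not mirror the real HighFive headers.

```cpp
#include <cassert>
#include <cstddef>

namespace demo {

class Selection;  // forward declaration so the trait can name its return type

// Hypothetical stand-in for HighFive::SliceTraits.
template <typename Derivate>
class SliceTraits {
  public:
    Selection select(std::size_t count) const;
};

// Hypothetical stand-in for HighFive::Selection with the friend declaration
// removed. What the trait needs is simply made public instead.
class Selection : public SliceTraits<Selection> {
  public:
    explicit Selection(std::size_t count) : count_(count) {}
    std::size_t count() const { return count_; }

  private:
    std::size_t count_;
};

// Hypothetical stand-in for HighFive::DataSet.
class DataSet : public SliceTraits<DataSet> {};

template <typename Derivate>
Selection SliceTraits<Derivate>::select(std::size_t count) const {
    return Selection(count);  // no friendship needed: the constructor is public
}

}  // namespace demo

// Usage: behaves the same, but anyone can now construct a Selection directly.
inline void public_member_example() {
    demo::DataSet dataset;
    assert(dataset.select(3).count() == 3);
    assert(demo::Selection(2).count() == 2);
}
```

The trade-off is loss of encapsulation: with no template friend declaration there is nothing for the compiler to misread, but the formerly private interface becomes part of the public API.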

keichi commented 4 years ago

This has been fixed with CUDA 11.

jmont-dev commented 3 years ago

FYI, I'm still observing this error when building against CUDA 11.1. We observe the same compilation error with the H5Selection and H5Group files as was previously reported. The error is resolved by simply using the patch in e0d8868 on the fix-180 branch.

We're using Red Hat Enterprise Linux 7.9, CUDA toolkit 11.1.74, and gcc 6.3.1.

For now running with the overlaid patch is an acceptable solution but I'm curious if others are still running into the problem.

keichi commented 3 years ago

@jmont-dev You are right, I was mistakenly testing my branch. It still doesn't compile with CUDA 11.1 and GCC 9.3.0. I hope this can be fixed. I can submit a PR if needed.

ianhinder commented 3 years ago

The patch mentioned earlier, from https://github.com/BlueBrain/HighFive/commit/e0d88681b489a9c07386d15deb2c5d30cd3a477b, doesn't apply any more, probably due to a change to an adjacent line, and also doesn't seem to exist in any repository according to github. I implemented the same changes in https://github.com/UoMResearchIT/HighFive/commit/050d8b971e15f29f7751de59d3f50eebab59c372, which is based off v2.3, which is the latest release.

keichi commented 2 years ago

Seems like this issue is finally resolved in v2.4 (with CUDA 11.6 and GCC 9).

pramodk commented 2 years ago

I was skimming through the discussion and mentioned PR but couldn't find instructions/example to reproduce the original issue.

If someone could mention how this should be tested/reproduced, someone from dev team would be happy to take a look.

aminiussi commented 1 year ago

Hi, I am having the same issue with g++ 10.1, CUDA 11.7.1 and HighFive 2.6.2.

The nvcc compiler confuses the template friend declaration with a specialization.

I'll try to get a small example.

aminiussi commented 1 year ago

@pramodk here is the smallest test I could get:

[roth005@jean-zay3 gpu]$ more znort.cpp
#include <highfive/H5File.hpp>
[roth005@jean-zay3 gpu]$ which g++
/gpfslocalsup/spack_soft/gcc/10.1.0/gcc-8.4.1-p7jnnskeffunl7o5fpz4zmnqch443jhm/bin/g++
[roth005@jean-zay3 gpu]$ g++ --version
g++ (Spack GCC) 10.1.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[roth005 gpu]$ /gpfslocalsys/cuda/11.7.1/bin/nvcc -expt-extended-lambda -Wext-lambda-captures-this -arch=sm_70 -ccbin g++   -I/gpfswork/rech/oth/roth005/fargo/highfive/external/HighFive/include  -std=c++17 -Xcompiler -fopenmp,-pthread -x cu ./znort.cpp -c 
/gpfswork/rech/oth/roth005/fargo/highfive/external/HighFive/include/highfive/bits/../H5Group.hpp:49:40: error: specialization of 'HighFive::NodeTraits<HighFive::Group>' after instantiation
   49 |     template <typename Derivate>
      |                                        ^         
/gpfswork/rech/oth/roth005/fargo/highfive/external/HighFive/include/highfive/bits/../H5Selection.hpp:56:40: error: specialization of 'HighFive::SliceTraits<HighFive::Selection>' after instantiation
   56 |     template <typename Derivate>
      |                                        ^          
[roth005 gpu]$ (cd /gpfswork/rech/oth/roth005/fargo/highfive/external/HighFive/; git status)
HEAD detached at 89a235d
nothing to commit, working tree clean
[roth005 gpu]$

aminiussi commented 1 year ago

Hi, any news regarding this issue?

1uc commented 1 year ago

Thank you for providing a reproducer. We've tried all combinations of cuda/11.0.2, cuda/11.5.1, cuda/11.6.0 with gcc/9.4.0, gcc/11.2.0. Additionally, we tried the NVHPC compiler nvc++ with versions 21.11, 22.2 and 22.3. Finally, we've tried gcc 10.3.0 (the closest I have to gcc 10.1.0, which unfortunately I can't build through spack). All of these combinations work. Hence either the reproducer didn't capture the issue, or gcc 10.1.0 has a compiler bug that got fixed in 10.3.0 (or earlier).

Unfortunately, the patch isn't something one would want to merge over a compiler bug that has been fixed some time ago.

Since multiple versions of GCC are compatible with reasonably old versions of NVCC (11.0.2 was released in July 2020), it shouldn't be too hard to use HighFive with CUDA. If this is indeed simply a compiler bug in GCC 10.1.0, I don't see any ambition to create a workaround.

Could you please try one of the GCC versions we've tested?

aminiussi commented 1 year ago

I can try to give it a shot, but I will need another solution: the problem appears in an HPC context where neither I nor our users have a lot of freedom regarding the versions of the tools we use.

As an example, a colleague just got the issue with gcc 9.2.0 and cuda/11.3 on a cluster for which he has no control over the software development tools. On one of our main clusters, I have to deal with cuda/11.2 (the most recent available there, other versions are 10.x) and gcc 8.3 (can't move without breaking other dependencies, which include a CUDA-aware MPI).

But I understand the situation. I guess for some time I will use a fork with a patched branch that I will keep synchronized until our target platforms get up to date.

Thanks!

1uc commented 1 year ago

On the cluster with other versions of GCC 10.x, what is the latest version? I can't reproduce the issue with 10.3.0.

If one of the versions is 10.x with x >= 3 could you please confirm that your reproducer does/doesn't reproduce the issue? Our reluctance to fix the build issue really only applies to the case where it's a genuine compiler bug that has been fixed for a while now.

aminiussi commented 1 year ago

On the machine with gcc 10.1.0, it's the only 10.x compiler:

 $ module avail gcc
------------------------ <module path> -------------------------
gcc/4.9.4  gcc/6.5.0  gcc/8.2.0  gcc/8.4.1(8.3.1)  gcc/9.1.0-cuda-openacc  gcc/10.1.0               gcc/12.2.0  
gcc/5.5.0  gcc/7.3.0  gcc/8.3.0  gcc/9.1.0         gcc/9.3.0               gcc/10.1.0-cuda-openacc 

And 12.2.0 is not compatible with cuda/11.(2|7) (it does not reproduce the problem when forced, though).

1uc commented 1 year ago

Unfortunately, we have zero overlap in compilers, which makes it hard to develop a fix. While I can't rule out that someone else will provide a proper fix for this issue, and certainly don't want to prevent anyone from doing so, I'd suggest going the route of forking or patching.

aminiussi commented 1 year ago

Ok. So far I haven't found any decent way to fix that problem. One could use non-template base classes to provide access to the problematic method (as they can't be mistaken for specializations), but that's an ugly workaround for a compiler bug.

Right now I'm using a local fork and will remove it when our target clusters get fixed.
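To make the idea concrete, here is a minimal sketch of the non-template base class approach. The helper name `SelectionAccess`, the namespace `demo`, and the simplified classes are hypothetical and only illustrate the shape of such a workaround. They are not taken from HighFive or from any patch mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>

namespace demo {

class Selection;  // forward declaration

// Hypothetical non-template helper. Befriending an ordinary class involves no
// template syntax, so it cannot be misread as a template specialization.
class SelectionAccess {
  protected:
    static Selection make(std::size_t count);
};

// Hypothetical stand-in for HighFive::SliceTraits; it gains access to the
// private constructor of Selection through the non-template base.
template <typename Derivate>
class SliceTraits : public SelectionAccess {
  public:
    Selection select(std::size_t count) const;
};

// Hypothetical stand-in for HighFive::Selection.
class Selection : public SliceTraits<Selection> {
  public:
    std::size_t count() const { return count_; }

  private:
    explicit Selection(std::size_t count) : count_(count) {}
    std::size_t count_;

    friend class SelectionAccess;  // plain class friend, no template involved
};

inline Selection SelectionAccess::make(std::size_t count) {
    return Selection(count);  // allowed: SelectionAccess is a friend
}

template <typename Derivate>
Selection SliceTraits<Derivate>::select(std::size_t count) const {
    return SelectionAccess::make(count);  // protected static member of the base
}

// Hypothetical stand-in for HighFive::DataSet.
class DataSet : public SliceTraits<DataSet> {};

}  // namespace demo

// Usage: construction still goes through the trait, not a public constructor.
inline void non_template_base_example() {
    demo::DataSet dataset;
    assert(dataset.select(5).count() == 5);
}
```

The constructor stays private and `make` stays protected, so encapsulation is largely preserved, at the cost of one extra class in the hierarchy.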

1uc commented 1 year ago

One could use non-template base classes to provide access to the problematic method [...]

Good suggestion, thanks! In the same spirit, one could befriend a function, e.g. make_*, which is what we did in #688. Hopefully, that helps with the compiler issues.
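A minimal sketch of the "befriend a function" idea follows. The function name `detail::make_selection`, its signature, the namespace `demo`, and the simplified classes are assumptions for illustration; the actual change in #688 may differ in names and details.

```cpp
#include <cassert>
#include <cstddef>

namespace demo {

class Selection;  // forward declaration

namespace detail {
// Hypothetical factory: the only non-member allowed to call the private
// constructor of Selection.
inline Selection make_selection(std::size_t count);
}  // namespace detail

// Hypothetical stand-in for HighFive::SliceTraits.
template <typename Derivate>
class SliceTraits {
  public:
    Selection select(std::size_t count) const;
};

// Hypothetical stand-in for HighFive::Selection.
class Selection : public SliceTraits<Selection> {
  public:
    std::size_t count() const { return count_; }

  private:
    explicit Selection(std::size_t count) : count_(count) {}
    std::size_t count_;

    // Friend function instead of a template friend class: there is no
    // "template <...>" line for a compiler to mistake for a specialization.
    friend Selection detail::make_selection(std::size_t count);
};

namespace detail {
inline Selection make_selection(std::size_t count) {
    return Selection(count);  // allowed: this function is a friend
}
}  // namespace detail

template <typename Derivate>
Selection SliceTraits<Derivate>::select(std::size_t count) const {
    return detail::make_selection(count);
}

// Hypothetical stand-in for HighFive::DataSet.
class DataSet : public SliceTraits<DataSet> {};

}  // namespace demo

// Usage: the trait builds a Selection through the befriended factory.
inline void friend_function_example() {
    demo::DataSet dataset;
    assert(dataset.select(6).count() == 6);
}
```

One consequence of this design is that the factory is callable by any code that can name it, so privacy rests on the convention that `detail` is internal rather than on the language's access rules.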