Closed reuterbal closed 1 year ago
@mhoemmen who just encountered this himself. Mark, could we please upstream your solution here as a bugfix before 3.3 release?
It is regrettable how much 2.x headers inadvertently include parts of CuTe :(
@thakkarV Clang 14 as the host compiler also manifests this issue. It turns out that GCC 8.3.0 did not like a straightforward fix. GCC 8.3.0 also seems to believe that `cute::C<static_cast<uint32_t>(0)>::value_type` is not `uint32_t`. I'm working on a solution that all these compilers will find acceptable.
@reuterbal Just to clarify: when you say "NVHPC 23.1," do you mean the nvc++ compiler, or GCC (g++) as the host compiler plus nvcc as the device compiler? I'm guessing the latter, because even if nvc++ is in the PATH, CUTLASS will prefer the GCC + nvcc option.
No, I am indeed using nvc++ as host compiler:
```
[<uid>@login05 build]$ grep CMAKE_CXX_COMPILER CMakeCache.txt
CMAKE_CXX_COMPILER:FILEPATH=/leonardo/prod/spack/03/install/0.19/linux-rhel8-icelake/gcc-8.5.0/nvhpc-23.1-x5lw6edfmfuot2ipna3wseallzl4oolm/Linux_x86_64/23.1/compilers/bin/nvc++
```
@reuterbal Thanks for responding! :-) Could you please also list the following variables from your CMakeCache.txt file?
CMAKE_C_COMPILER
CMAKE_CUDA_COMPILER
CMAKE_CXX_FLAGS
CMAKE_CUDA_FLAGS
It's surprising to me that CUTLASS detected nvc++ automatically. I'm guessing that g++ and gcc are not in your PATH. I don't know what this system "LEONARDO" is, but I'm also guessing that it uses a modules system and that you have the GCC modules unloaded and the NVHPC modules loaded.
CUTLASS 3 does not currently have official support for nvc++. Do you need to use nvc++ or are you just relying on default settings to build?
I did not realize that nvc++ wasn't a (officially) supported compiler, thanks for clearing that up. You are correct, LEONARDO is a new EuroHPC system with A100 GPUs. CUTLASS is used inside a mixed Fortran/CUDA application that utilizes OpenACC, and I was using the same compiler for all dependencies. I could certainly try rebuilding with the GNU 11.3.0 available there.
```
[<uid>@login05 build]$ grep -E 'CMAKE_(C|CUDA)_COMPILER|CMAKE_(C|CXX|CUDA)_FLAGS' CMakeCache.txt
CMAKE_CUDA_COMPILER:FILEPATH=/leonardo/prod/spack/03/install/0.19/linux-rhel8-icelake/gcc-8.5.0/nvhpc-23.1-x5lw6edfmfuot2ipna3wseallzl4oolm/Linux_x86_64/23.1/compilers/bin/nvcc
CMAKE_CUDA_FLAGS:STRING=
CMAKE_CUDA_FLAGS_DEBUG:STRING=-g
CMAKE_CUDA_FLAGS_MINSIZEREL:STRING=-O1 -DNDEBUG
CMAKE_CUDA_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
CMAKE_CUDA_FLAGS_RELWITHDEBINFO:STRING=-O2 -g -DNDEBUG
CMAKE_CXX_FLAGS:STRING=
CMAKE_CXX_FLAGS_DEBUG:STRING=-g -O0
CMAKE_CXX_FLAGS_MINSIZEREL:STRING=-O2 -s -DNDEBUG
CMAKE_CXX_FLAGS_RELEASE:STRING=-fast -O3 -DNDEBUG
CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=-O2 -gopt
CMAKE_C_COMPILER:FILEPATH=/leonardo/prod/spack/03/install/0.19/linux-rhel8-icelake/gcc-8.5.0/nvhpc-23.1-x5lw6edfmfuot2ipna3wseallzl4oolm/Linux_x86_64/23.1/compilers/bin/nvc
CMAKE_C_FLAGS:STRING=
CMAKE_C_FLAGS_DEBUG:STRING=-g -O0
CMAKE_C_FLAGS_MINSIZEREL:STRING=-O2 -s -DNDEBUG
CMAKE_C_FLAGS_RELEASE:STRING=-fast -O3 -DNDEBUG
CMAKE_C_FLAGS_RELWITHDEBINFO:STRING=-O2 -gopt
//ADVANCED property for variable: CMAKE_CUDA_COMPILER
CMAKE_CUDA_COMPILER-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CUDA_FLAGS
CMAKE_CUDA_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CUDA_FLAGS_DEBUG
CMAKE_CUDA_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CUDA_FLAGS_MINSIZEREL
CMAKE_CUDA_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CUDA_FLAGS_RELEASE
CMAKE_CUDA_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CUDA_FLAGS_RELWITHDEBINFO
CMAKE_CUDA_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CXX_FLAGS
CMAKE_CXX_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CXX_FLAGS_DEBUG
CMAKE_CXX_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CXX_FLAGS_MINSIZEREL
CMAKE_CXX_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CXX_FLAGS_RELWITHDEBINFO
CMAKE_CXX_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_COMPILER
CMAKE_C_COMPILER-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS
CMAKE_C_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_DEBUG
CMAKE_C_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_MINSIZEREL
CMAKE_C_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_RELEASE
CMAKE_C_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_RELWITHDEBINFO
CMAKE_C_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
```
@reuterbal Thanks for sending the list of CMake options! :-) I'll try replicating this build. It's possible that this is an "nvc++ host compiler, nvcc device compiler" situation, which differs from "using nvc++ for everything."
> CUTLASS is used inside a mixed Fortran/CUDA application that utilizes OpenACC, and I was using the same compiler for all dependencies. I could certainly try rebuilding with the GNU 11.3.0 available there.
We definitely test with the GCC 11 series. On the other hand, I would need to review nvfortran's documentation to see if there would be ABI compatibility issues with calling host functions generated by g++, or device functions generated by nvcc.
@reuterbal FYI, I was able to build this test successfully with NVHPC 23.7's nvc++ as the host compiler and nvcc as the device compiler. I didn't even need to use `sudo` to install the NVHPC SDK in my home directory. I just set `CMAKE_CUDA_COMPILER` to my nvcc (full path), `CMAKE_CXX_COMPILER` to my nvc++, and `CMAKE_C_COMPILER` to my nvc (one c! it lives in the same directory as nvc++).
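The configuration described above can be sketched as a CMake invocation along these lines. The `NVHPC_ROOT` path is a placeholder for your own NVHPC install location, not the actual Leonardo path:

```shell
# Point CMake explicitly at the NVHPC host compilers and the bundled nvcc.
# NVHPC_ROOT is a placeholder; adjust it to your NVHPC SDK install prefix.
NVHPC_ROOT=/opt/nvhpc/Linux_x86_64/23.7

cmake .. \
  -DCMAKE_C_COMPILER="${NVHPC_ROOT}/compilers/bin/nvc" \
  -DCMAKE_CXX_COMPILER="${NVHPC_ROOT}/compilers/bin/nvc++" \
  -DCMAKE_CUDA_COMPILER="${NVHPC_ROOT}/compilers/bin/nvcc"
```

Setting the compilers explicitly avoids relying on whatever happens to be first in `PATH`.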
I can try again with NVHPC 23.1, but if you're able to use a newer compiler that works, I would recommend doing that first. Support for compilers other than GCC + NVCC in CUTLASS 3 is still new. I've been having some trouble porting that MixedBits test specifically.
Thanks @mhoemmen, that is good to know! Unfortunately, the CUDA-aware MPI library on Leonardo in particular has only been built against that NVHPC release. I don't trust myself to be able to get an installation up and running that is tuned to the system in the same way.
But I should be able to ask for a newer compiler release to be installed, including the relevant dependencies.
@reuterbal Thanks for reporting back! I think the NVHPC SDK may have an MPI in it that is (or could be) CUDA-aware, but I totally understand why you would want to use the existing known-to-work CUDA-aware MPI.
@reuterbal I have to issue a correction: I'm able to replicate this build error for both NVHPC 23.1 and 23.7. (I had the wrong local branch checked out!) Thankfully, this means that my local changes actually fix the 23.7 build. I'm testing 23.1 now.
Edit: my local changes also fix the 23.1 build. I'm working on getting them through other compilers....
@reuterbal Successfully got the fixes and new tests through the test gauntlet. Awaiting permission to merge.
FYI, the fix should go into the 3.2.1 release, if not the 3.2 release. Thanks for reporting! :-)
Thanks so much, I'll make sure to test the new release.
Coming to 3.2.1 this week
**Describe the bug**
Building cutlass 3.2 (current `main`) with NVHPC 23.1 / CUDA 11.8 for A100 fails with a static assertion in the tests:

**Steps/Code to reproduce bug**

**Expected behavior**
Build succeeds

**Environment details (please complete the following information):**
LEONARDO, NVHPC 23.1 and CUDA 11.8 system-provided installations

**Additional context**
Problem does not seem to occur using release 3.1.0 (build not yet completed)