NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[BUG] Release 3.5.0 build failing on Windows using CUDA 12.6, and VS2022 17.11 #1732

Open levicki opened 2 months ago

levicki commented 2 months ago

Describe the bug

I initially reported this issue to xformers because the xformers build was failing for me, without realizing the error was in the CUTLASS submodule. After some back and forth and more testing on my end, I realized the issue seems to be with CUTLASS 3.5.0.

Steps/Code to reproduce bug

  1. Install Visual Studio 2022 17.11.0 with C++ Desktop Development workload
  2. Install CUDA toolkit 12.6
  3. git clone https://github.com/NVIDIA/cutlass
  4. cd cutlass
  5. git checkout v3.5.0
  6. cmake-gui
  7. Select VS 2022
  8. Select x64
  9. Leave native compiler
  10. Click Configure
  11. Click Generate
  12. Click Open project
  13. Select Release
  14. Click Build

Expected behavior

The build should succeed, but it fails with the following errors (please ignore the C:/BUILD/xformers prefix; the same compilation errors occur when building from within Visual Studio):

C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): warning C4346: 'SharedStorage': dependent name is not a type
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): note: prefix the qualified-id with 'typename' to indicate a type
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): note: the template instantiation context (the oldest one first) is
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(60): note: while compiling class template partial specialization 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(124): note: while compiling class 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>::SharedStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(133): note: while compiling class 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>::SharedStorage::PipelineStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): error C2061: syntax error: identifier 'SharedStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(140): error C3646: 'math_wg_order': unknown override specifier
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(140): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int

Note that there might be other build errors as well; this was just the first place where the build failed. It seems there might be a compiler issue with the latest Visual Studio update?
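
For readers unfamiliar with C4346: it concerns names that depend on a template parameter, which C++17 requires to be prefixed with `typename` when they name a type. The sketch below illustrates the rule only. `Barrier` and `Kernel` are hypothetical names, not CUTLASS code, and whether line 136 of the header already uses the keyword is not established in this thread.

```cpp
#include <type_traits>

// Hypothetical stand-in for a policy type that exposes a nested type.
struct Barrier {
  struct SharedStorage { int arrive_count; };
};

template <class BarrierT>
struct Kernel {
  // BarrierT::SharedStorage depends on the template parameter, so C++17
  // requires `typename` to tell the parser that it names a type.
  using BarrierStorage = typename BarrierT::SharedStorage;

  // Omitting the keyword is the situation C4346 describes:
  //   using BarrierStorage = BarrierT::SharedStorage;  // not valid C++17

  BarrierStorage order_barrier;
};

// Compile-time check that the alias resolves to the nested type.
static_assert(
    std::is_same<Kernel<Barrier>::BarrierStorage, Barrier::SharedStorage>::value,
    "alias should resolve to the nested type");
```

If a compiler reports C4346 on a line that already carries `typename`, that would point at the toolchain rather than the source.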

Environment details (please complete the following information):

Additional context

cl.exe Version 19.41.34120 for x64

levicki commented 2 months ago

Your setup.py is using -std=c++17 for CXX options. The MSVC syntax is -std:c++17 or /std:c++17; using the GNU syntax leads to a warning about an unrecognized compiler option (and probably compilation without C++17 support). Also, -O3 doesn't exist for MSVC.
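
One way to confirm which language standard a translation unit is actually compiled with is to inspect the standard-version macros. The sketch below is an illustration only; `kCppStandard` and `kHasCpp17` are hypothetical names. It uses MSVC's `_MSVC_LANG` macro because MSVC keeps `__cplusplus` at `199711L` unless `/Zc:__cplusplus` is also passed.

```cpp
// MSVC reports the selected standard through _MSVC_LANG; other compilers
// report it through __cplusplus directly.
#if defined(_MSVC_LANG)
constexpr long kCppStandard = _MSVC_LANG;
#else
constexpr long kCppStandard = __cplusplus;
#endif

// True when this translation unit is compiled as C++17 or newer.
constexpr bool kHasCpp17 = kCppStandard >= 201703L;
```

Adding `static_assert(kHasCpp17, "C++17 is required");` to a source file would turn an ignored standard flag into a hard build error instead of a warning.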

levicki commented 2 months ago

The culprit is CUDA 12.6; I can build with CUDA 12.4.1 just fine.
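
A project that has to keep building in the meantime could detect this combination at compile time. This is only a sketch: the helper `is_suspect_toolchain` and the `k...` constants are hypothetical names, and flagging exactly nvcc 12.6 with an MSVC host reflects the single data point reported here (12.4.1 works, 12.6 fails), not a confirmed affected range. `__CUDACC_VER_MAJOR__`, `__CUDACC_VER_MINOR__`, and `_MSC_VER` are the usual nvcc and MSVC version macros.

```cpp
// nvcc defines these macros; a plain host compiler does not.
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__)
constexpr int kNvccMajor = __CUDACC_VER_MAJOR__;
constexpr int kNvccMinor = __CUDACC_VER_MINOR__;
#else
constexpr int kNvccMajor = 0;  // not compiled by nvcc
constexpr int kNvccMinor = 0;
#endif

#if defined(_MSC_VER)
constexpr bool kHostIsMsvc = true;
#else
constexpr bool kHostIsMsvc = false;
#endif

// Assumption: only the 12.6 + MSVC pairing reported in this thread is flagged.
constexpr bool is_suspect_toolchain(int nvcc_major, int nvcc_minor,
                                    bool host_is_msvc) {
  return host_is_msvc && nvcc_major == 12 && nvcc_minor == 6;
}

// Evaluated for the toolchain compiling this file.
constexpr bool kSuspectToolchain =
    is_suspect_toolchain(kNvccMajor, kNvccMinor, kHostIsMsvc);
```

A build could branch on `kSuspectToolchain` to emit a clear diagnostic that points at this issue instead of the template errors above.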

levicki commented 2 months ago

NVIDIA bug ID #4820029.

thakkarV commented 2 months ago

tracking. Does 3.5.1 also fail with the same issue?

levicki commented 2 months ago

> tracking. Does 3.5.1 also fail with the same issue?

@thakkarV I don't see a tag for 3.5.1 and it's not in releases yet?

thakkarV commented 2 months ago

Main is 3.5.1. We will tag soon

levicki commented 2 months ago

> Main is 3.5.1. We will tag soon

@thakkarV Hopefully not before this issue is root-caused and at least worked around?

thakkarV commented 2 months ago

It appears to be a CUDA toolkit issue. If you could try it with main, that would be great, because there were some MSVC fixes in 3.5.1 too.

levicki commented 2 months ago

> If you could try it with main, that would be great, because there were some MSVC fixes in 3.5.1 too.

If you mean with CUDA 12.6, can you give repro steps for some minimal build that triggers it so I don't have to run the full build?

Even better if you can isolate just the relevant part of the code that causes the compiler errors, so I can try to build only that from the developer command prompt.

EDIT: If I remember correctly, I tried with main as well and it didn't make any difference.

egortech commented 1 month ago

Any update on this? It's blocking onnxruntime builds with CUDA 12.6 (microsoft/onnxruntime#21676).

levicki commented 4 weeks ago

> Any update on this? It's blocking onnxruntime builds with CUDA 12.6 (microsoft/onnxruntime#21676).

I asked on the ticket; no response yet from the engineering team.