Closed. mrakgr closed this 4 weeks ago.
As an aside, I couldn't even get a Cuda hello world project to run with CMake, so there is no way I could run the tests for Cutlass in the quickstarts. Is the library supposed to be Linux only?
Forgot to mention, but I did set the C++ standard to 20.
Found this via a web search, I'll try following the instructions in it.
No sorry, there is nothing relevant there to my issue. I can't even run CMake on a Cuda project to begin with.
By the way, we're making a ~video on using Cutlass on Youtube~. You can see some of our work here. A few months ago, I anticipated these kinds of issues and didn't want to deal with the pain of getting a C++ project to run, so I started this Cuda matrix multiplication playlist instead. It was a good exercise that allowed me to implement the tensor datatype in Spiral and got me familiar with GPU programming. Even so, I admit it would have made more sense to use Cutlass to begin with, since it took 2 months to make that matrix multiply.
So now that we've actually started work on an ML library and need a wider array of functionality than the single Ampere matrix multiply that we've implemented provides, we are looking into Cutlass again. If we can get it to run, we'll integrate it into the library and have Cutlass be responsible for the matrix multiplication heavy lifting.
We'll be covering that in the video, which of course means we'll also be covering how to use Cutlass in a Cuda kernel directly. Hopefully, others will find that interesting.
I am trying to go through the intro examples and am stuck on the second one. I am sure that the two include paths are correct, but when I try to build the project there are a ton of errors.
The first link is a 404 for me. The second link points to a README instead of an example. The code snippets in our README have likely drifted out of sync with the repo. I would highly recommend building the actual examples under the cutlass/examples/ directory instead. They are guarded by CI tests.
The README you found is the documentation we have for Windows builds. It is best to use the latest MSVC you can find. I am curious whether you are able to build any of the examples in the examples directory with our CMake.
Ah, it seems the repo has been made private by accident. Sorry, I didn't expect the IDE to do that. Now it is public.
I'll try building the examples in the directory you pointed me to and get back to you.
mkdir build && cd build
cmake .. -DCUTLASS_NVCC_ARCHS=89
Surprisingly, running these two commands does in fact work and generates all the build scripts in the build directory. I expected I'd run into the same issue as when running the Cuda hello world project.
But all the examples in the quickstart readme use the Linux make utility, while I am on Windows, so I am not sure how to run them. This is my first time trying to use CMake. I'll try building the individual examples next and seeing where that gets me.
> Linux make utility, while I am on Windows, so I am not sure how to run them.
Make runs downstream of CMake. The quickstart is written with a Linux system as an example, but the source of truth for generating any build system configuration is CMake. We do not support users customizing flags for their build system or compiler outside of those generated by our build system.
cmake --build . --parallel
It seems that running this command does build them, which is something. Right now, it's in the process of doing it.
But whether cmake works is beside the point for me, as what is really important is that I understand how to use the library. When I copied the code from the one example in 00_basic_gemm into my own project, the first issue was that #include "helper.h" was not found. That file is in examples/common, so I added it to the include path. But now it is complaining that threadIdx.x is not found. And there is a bevy of errors similar to the ones in the first post.
Severity Code Description Project File Line Suppression State Details
Error (active) E0020 identifier "threadIdx" is undefined Cutlass Tests C:\Users\mrakg\source\repos\Cutlass Tests\Cutlass Tests\kernel.cu 161
Error (active) E0020 identifier "blockIdx" is undefined Cutlass Tests C:\Users\mrakg\source\repos\Cutlass Tests\Cutlass Tests\kernel.cu 161
Error (active) E0020 identifier "blockDim" is undefined Cutlass Tests C:\Users\mrakg\source\repos\Cutlass Tests\Cutlass Tests\kernel.cu 161
Error (active) E0029 expected an expression Cutlass Tests C:\Users\mrakg\source\repos\Cutlass Tests\Cutlass Tests\kernel.cu 185
Error (active) E0020 identifier "threadIdx" is undefined Cutlass Tests C:\Users\mrakg\source\repos\Cutlass Tests\Cutlass Tests\kernel.cu 244
Error (active) E0020 identifier "blockIdx" is undefined Cutlass Tests C:\Users\mrakg\source\repos\Cutlass Tests\Cutlass Tests\kernel.cu 244
Error (active) E0020 identifier "blockDim" is undefined Cutlass Tests C:\Users\mrakg\source\repos\Cutlass Tests\Cutlass Tests\kernel.cu 244
Error (active) E0029 expected an expression Cutlass Tests C:\Users\mrakg\source\repos\Cutlass Tests\Cutlass Tests\kernel.cu 278
Error namespace "std" has no member "invoke_result_t" Cutlass Tests G:\cutlass-3.5.0\include\cute\util\type_traits.hpp 126
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 41
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 80
Error constant "v" is not used in or cannot be deduced from the template argument list of class template "cute::is_integral<cute::C<<error>>>" Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 80
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 95
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 97
Error constant "n" is not used in or cannot be deduced from the template argument list of class template "cute::is_constant<<error>, const T>" Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 97
Error the template argument list of the partial specialization includes a nontype argument whose type depends on a template parameter Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 98
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 99
Error constant "n" is not used in or cannot be deduced from the template argument list of class template "cute::is_constant<<error>, const T &>" Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 99
Error the template argument list of the partial specialization includes a nontype argument whose type depends on a template parameter Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 100
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 101
Error constant "n" is not used in or cannot be deduced from the template argument list of class template "cute::is_constant<<error>, T &>" Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 101
Error the template argument list of the partial specialization includes a nontype argument whose type depends on a template parameter Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 102
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 103
Error constant "n" is not used in or cannot be deduced from the template argument list of class template "cute::is_constant<<error>, T &&>" Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 103
Error the template argument list of the partial specialization includes a nontype argument whose type depends on a template parameter Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 104
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 105
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 105
Error constant "n" is not used in or cannot be deduced from the template argument list of class template "cute::is_constant<<error>, cute::C<<error>>>" Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 105
Error constant "v" is not used in or cannot be deduced from the template argument list of class template "cute::is_constant<<error>, cute::C<<error>>>" Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 105
Error the template argument list of the partial specialization includes a nontype argument whose type depends on a template parameter Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 106
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 107
Error constant "n" is not used in or cannot be deduced from the template argument list of class template "cute::is_constant<<error>, cute::integral_constant<T, v>>" Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 107
Error the template argument list of the partial specialization includes a nontype argument whose type depends on a template parameter Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 108
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 188
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 189
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 190
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 191
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 192
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 194
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 194
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 195
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 195
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 196
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 196
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 197
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 197
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 198
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 198
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 199
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 199
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 200
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 200
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 201
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 201
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 202
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 202
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 203
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 203
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 205
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 205
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 206
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 206
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 208
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 208
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 209
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 209
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 210
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 210
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 211
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 211
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 212
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 212
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 213
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 213
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 223
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 231
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 239
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 247
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 255
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 263
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 271
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 279
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 287
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 295
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 303
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 340
Error no instance of function template "cute::abs" matches the argument list Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 340
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 341
Error no instance of function template "cute::signum" matches the argument list Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 341
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 342
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 344
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 344
Error no instance of function template "cute::max" matches the argument list Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 344
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 344
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 344
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 345
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 345
Error no instance of function template "cute::min" matches the argument list Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 345
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 345
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 345
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 346
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 346
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 346
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 346
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 347
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 347
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 347
Error "auto" is not allowed here Cutlass Tests G:\cutlass-3.5.0\include\cute\numeric\integral_constant.hpp 347
Error MSB3721 The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\bin\nvcc.exe" -gencode=arch=compute_89,code=\"sm_89,compute_89\" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64" -x cu -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include" -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -g -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd " -Xcompiler "/Fdx64\Debug\vc143.pdb" -o "C:\Users\mrakg\source\repos\Cutlass Tests\Cutlass Tests\x64\Debug\kernel.cu.obj" "C:\Users\mrakg\source\repos\Cutlass Tests\Cutlass Tests\kernel.cu"" exited with code 4. Cutlass Tests C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.3.targets 799
I am curious, what kind of setup are you using when you are developing these examples yourself? Even if not Visual Studio, how could I edit, play with and run these examples in something like VS Code?
You have two options. Ideally, import CUTLASS as a dependency via CMake, which results in a seamless integration without you having to edit flags manually. Otherwise, copy the entire compiler command line and append it to your project's build.
I am not sure where I'd get the latter, but as for the first...
Since it is a header library, am I not supposed to simply add the include directories and then the necessary headers? The quickstart guide examples make it seem that is how it is supposed to be used. Right now, I cannot get even a Cuda hello world project to run with CMake on my own, so figuring out how to add Cutlass as a dependency that way seems very hard.
Instead of us going back and forth over this, and me asking dumb questions, would it be possible for you to provide some examples that somebody without intricate knowledge of C++'s build system would be able to run, please? Something like Nvidia does here. Just to even get started with Cutlass, I had to study CMake for a few hours and in the end, I couldn't even run a hello world Cuda program with it.
I mean it constructively when I say that you are hugely overestimating how easy it is for somebody to come in and use the Cutlass library right now.
Here is an example: https://github.com/thakkarV/cuASR/blob/master/CMakeLists.txt
It is somewhat out of date at this point, but enough to get you started. The one thing I would change in this example is that, rather than setting the flags manually, I would add the CUTLASS CMake directory first and then simply append any custom flags to the ones that the CUTLASS CMake file generates.
I'll give that a try, but my request would be to get a Cutlass example to work as a plain VS project, similar to the ones on the NVidia sample repo. Maybe if we were doing a C++ project that utilizes Cutlass, depending on CMake would be fine, but in the Youtube channel we're doing the ML library project in Spiral which compiles to Python on the host, and Cuda C++ on the device. Including the Cutlass directory as a part of the NVCC compiler options is the limit of how complex the setup can be.
Before that, I need to get it to run in VS, so I can anchor the project to a working state and be capable of debugging it in the IDE.
I presume this should be possible; I just don't understand why I am getting those weird type errors in integral_constant.hpp when I copy the second quickstart example into my project. One assumption is that the C++ standard is out of date, but I did set it to C++20 in the project settings. And when I copied the 00_basic_gemm project instead, I still got those type errors, but threadIdx.x was also missing. That one should come from #include "cuda_runtime.h". I don't understand why that is not getting included anywhere. The test itself builds with CMake, but cuda_runtime.h is not being included in any file referenced by the main project file.
I am sorry for pestering you with this, but I am blocked and not sure how to proceed currently.
Did the example I linked not help? You do not have to use cmake yourself - you can just copy the flags that CUTLASS CMake generates on windows and use those in your project. I suspect you are likely missing some include path OR you are not building with NVCC.
Please do not copy code out of the markdown documentation as that is not guarded by CI. Anything from the example or test dir should just work.
Actually yes, MSVC should never be able to see threadIdx during compilation. That should only exist in device code.
Maybe @mhoemmen can help
> You do not have to use cmake yourself - you can just copy the flags that CUTLASS CMake generates on windows and use those in your project.
I see, I didn't realize that is what you meant. Hmm, I'll have to figure out how to find those flags.
Ok, I see them. Not in the project you've linked me to, but in the CMake build of Cutlass that I did yesterday. All the examples have a bunch of .vcxproj files in their directories, and they do run. I mean, the first one that I tested does, but I guess the other executables would as well.
I understand what I should be doing now.
The only issue that I have is that looking at the .vcxproj files is overwhelming: there are a bunch of them, and each of them is at least 10 KB of XML to go through.
I did a hello world C++ project with CMake yesterday, and I realize now that, unlike the built Cutlass examples, that particular project also has a Visual Studio .sln file in its build folder. That allows me to open the project directly in the IDE. It's too bad it's missing in the Cutlass examples, as that would allow me to open and edit them directly in the IDE. Maybe there is a way to create them...
Do you have any idea why the Cutlass CMake build script isn't producing them?
It turns out you can open the .vcxproj files directly in the IDE. So, the solution file not being there is not a problem.
It is actually possible to build the individual files from inside the IDE. Here are the build options for the first example.
>G:\cutlass-3.5.0\build\examples\00_basic_gemm>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64" -x cu -I"G:\cutlass-3.5.0\include" -I"G:\cutlass-3.5.0\examples\common" -I"G:\cutlass-3.5.0\build\include" -I\include -I\examples -I"G:\cutlass-3.5.0\tools\util\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include" --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart shared -std=c++17 --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --expt-relaxed-constexpr -Xcompiler="/EHsc /Zc:__cplusplus /bigobj -Zi -Ob0 /wd4819 /fp:strict" -g -D_WINDOWS -DCUTLASS_VERSIONS_GENERATED -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler "/Fd00_basic_gemm.dir\Debug\vc143.pdb" -o 00_basic_gemm.dir\Debug\basic_gemm.obj "G:\cutlass-3.5.0\examples\00_basic_gemm\basic_gemm.cu"
The only issue with this is that for some reason the Intellisense isn't working properly for the examples even though they build correctly.
The solution to the Intellisense issue is to just take the directories in Cuda C/C++ -> Additional Include Directories and then paste them into VC++ Directories -> Include Directories, like I just did here.
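For anyone who prefers to pull those include directories out of the logged nvcc command line instead of the project dialog, here is a minimal sketch. It assumes a POSIX shell such as git bash; `extract_includes` is a hypothetical helper, and the sample command is a shortened version of the one logged above, not the full line.

```shell
# Hypothetical helper: read an nvcc command line on stdin and print each
# unique -I include directory once, in order of first appearance.
extract_includes() {
  grep -oE -e '-I("[^"]*"|[^[:space:]]+)' \
    | sed -e 's/^-I//' -e 's/^"//' -e 's/"$//' \
    | awk '!seen[$0]++'
}

# Shortened sample of the command line logged for 00_basic_gemm in this thread.
cmd='nvcc.exe -x cu -I"G:\cutlass-3.5.0\include" -I"G:\cutlass-3.5.0\examples\common" -I"G:\cutlass-3.5.0\tools\util\include" -I"G:\cutlass-3.5.0\include" -std=c++17'

includes=$(printf '%s\n' "$cmd" | extract_includes)
printf '%s\n' "$includes"
```

The printed list can then be pasted into VC++ Directories -> Include Directories. Note that the pattern is deliberately simple and would also pick up a literal `-I` that happens to appear inside another argument.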
The only other problem is that it cannot recognize Cuda built-ins like threadIdx. To resolve that, you can add #include "device_launch_parameters.h" at the top of the .cu file.
I am surprised it is even compiling without that header. Anyway, it works, and I've met my goal of getting it to work on Windows and Visual Studio. I should be able to actually study the library properly from here on out.
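Looking back at the two nvcc command lines logged in this thread, one visible difference is that the CMake-generated one passes -std=c++17 and --expt-relaxed-constexpr, while the failing one from the plain Visual Studio project has no -std= flag at all. That would be consistent with the "auto" is not allowed here and invoke_result_t errors, though this is an inference from the logs and was not confirmed by anyone in the thread. A minimal sketch of a check for those flags follows; `has_flag` and `check` are hypothetical helpers, and the two strings are shortened stand-ins for the full logged commands.

```shell
# Hypothetical helper: succeed if the command line in $1 contains the flag $2
# as a whole space-separated word.
has_flag() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: report the two flags of interest for a labelled command line.
check() {  # $1 = label, $2 = command line
  for flag in -std=c++17 --expt-relaxed-constexpr; do
    if has_flag "$2" "$flag"; then
      echo "$1: $flag present"
    else
      echo "$1: $flag missing"
    fi
  done
}

# Shortened stand-ins for the two nvcc command lines logged in this thread.
failing='nvcc.exe --use-local-env -x cu -G -maxrregcount=0 --machine 64 --compile -cudart static -g'
working='nvcc.exe --use-local-env -x cu -maxrregcount=0 --machine 64 --compile -cudart shared -std=c++17 --expt-relaxed-constexpr -g'

report=$(check failing "$failing"; check working "$working")
printf '%s\n' "$report"
```

Pasting the full command line from the Visual Studio build output into `failing` gives a quick way to see whether a hand-made project is missing flags that the CMake-generated projects pass.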
The only nitpick that I still have is that, for some reason, opening the 00_basic_gemm.vcxproj file opens literally every single project in the example folder, as you can see in the screenshot above, which defeats the point of having individual project files. Maybe that is something you could look into in the CMake files?
I'll close this here. Thank you for the help.
It's probably opening all the project files because, instead of opening the individual project, it is loading the solution file in the outer directory.
You can see it right here. This answers the question why the solution file is not being generated for the individual examples. It's because only a single one is being generated for the entire project.
The video is out now, though I think just reading through this issue is an easier way to get a sense of what should be done. The sheer amount of stuff CMake produces made my brain freeze, so I did need some help to nudge me in the right direction. Thanks again.
More than just getting started with it, I'll probably make a video on using Cutlass when the 50xx cards come out and I need to upgrade the matrix multiply for the ML library.
@mrakgr Thank you for your interest and engagement!
https://github.com/NVIDIA/cutlass/blob/main/media/docs/build/building_in_windows_with_visual_studio.md gives instructions for running CMake and building on Windows, both from the command line. (I recommend the git bash shell, but others should work as well.) You can even build a specific target, e.g., by replacing <TARGET_NAME> in the command below with the path to the target's .vcxproj file (not including the .vcxproj extension).
cmake --build . --config Debug -j 4 -t <TARGET_NAME>
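As a small sketch of how <TARGET_NAME> could be filled in from a project file: this assumes that the CMake target name matches the file's base name, as appears to be the case for 00_basic_gemm.vcxproj in this thread. The path is only illustrative, and the script prints the command instead of running it.

```shell
# Illustrative path to a CMake-generated project file (forward slashes, as in git bash).
vcxproj='build/examples/00_basic_gemm/00_basic_gemm.vcxproj'

# Assumption: the CMake target name equals the file name without its extension.
target=$(basename "$vcxproj" .vcxproj)

# Print the build command for that single target; run it from the build directory.
build_cmd="cmake --build . --config Debug -j 4 -t $target"
echo "$build_cmd"
```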
If I find myself needing an IDE, I just load up the CMake-generated CUTLASS.sln file in Visual Studio.
Link: https://github.com/mrakgr/Cutlass-Tests/blob/632caad823bbb86eddb012488926030fd929f54e/Cutlass%20Tests/kernel.cu