NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[QST] How to import C++ CUTLASS for pybind? #1844

Closed haeunlee99 closed 1 month ago

haeunlee99 commented 1 month ago

I want to use CUTLASS functions with PyTorch tensors in Python. I have previously used pybind to compile CUDA programs that can be called from Python. However, it seems that CUTLASS requires CMake for compilation and import (https://github.com/NVIDIA/cutlass/tree/main/examples/60_cutlass_import). I barely know CMake. Is using CMake as in that example, together with CMakeExtension in setup.py (https://github.com/pybind/cmake_example/blob/master/setup.py), the only way, or is there something I am missing?

I added "--gpu-architecture=sm_90a" as an nvcc flag and compiled with pybind just as before (by directly including the source code as a header file), but saw a severe degradation in performance (85 TFLOPS for a half-precision matmul on a Hopper GPU). Can incorrect compilation settings degrade performance as well?

from setuptools import setup
from torch.utils import cpp_extension

extra_compile_args = {
    "nvcc": [
        "--gpu-architecture=sm_90a",
    ],
}

setup(
    name="test_ext",
    ext_modules=[
        cpp_extension.CUDAExtension(
            name="test_ext",
            sources=["test.cu"],
            extra_compile_args=extra_compile_args,
        )
    ],
    cmdclass={"build_ext": cpp_extension.BuildExtension},
)

Thanks!

jackkosaian commented 1 month ago

Try taking a look at the Python example 02_pytorch_extension_grouped_gemm. It emits a setup.py file that can be used with PyTorch. You can follow the example in that file.
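
Since CUTLASS is a header-only template library, a PyTorch extension build mainly needs the CUTLASS include directories and suitable nvcc flags, rather than CMake. The following is a minimal sketch of that idea, not the file the example emits: the helper name `cutlass_extension_kwargs`, the checkout path, and the specific optimization flags are assumptions chosen for illustration, and whether any of them explains the slowdown reported above has not been verified.

```python
import os


def cutlass_extension_kwargs(cutlass_root, arch="90a"):
    """Build keyword arguments for torch.utils.cpp_extension.CUDAExtension.

    Hypothetical helper: cutlass_root is the path to a local CUTLASS checkout,
    arch is the compute capability suffix (for example "90a" for Hopper).
    """
    # CUTLASS headers, plus the utility headers that many examples include.
    include_dirs = [
        os.path.join(cutlass_root, "include"),
        os.path.join(cutlass_root, "tools", "util", "include"),
    ]
    # Assumed flags: C++17 (needed by recent CUTLASS releases), an explicit
    # optimization level, and code generation for the target architecture.
    nvcc_flags = [
        "-O3",
        "-std=c++17",
        f"-gencode=arch=compute_{arch},code=sm_{arch}",
    ]
    return {
        "include_dirs": include_dirs,
        "extra_compile_args": {
            "cxx": ["-O3", "-std=c++17"],
            "nvcc": nvcc_flags,
        },
    }


# Usage in setup.py (requires PyTorch and a CUDA toolkit; path is a placeholder):
#
# from setuptools import setup
# from torch.utils import cpp_extension
#
# setup(
#     name="test_ext",
#     ext_modules=[
#         cpp_extension.CUDAExtension(
#             name="test_ext",
#             sources=["test.cu"],
#             **cutlass_extension_kwargs("/path/to/cutlass"),
#         )
#     ],
#     cmdclass={"build_ext": cpp_extension.BuildExtension},
# )
```

Keeping the flag construction in a plain function makes it easy to compare these settings against the ones in the setup.py emitted by the example.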

haeunlee99 commented 1 month ago

Thank you!