-
CUDA Version 11.2
CUTLASS Version 2.9
GPU A10
cmake .. -DCUTLASS_NVCC_ARCHS=86 -DCUTLASS_LIBRARY_KERNELS=cutlass_simt_sgemm_128x128_8x2_nn_align1
make cutlass_profiler -j16
./tools/profile…
-
I tried to train Voxel-RCNN using Uni-3D as instructed in the README files, but encountered the following error:
```
Exception|implicit_gemm]feat=torch.Size([531550, 32]),w=torch.Size([32, 3, 3, 3, 32…
```
-
I'm implementing a small-k gather-GEMM-scatter fusion kernel using SIMT. Everything works correctly in the gather and accumulation stages. However, the results are incorrect after scattering (I use native `OutputT…
-
**Is your feature request related to a problem? Please describe.**
Currently, conv3d kernels like those in [default_conv3d_fprop](https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/conv/ke…
-
Has anyone encountered this problem when using TensorRT for inference?
[07/04/2023-10:04:52] [I] Starting inference
[07/04/2023-10:04:53] [E] [TRT] /home/vision/Desktop/sparseconv_trt_new_libs…
-
~~Hi, I cannot find any information about the support for bf16 and fp16. Does the current library support any of them?~~
I found the information in the catalog. So when is bf16 going to be suppor…
-
The CUDA matmul SIMT pipeline uses `scf.foreach_thread` and performs vectorization before bufferization. The SPIR-V side should follow the same direction, so this issue tracks that work.…
-
```
win11 wsl2 cuda-11.8
2023-10-28 install from source code , Taichi-1.7.0
(cuda118) root@LZH5:/mnt/d/Software/AI/taichi/2310/wsl_cuda118# python setup.py develop -DCLANG_EXECUTABLE=/usr/bin…
```
-
The title describes what happens, and I think most of us already know that this happens, but here is an example anyway. The following loop computes a 5x5 convolution:
```cpp
produce denoise_conv_noisy {
…
```
-
**What is your question?**
I am new to CUTLASS, and I'm trying to run `examples/06_splitK_gemm` on an NVIDIA A100. I made modifications based on issue #1141, including changing SmArch to cutlas…