-
### What happened?
Hitting this error in CUDA compilation:
```
error: 'func.func' op exceeded GPU memory limit of 166912 bytes for function. Got 5767168 bytes
```
[Full error log](https://gis…
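
To put the two numbers in the diagnostic side by side, here is a minimal sketch that parses the message and converts the byte counts. The helper name `parse_memory_error` is hypothetical and not part of any compiler tooling. The limit works out to exactly 163 KiB, which matches the per-block shared-memory cap on some NVIDIA GPUs; whether that is the limit meant here is an assumption the excerpt does not confirm.

```python
import re

# Error text copied verbatim from the report above.
ERROR = ("error: 'func.func' op exceeded GPU memory limit of 166912 bytes "
         "for function. Got 5767168 bytes")


def parse_memory_error(message):
    """Return (limit_bytes, requested_bytes) parsed from the diagnostic.

    Hypothetical helper for illustration only.
    """
    match = re.search(r"limit of (\d+) bytes.*?Got (\d+) bytes", message)
    if match is None:
        raise ValueError("message does not look like the memory-limit diagnostic")
    return int(match.group(1)), int(match.group(2))


limit, requested = parse_memory_error(ERROR)
print(f"limit:     {limit} bytes = {limit / 1024:g} KiB")
print(f"requested: {requested} bytes = {requested / (1024 * 1024):g} MiB")
print(f"overshoot: {requested / limit:.1f}x")
```

The function requests 5.5 MiB against a 163 KiB budget, so the gap is far too large to be closed by trimming a buffer or two.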
-
I am following the guidance in https://github.com/NVIDIA/cutlass/blob/master/media/docs/profiler.md to compile the CUTLASS profiler, but I get stuck when trying to execute:
```shell
$ make cutlass_pro…
```
-
**Is your feature request related to a problem? Please describe.**
There are many algorithms for gemms and convs that are designed specifically for TensorOps. For example, any of the algorithms that …
-
**What is your question?**
If I understand correctly, rank_k kernels target sm_80 and newer devices by default. And it seems I cannot run any rank_k operations with cutlass_profiler on a T4 device.…
-
# Project Picasso - a multithreading runtime for Nim
_"Good artists borrow, great artists steal." -- Pablo Picasso_
## Introduction
The Nim destructors and new runtime were introduced
to pro…
-
Hi,
I want to implement an int8 complex GemmBatched for my project, to run on an sm70 device (uint8 * uint8 = uint32).
May I ask what's the best way to do it?
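
Whichever kernel route is chosen, a host-side reference is useful for checking results. The sketch below is plain Python, not a CUTLASS kernel and not a recommendation for sm70. It assumes the complex operands are stored as separate real and imaginary integer planes, and the name `complex_gemm_batched_ref` is hypothetical. Note that the real part involves a subtraction and can be negative, so a signed 32-bit accumulator is assumed for any device implementation.

```python
def complex_gemm_batched_ref(a_re, a_im, b_re, b_im):
    """Reference batched complex GEMM on integer planes.

    Each argument is a batch (list) of matrices, each matrix a list of rows
    of small integers such as 8-bit values. Returns (c_re, c_im).
    Python ints never overflow, so accumulation here is exact; a device
    kernel would need a sufficiently wide (assumed 32-bit signed) accumulator.
    """
    def matmul(x, y):
        rows, inner, cols = len(x), len(y), len(y[0])
        return [[sum(x[i][k] * y[k][j] for k in range(inner))
                 for j in range(cols)] for i in range(rows)]

    def combine(p, q, sign):
        # Elementwise p + sign * q
        return [[pv + sign * qv for pv, qv in zip(pr, qr)]
                for pr, qr in zip(p, q)]

    c_re, c_im = [], []
    for ar, ai, br, bi in zip(a_re, a_im, b_re, b_im):
        # (ar + i*ai)(br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
        c_re.append(combine(matmul(ar, br), matmul(ai, bi), -1))
        c_im.append(combine(matmul(ar, bi), matmul(ai, br), +1))
    return c_re, c_im


# Batch of one 1x1 problem: (1 + 2i) * (3 + 4i) = -5 + 10i
print(complex_gemm_batched_ref([[[1]]], [[[2]]], [[[3]]], [[[4]]]))
```

The decomposition into four real-valued products is one common way to express a complex GEMM; it is shown here only because it makes the expected values easy to verify by hand.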
-
**Describe the bug**
I am trying to do a GEMM between two fp32 arrays using the Python API to produce an fp32 output. I would like to leverage tensor cores for this operation.
I modified the …
-
Hello! I wrote a custom SIMT kernel to do gather-gemm-scatter fusion. The profiler picks the kernel settings, but I find that it gives the wrong result for gather-gemm-scatter. Does the simt kernel sup…
-
Hi, I'm interested in using circle's metaprogramming tools to extend an existing codebase that makes use of SIMD intrinsics, but it seems to be failing to compile. A small example [here](https://godbo…
-
I initialized a plane named AC1 with a speed of 250 m/s. But it is ...
![myplot](https://user-images.githubusercontent.com/84360925/218401691-aeb47996-4125-484a-bcf6-d06ef5049075.png)