-
In https://github.com/rust-lang/rust/pull/80652 I disabled non-power-of-two vector lengths to deconflict `stdsimd` and `cg_clif` development. Power-of-two vectors are typically sufficient, but there …
-
Hello! I'm trying to run [this AES implementation](https://github.com/cihangirtezcan/CUDA_AES) using the Accel-Sim framework in PTX mode. I have followed the instructions to add it as an app described…
-
**What is your question?**
When using serial splitK GEMM to solve a problem size of 1024x832x4096, there is a deadlock in the semaphore wait.
When I comment out this code, the deadlock disappears, but it co…
-
Hitting `G` causes the application to crash
```bash
$ RUST_BACKTRACE=full mdt
Backtrace (most recent call first):
File "", line 0, in __libc_start_main
The application panicked (crashed).
…
-
I am trying to write a depthwise conv kernel. Reviewing sample code 46, I find that the get_tiled_shape function in the thread block swizzle class for depthwise is overloaded, and always o…
-
Currently, during device startup, we have to initialize all CBs, even those that we don't use. If a CB were instead constructed on demand, we could simplify kernel setup. Host-side CB configuration is also…
-
```
import taichi as ti
@ti.kernel
def foo():
    for x in range(1000):
        shared = ti.simt.block.SharedArray((10, ), dtype=ti.math.vec3)
        print(shared[x].z) # Asserts
        print(shar…
-
I'm building CUTLASS 3.2.2 on Arch Linux and I'm getting this error:
```
FAILED: tools/profiler/CMakeFiles/cutlass_profiler.dir/src/cudnn_helpers.cpp.o
/usr/bin/c++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTL…
-
**Submitting author:** @michel2323 (Michel Schanen)
**Repository:** https://github.com/exanauts/JuliaCon2020
**Branch with paper.md** (empty if default branch):
**Version:**
**Editor:** @matbesancon…