-
Assume there are 4x4 elements in shared memory. I can use composition(Swizzle{},...) to swizzle each element successfully,
but now I want to swizzle in units of 2x2 elements, like this:
…
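One possible reading of "swizzle in units of 2x2" can be sketched in plain C++, without CuTe: split each coordinate into a block index and an offset inside the block, XOR only the block indices, and leave the offset inside the block untouched. The helper name `swizzle_2x2_blocks`, the row-major output layout, and the choice of XOR-ing the block column with the block row are all assumptions for illustration; they are not taken from the issue or from the CuTe API.

```cpp
#include <cstddef>

// Hypothetical helper (not part of CuTe): maps a logical (row, col) in a
// 4x4 tile to a row-major offset, permuting whole 2x2 blocks at a time.
constexpr std::size_t swizzle_2x2_blocks(std::size_t row, std::size_t col) {
    constexpr std::size_t kTile = 4;  // tile is 4x4 elements
    constexpr std::size_t kUnit = 2;  // swizzle granularity is 2x2 elements

    // Split each coordinate into (block index, offset inside the block).
    const std::size_t block_row = row / kUnit;
    const std::size_t block_col = col / kUnit;
    const std::size_t in_row = row % kUnit;
    const std::size_t in_col = col % kUnit;

    // Assumed swizzle: XOR the block column with the block row. With two
    // block columns this swaps the two blocks in the odd block row only.
    const std::size_t swizzled_block_col = block_col ^ block_row;

    // Reassemble; offsets inside the 2x2 block are left unchanged.
    const std::size_t out_row = block_row * kUnit + in_row;
    const std::size_t out_col = swizzled_block_col * kUnit + in_col;
    return out_row * kTile + out_col;
}
```

Under these assumptions, elements in the top two rows keep their offsets, while in the bottom two rows the left and right 2x2 blocks trade places, so each 2x2 block moves as a unit.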
-
### Reference issues
#20
### Summary
Want to support sprites of different sizes (e.g., 8x8 and 32x32).
### Basic examples
The VGA resolution is 640x480. For 32x32 sprites, the screen wo…
-
I am trying to use `cub::DeviceHistogram::HistogramEven` with `CounterT=int64_t` and get the following error:
```
cub/agent/agent_histogram.cuh(370): error: no instance of overloaded function "ato…
-
Hi @66RING, thank you for your helpful work.
I have one question about the use of kBlockKSmem in csrc/kernel_traits.h. When you define SmemLayoutAtomQ:
```
using SmemLayoutAtomQ = decltype(
…
-
Discovered during blackification: the layering used there is certain to cause trouble.
-
Currently, some of the shared memory kernels are used with the "debug" backend (typically enabled with option `-d debug`) and that leads to errors. One example of this is here: https://github.com/mfem…
-
Hello!
I am currently learning CUTLASS and cuBLASdx and I have a question. `multiblock_gemm.cu` only allows K that fits in smem. I believe it can be extended to larger K following the splitK patter…
-
My code:
```
using GmemTiledCopyL = decltype(make_tiled_copy(
Copy_Atom{}, Layout{}, Layout{}));
using SmemLayoutL = decltype(Layout{});
__shared__ cute::array_aligned l;
GmemTiledCopyL …
-
https://www.selenic.com/smem/
-
In version 9.6 changes were made to smem so that there can be several distinct instances of an LTI in working memory at a time. Each instance has a hidden link to the LTI in smem that it was retrieved…