-
Kind of a small change.
So I was looking at
https://github.com/NVIDIA/cutlass/issues/1231
and I was wondering if it made sense to refactor the code so that it will accept the type of copy they w…
-
**Describe the bug**
A clear and concise description of what the bug is.
It is a build error.
**To Reproduce**
Steps to reproduce the behavior:
Followed the guide to build libnvm on Ubuntu 24.04.
**E…
-
![Snipaste_2023-12-23_13-30-41](https://github.com/lcai2/LightTEA/assets/76097676/c577f04d-3b99-4731-b5b5-9ca88e3d02db)
for i in range(ranks.shape[0]):
    if sims[i, ranks[i, 0]] > 0.8:
…
-
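Similar to #15: it describes techniques for a JIT compiler that optimizes machine learning workloads, and it may go further than #15.
There is a dilemma between performing high-cost fusion and skipping fusion and emitting a large number of kernels (because of the just-in-time constraint).
They built an optimizing compiler called AStitch, which is apparently used from TensorFlow.
AStitch systematically abstracts four operator-stitching schemes, …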
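
To make the fusion dilemma concrete, here is a minimal sketch in plain Python. It is not AStitch's algorithm and uses none of its APIs. The function names and the launch counter are hypothetical, and list comprehensions stand in for GPU kernels. It only shows that an unfused pipeline pays one launch and one intermediate buffer per operator, while a fused version computes the same result in a single pass.

```python
# Hypothetical illustration of the fusion trade-off; not AStitch code.

def run_unfused(xs):
    """Run three element-wise ops as three separate 'kernels'."""
    launches = 0
    scaled = [2.0 * x for x in xs]        # kernel 1: writes an intermediate buffer
    launches += 1
    shifted = [s + 1.0 for s in scaled]   # kernel 2: writes another intermediate
    launches += 1
    out = [max(s, 0.0) for s in shifted]  # kernel 3: writes the final result
    launches += 1
    return out, launches


def run_fused(xs):
    """Run the same three ops in one pass: one 'kernel', no intermediates."""
    out = [max(2.0 * x + 1.0, 0.0) for x in xs]
    return out, 1


data = [-2.0, -0.5, 0.0, 1.5]
unfused_out, unfused_launches = run_unfused(data)
fused_out, fused_launches = run_fused(data)
```

In a real JIT compiler the fused version is not free: deciding what to fuse and generating the fused kernel costs compile time, which is the other side of the dilemma in the note above.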
-
# 🐛 Bug
## Command
```sh
cd xformers
git pull
git submodule update --recursive --remote
pip install -e .
```
## To Reproduce
Steps to reproduce the behavior:
1. pull latest…
-
Hi, I built **cumm** from source on an NVIDIA Jetson Nano board. When I import **cumm** in Python, no errors appear.
But when I import **TensorOpParams** as follows: `from cumm.gemm.algospec.core im…
-
Thread-local memory pointers are really only a thing on CUDA, iirc. Hardware-wise, all thread-local memory is in registers, as the first level of cache will be the shared memory. From the code it see…
-
This Basecamp item is marked as done. I don't see the changes on the page.
-
```
I am new to clpp. Just want to know what is the difference between these two
scans?
[I did not find a better place to put this question]
```
Original issue reported on code.google.com by `rong…
-
When I build the latest cutlass library for 90a, I see a lot of warnings like:
```
ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient r…