-
I'm working through the sgemm_1.cu tutorial here: https://github.com/NVIDIA/cutlass/blob/v3.5.0/examples/cute/tutorial/sgemm_1.cu
My question is:
How can we inspect the output of C?
I see in b…
-
This issue is not in response to a performance regression.
The method of performing cross-attention QKV computations introduced in #4942 could be improved. Because this issue relates to cross-atten…
-
From the 22 Feb 2024 performance model review of Distilgpt2:
There are several GEMMs that are applied together (this is the tail end of attention):
```
@17 = hip::hip_copy_literal[id=main:@litera…
```
-
Platforms: rocm
This test was disabled because it is failing on ROCm 6.1 (e.g. https://github.com/pytorch/pytorch/pull/132895)
cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @dllehr-amd @jatayl…
-
Platforms: rocm
This test was disabled because it is failing on main branch ([recent examples](https://torch-ci.com/failure?failureCaptures=%5B%22inductor%2Ftest_b2b_gemm.py%3A%3AB2BGEMMTest%3A%3At…
-
# ❓ Questions and Help
Some features appear to be unavailable when executing `python -m xformers.info` (cutlassF, smallkF, ...)
Is this normal?
```
xFormers 0.0.27+7a04357.d20240822
memory_ef…
```
-
**Describe the bug**
The Python PyTorch emitter does not output functioning code when compiling a `Gemm` with an `EVT`.
**Steps/Code to reproduce bug**
The script below reproduces the bug.
Sw…
-
Hello, can I use `FastLinearCombinationClamp` to convert a `half_t` accumulator to `int8_t` output, or does it only support an `int32_t` accumulator with `int8_t` output? Thanks!
```c++
using ElementInputA…
```
-
Learning OpenCL Development from Scratch (Part 1): Architecture - yxwkaifa - Cnblogs (博客园)
https://www.cnblogs.com/yxwkf/p/4552029.html
ysh329/OpenCL-101: Learn OpenCL step by step.
https://github.com/ysh329/OpenCL-101