gemm Search Results - Githubissues

1000+ results for gemm


NVIDIA/cutlass #1659

[QST]question about the `sgemm_1.cu`

I'm working through the sgemm_1.cu tutorial here: https://github.com/NVIDIA/cutlass/blob/v3.5.0/examples/cute/tutorial/sgemm_1.cu. My question is: how can we know the output of C? I see in b…

sleepwalker2017 updated 1 week ago
1
vllm-project/vllm #7397

[Misc]: Cross-attention QKV computation is inefficient

This issue is not in response to a performance regression. The method of performing cross-attention QKV computations introduced in #4942 could be improved. Because this issue relates to cross-atten…

afeldman-nm updated 3 weeks ago
1
ROCm/AMDMIGraphX #2818

GEMM fusion (over slice or not)

From the 22 Feb 2024 performance model review of Distilgpt2: There are several gemms that are applied together(this is the tailend of attention): ``` @17 = hip::hip_copy_literal[id=main:@litera…

CharlieL7 updated 1 month ago
1
pytorch/pytorch #134143

DISABLED test_b2b_gemm_trivial_right_assoc_good_shape (__mai…

Platforms: rocm This test was disabled because it is failing in ROCm6.1 (eg. https://github.com/pytorch/pytorch/pull/132895) cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @dllehr-amd @jatayl…

jithunnair-amd updated 2 weeks ago
1
pytorch/pytorch #133311

DISABLED test_b2b_gemm_right_assoc_good_shape (__main__.B2BG…

Platforms: rocm This test was disabled because it is failing on main branch ([recent examples](https://torch-ci.com/failure?failureCaptures=%5B%22inductor%2Ftest_b2b_gemm.py%3A%3AB2BGEMMTest%3A%3At…

jataylo updated 3 weeks ago
1
ROCm/xformers #21

Some features (cutlassF, smallkF, ...) appear to be unavaila…

# ❓ Questions and Help Some features appear to be unavailable when executing 'python -m xformers.info' (cutlassF, smallkF, ...) Is this normal? ``` xFormers 0.0.27+7a04357.d20240822 memory_ef…

Zars19 updated 2 weeks ago
1
NVIDIA/cutlass #1462

[BUG] Python `EVT` `Pytorch` Emitter Broken

**Describe the bug** The Python pytorch emitter does not output functioning code when compiling `Gemm` with an `EVT`. **Steps/Code to reproduce bug** The script below reproduces the bug. Sw…

jeromeku updated 1 week ago
4
pytorch/pytorch #133233

DISABLED test_b2b_gemm_left_assoc_good_shape (__main__.B2BGE…

Platforms: rocm This test was disabled because it is failing on main branch ([recent examples](https://torch-ci.com/failure?failureCaptures=%5B%22inductor%2Ftest_b2b_gemm.py%3A%3AB2BGEMMTest%3A%3At…

jataylo updated 3 weeks ago
1
NVIDIA/cutlass #1516

[QST] use FastLinearCombinationClamp to convert half accumul…

Hello, could I use `FastLinearCombinationClamp` to convert a `half_t` accumulator to `int8_t` output? Or does it only support an `int32_t` accumulator to `int8_t` output? Thanks! ```c++ using ElementInputA…

hychiang-git updated 1 week ago
2
yuenshome/yuenshome.github.io #40

gpu gemm optimize

Learning OpenCL development from scratch (1): Architecture - yxwkaifa - cnblogs (博客园) https://www.cnblogs.com/yxwkf/p/4552029.html ysh329/OpenCL-101: Learn OpenCL step by step. https://github.com/ysh329/OpenCL-101

ysh329 updated 5 years ago
4
