gemm Search Results - Githubissues

microsoft/BitBLAS #82

Is fp8 quantization gemm supported?

Both inputs A and B are in fp8, and the output is fp16. Or a fused variant: input A in fp16 with a float32 scale for A, and B in fp8; the kernel quantizes A into fp8 and then invokes fp8 gemm to …

sleepwalker2017 updated 13 hours ago

HandH1998/QQQ #3

[QST] Speedup of GEMM

![image](https://github.com/HandH1998/QQQ/assets/27813268/5a1fadcf-e474-4a73-8f38-29e9eafe7b7b) Why is `W4A8` faster than `W8A8`? `W4A8` needs some additional operations before performing `INT8…

Hongbosherlock updated 1 day ago

intel/intel-xpu-backend-for-triton #1450

[Productize GEMM Performance] Features

This is the GEMM Performance features productization umbrella ticket. Before converting this ticket umbrella ticket, please: - Add the step-by-step GEMM Performance features productization plan here.…

vlad-penkin updated 1 day ago

CExA-project/ddc #524

Optimizing gemm usage in splines

In `SplinesLinearProblem2x2Blocks::solve()`, two `gemm` operations are performed, involving the bottom-left and top-right corners. Those are stored as dense matrices (2D Kokkos::View). However, the bottom-l…

blegouix updated 2 days ago

intel/intel-xpu-backend-for-triton #1104

[#6 GEMM Performance] enable stream K for gemm

Enable feature: streamK or splitK

Dewei-Wang-sh updated 2 days ago

nnstreamer/nntrainer #2668

Using Bfloat16 GEMM from OpenBlas

It seems the latest [OpenBlas](https://github.com/OpenMathLib/OpenBLAS) supports bfloat16 GEMM. I guess upgrading the openblas version from [here](https://github.com/nnstreamer/nnstreamer-android-resource/tr…

skykongkong8 updated 1 day ago

ROCm/rocBLAS #1453

[Feature]: Two tile gemm enabling for MI250

Hi, I noticed a question asked about 2 years ago. Is it implemented now? [Two tile gemm enabling for MI250](https://github.com/ROCm/rocBLAS/issues/1263) "Is there a way to make gemm run on …

Alice1069 updated 44 minutes ago

Dao-AILab/flash-attention #703

Type of gemm.

For all gemms in flash attention (including forward & backward), the inputs are fp16/bf16 (both left matrix & right matrix) and the output is fp32?

gaodaheng updated 10 hours ago

microsoft/onnxruntime #20869

Gemm fp8 run error

### Describe the issue When I use gemm_float8 with input A (fp8 e5m2) and input B (fp8 e4m3), it cannot run, but with input A (fp8 e4m3) and input B (fp8 e4m3) it runs correctly. ### To reproduce run gemm_floa…

KnightYao updated 6 days ago

ROCm/rocBLAS #1448

[Bug]: rocblas link fails with relocation R_X86_64_PC32 out …

### Describe the bug Build fails during final shared lib linking. ### To Reproduce Steps to reproduce the behavior: 1. build rocblas version 6.0.2 with export ROCM_GPUS="gfx803;gfx900;gfx906:xnac…

aagit updated 8 hours ago

1000+ results for gemm
