-
Both inputs A and B are in fp8, and the output is fp16.
Or a fused one: input A is fp16 with a float32 scale for A, and B is in fp8; the kernel quantizes A into fp8 and then invokes the fp8 GEMM to …
-
![image](https://github.com/HandH1998/QQQ/assets/27813268/5a1fadcf-e474-4a73-8f38-29e9eafe7b7b)
Why is `W4A8` faster than `W8A8`? `W4A8` needs some additional operations before performing `INT8…
-
This is the GEMM Performance features productization umbrella ticket. Before converting this ticket into an umbrella ticket, please:
- Add the step-by-step GEMM Performance features productization plan here.…
-
In `SplinesLinearProblem2x2Blocks::solve()`, two `gemm` operations are performed, involving the bottom-left and top-right corners. Those are stored as dense matrices (2D Kokkos::View). However, the bottom-l…
-
Enable feature: streamK or splitK
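For readers unfamiliar with the term, split-K partitions the shared K dimension of a GEMM into chunks, computes a partial product per chunk, and sums the partials. The NumPy sketch below only illustrates that idea on the CPU; the function name `splitk_gemm` and the `num_splits` parameter are hypothetical and do not correspond to any library's API.

```python
import numpy as np

def splitk_gemm(a, b, num_splits):
    """Illustrative split-K GEMM: C = sum over K-chunks of A[:, chunk] @ B[chunk, :]."""
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b:
        raise ValueError("inner dimensions of A and B must match")
    # Partition the K dimension into contiguous, near-equal chunks.
    bounds = np.linspace(0, k, num_splits + 1, dtype=int)
    # Each partial product could be computed by an independent worker.
    partials = [a[:, lo:hi] @ b[lo:hi, :] for lo, hi in zip(bounds[:-1], bounds[1:])]
    # The reduction step combines the partial results into the final output.
    return np.sum(partials, axis=0)

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 64))
b = rng.standard_normal((64, 5))
c = splitk_gemm(a, b, num_splits=4)
```

The extra reduction is the cost of split-K; it tends to pay off when M and N are small relative to K, so that tiling the output alone would leave workers idle.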
-
It seems the latest [OpenBLAS](https://github.com/OpenMathLib/OpenBLAS) supports bfloat16 GEMM.
I guess upgrading openblas version from [here](https://github.com/nnstreamer/nnstreamer-android-resource/tr…
-
Hi, I noticed this question was asked about 2 years ago. Is it implemented now?
[Two tile gemm enabling for MI250](https://github.com/ROCm/rocBLAS/issues/1263)
"Is there a way to make gemm run on …
-
For all GEMMs in flash attention (including forward and backward), is the input fp16/bf16 (for both the left and right matrices) and the output fp32?
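To make the configuration in the question concrete, here is a small NumPy sketch of a GEMM with fp16 inputs and an fp32 result. It does not say what flash attention's kernels actually do; the helper name `gemm_fp16_in_fp32_out` is hypothetical, and upcasting before the multiply is used here as a CPU stand-in for fp32 accumulation.

```python
import numpy as np

def gemm_fp16_in_fp32_out(a_fp16, b_fp16):
    """Multiply two fp16 matrices, accumulating and returning the result in fp32."""
    # Upcast first so that products are summed in fp32 rather than fp16.
    return a_fp16.astype(np.float32) @ b_fp16.astype(np.float32)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 256)).astype(np.float16)  # left matrix, fp16
b = rng.standard_normal((256, 4)).astype(np.float16)  # right matrix, fp16

c = gemm_fp16_in_fp32_out(a, b)
print(c.dtype)        # float32
print((a @ b).dtype)  # float16: NumPy keeps the input dtype without the upcast
```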
-
### Describe the issue
When I use gemm_float8 with input A (fp8 e5m2) and input B (fp8 e4m3), it does not run, but with input A (fp8 e4m3) and input B (fp8 e4m3) it runs correctly.
### To reproduce
run gemm_floa…
-
### Describe the bug
Build fails during final shared lib linking.
### To Reproduce
Steps to reproduce the behavior:
1. build rocblas version 6.0.2 with export ROCM_GPUS="gfx803;gfx900;gfx906:xnac…