-
Experiment with different implementations of Matmul:
- [x] Vanilla Matmul implementation
- [x] Vanilla Matmul with I/O optimized
- [x] GEMM (blocked matrix)
- [x] Threaded GEMM
- [x] GEMM on NEON…
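The first two checklist items can be contrasted in a minimal sketch (plain Python for clarity, not the actual kernels from this repo): the vanilla triple loop streams through `B` column-wise on every output element, while the blocked (GEMM-style) variant iterates over small tiles so a tile of `A` and `B` is reused while it is still hot in cache.

```python
def matmul_vanilla(A, B, n):
    # C[i][j] = sum_k A[i][k] * B[k][j]; A, B, C are n x n row-major lists.
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

def matmul_blocked(A, B, n, bs=4):
    # Same result, but computed tile-by-tile: each bs x bs block of A and B
    # is reused across the inner loops, improving cache locality.
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        a = A[i][k]  # hoist the A element out of the j loop
                        for j in range(jj, min(jj + bs, n)):
                            C[i][j] += a * B[k][j]
    return C
```

The block size `bs` here is a stand-in; real GEMMs pick tile sizes from the target's cache and register geometry.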
-
Hello.
In the previous pull request #4381, the P and Q parameters of [SD]GEMM were increased to make better use of the L2 cache of Neoverse V1, but the complex [CZ]GEMM parameters were left unchanged. …
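The reason the complex parameters need their own tuning is simple arithmetic: a complex element takes twice the bytes of its real counterpart, so the same P×Q panel occupies twice the cache. A small sketch of that footprint calculation (the P, Q, and L2 values below are illustrative assumptions, not the actual per-target parameters):

```python
def panel_bytes(p, q, elem_bytes):
    # Working set of one P x Q panel held resident during the inner GEMM loop.
    return p * q * elem_bytes

# Hypothetical blocking parameters for illustration; the real values are
# set per target in the library's configuration headers.
P, Q = 256, 512
l2_bytes = 1024 * 1024  # assuming a 1 MiB private L2 per core

for name, elem in [("SGEMM", 4), ("DGEMM", 8), ("CGEMM", 8), ("ZGEMM", 16)]:
    frac = panel_bytes(P, Q, elem) / l2_bytes
    print(f"{name}: panel uses {frac:.0%} of L2")
```

With these assumed values the ZGEMM panel would be twice the L2 size, which is why P and Q tuned for real types cannot simply be carried over to [CZ]GEMM.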
-
Hello, we have measured the FP8 GEMM performance using Triton on NVIDIA H100 (500 W, 1980 MHz). We would like to request your help in understanding whether this performance is expected.
Since H100 FP8 o…
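For comparing a measurement like this against peak, the usual bookkeeping is FLOPs = 2·M·N·K (one multiply plus one add per MAC), divided by the measured kernel time. A small helper to that effect (the timing number below is made up for illustration, not taken from the measurement above):

```python
def gemm_tflops(m, n, k, seconds):
    # A dense GEMM performs 2*M*N*K floating-point operations
    # (one multiply and one add per multiply-accumulate).
    return 2 * m * n * k / seconds / 1e12

# Illustrative only: a 4096^3 GEMM finishing in 0.25 ms.
achieved = gemm_tflops(4096, 4096, 4096, 0.25e-3)
print(f"{achieved:.0f} TFLOP/s")  # ~550 TFLOP/s under these assumed numbers
```

Dividing the result by the device's peak FP8 throughput at the measured clock gives the utilization fraction being asked about.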
sryap updated 2 months ago
-
# Summary
When trying to use oneMKL with the portBLAS backend, the current code structure checks for an Intel, AMD, or NVIDIA GPU; if none is found, it raises an unsupported-device error. It is understood that…
-
From the 22 Feb 2024 performance model review of Distilgpt2:
There are several GEMMs that are applied together (this is the tail end of attention):
```
@17 = hip::hip_copy_literal[id=main:@litera…
-
## 🐛 Bug
```
ld: warning: multiple common of .gomp_critical_user_.var
ld: error: duplicate symbol: libxsmm_verbosity
>>> defined at libxsmm_generator.c:31 (/usr/ports/math/dgl/work/dgl-2.2.1/thi…
-
love the package!
`BLAS.gemm!` fails for any `PDMat` arguments unless you pass `a.mat`.
Maybe something like this could be more general:
```Julia
pd_gemm!(tA, tB, alpha, A, B, beta, C) = BLAS.ge…
-
SOTA (cuBLAS, CUTLASS) FP8 GEMM kernels perform poorly in the small-M regime (M = bs*seq_len < 32).
This work will focus on leveraging the performant pieces of the [Marlin](https://github.com/IST-D…
-
CUTLASS is used to build kernels in TensorFlow.
I took a look at cutlass_archive/include/cutlass/matrix.h, and indeed set_slice3x3 is not defined; however, set_slice_3x3 is.
Did not want to submi…
-
### Motivation.
At a high level, we at Neural Magic are writing a custom compiler for Torch Dynamo to define a system within vLLM where we can write graph transformations. The main goal is a separa…