gemm Search Results - GitHub Issues

1000+ results
for gemm

NVIDIA/cutlass #1848

gemm_operation_profiler.cu

**What is your question?** I added split_k serial of cutlass 2.x to cutlass 3.x, slice_k as a parameter of problem_size. Now I want to use cutlass_profiler to test whether I should add a parameter to …

scm-later updated 1 week ago
1
libxsmm/libxsmm #902

Packed API for gemm

What does the l_r parameter represent in the libxsmm_create_packed_gemm function? [TEST](https://github.com/libxsmm/libxsmm/blob/main/samples/xgemm_packed/gemm_packed_kernel.c) I'm trying to understan…

IshitaShreya updated 3 days ago
10
converged-computing/performance-study #47

Missing result: CycleCloud CPU for mt-gemm

I didn't find Azure CycleCloud for CPU when parsing the mt-gemm results. For example, it should be in the plot here: https://github.com/converged-computing/performance-study/tree/main/analysis/mt-g…

vsoch updated 3 weeks ago
3
intel/intel-xpu-backend-for-triton #2379

Improve out-of-box performance for GEMM kernels variants

We have achieved good performance (relative to the XeTLA library) for a GEMM kernel (see http://benchmarks.glados.intel.com/d/1pXX4hUSz/microbenchmarks?orgId=1). Now is time to focus on improving per…

etiotto updated 4 days ago
2
intel/intel-xpu-backend-for-triton #1765

[GEMM perf] Poor GEMM performance on A770

When I run the GEMM benchmark on A770 I get about `0.3 TFLOPs`, while 1550 can get about `250 TFLOPs`. Performance table: ![image](https://github.com/user-attachments/assets/366947f8-82ce-4454-83ae-f…

Egor-Krivov updated 2 months ago
6
tensorflow/tensorflow #76632

How to determines which GEMM used in TensorFlow

### Issue type Bug ### Have you reproduced the bug with TensorFlow Nightly? Yes ### Source source ### TensorFlow version tf 2.15 ### Custom code Yes ### OS platform and distribution _No res…

nanzh-19 updated 6 days ago
1
vllm-project/vllm #8654

[Bug]: RuntimeError in gptq_marlin_24_gemm

### Your current environment python 3.8 L20*4 vllm 0.5.4 ### Model Input Dumps _No response_ ### 🐛 Describe the bug $python -m vllm.entrypoints.api_server --model='/mntfn/yanyi/Qwen2-…

leoyuppieqnew updated 2 weeks ago
5
intel/intel-xpu-backend-for-triton #2377

Assertion error on `gemm_splitk_benchmark.py`

USE_IPEX=0 python gemm_splitk_benchmark.py ``` /home/j…

etiotto updated 5 days ago
2
ROCm/rocWMMA #444

[Issue]: gemm tests failed in ROCM 6.2

### Problem Description I am investigating usage of instruction v_mfma_f32_16x16x16_f16 and nvidia equivalent warp-level mma (swizzle SRAM memory + ldmatrix registers + mma over registers, for Ampere…

yiakwy-xpu-ml-framework-team updated 1 week ago
3
NVIDIA/cutlass #1801

[BUG] TMA Cooperative GeMM with Stream-K scheduler hangs for…

# Describe the bug Gemm kernels with the following configurations hang for specific gemm shapes. - Type: `e4m3 x e4m3 -> bf16` - Tile: `256x32x128` - Cluster: `2x1x1` - Kernel Schedule: `KernelT…

Algy updated 2 weeks ago
4
