gemm Search Results - Githubissues

1000+ results for gemm

Best match

NVIDIA/cutlass #1589

[QST]error: too few arguments for class template "cutlass::e…

I run the example in the quick start guide. My GPU is A30, the command is `nvcc 01_gemm_3.0.cu -arch=sm_80` It complains errors: ``` 01_gemm_3.0.cu(51): error: too few arguments for class templ…

sleepwalker2017 updated 2 weeks ago
1
microsoft/superbenchmark #624

Add gemm-flops support of Ada Lovelace (L4, L40, L40s), comp…

**What's the issue, what's expected?**: I started superbenchmark on server with NVIDIA L40 and got error message "Unsupported architecture" from gemm-flops benchmark. L40 and L4 are CUDA-capable NVID…

avnf updated 4 days ago
1
oneapi-src/oneMKL #524

MKL FP16 GEMM crash on MTL iGPU

# Summary I found on MTL iGPU, if I call FP16 gemm of onemkl (no matter using OneAPI 2024.0 or 2024.2), the program will crash, and if I call it many times, it will cause my machine to freeze direc…

rnwang04 updated 3 weeks ago
2
NVIDIA/TensorRT-LLM #1792

Fail to build w4a8_awq/int4_awq on Llama3-8B

### System Info ubuntu 20.04 tensorrt 10.0.1 tensorrt-cu12 10.0.1 tensorrt-cu12-bindings 10.0.1 tensorrt-cu12-libs 10.0.1 tensorrt-llm 0.11.0.dev2024052100 nvidia L40s ### Who can help? …

Hongbosherlock updated 3 weeks ago
8
NVIDIA/CUDALibrarySamples #192

SplitK for multiblock_gemm in cuBLASdx

Hello! I am currently learning CUTLASS and cuBLASdx and I have a question. `multiblock_gemm.cu` only allows K that fits in smem. I believe it can be extended to larger K following the splitK patter…

osayamenja updated 1 month ago
1
llvm/llvm-project #94537

llvm testsuite doesn't compile with -Oz

Using -Oz.cmake $ cmake -DCMAKE_C_COMPILER=/usr/local/llvm-project/build/bin/clang-19 -C ../cmake/caches/Oz.cmake .. && make -j32 -k ```log [ 59%] Building CXX object SingleSource/UnitTests…

hiraditya updated 1 month ago
7
NVIDIA/TensorRT-LLM #1922

Support int type zero-points in weight-only GEMM

Currently some quantized huggingface models save zero-points in int4 datatype directly, like [Qwen/Qwen2-7B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-7B-Instruct-GPTQ-Int4) and [Qwen/Qwen2…

xiaonans updated 2 weeks ago
3
ROCm/AMDMIGraphX #3261

Integrate codegen API for CK gemm-multiple-d

turneram updated 2 weeks ago
1
ROCm/AMDMIGraphX #2813

Fuse GEMM across a reshape

From the 22 Feb 2024 performance model review of Distilgpt2: There are several cases of dot+reshape+pointwise: ``` @47 = gpu::code_object[code_object=5688,symbol_name=mlir_reshape_dot,global=67…

CharlieL7 updated 1 day ago
2
TiledTensor/ThrillerFlow #6

Build and codegen for `Back2Back GEMM` dataflow.

**Back2Back GEMM** is an important kernel, and it is the core of **flash attention**, so it is necessary to analyze its dataflow and generate it with the help of the dataflow.

KuangjuX updated 1 month ago
1
