-
I ran the example from the quick start guide.
My GPU is an A30, and the command is `nvcc 01_gemm_3.0.cu -arch=sm_80`.
It fails with the following errors:
```
01_gemm_3.0.cu(51): error: too few arguments for class templ…
```
-
**What's the issue, what's expected?**:
I started superbenchmark on a server with an NVIDIA L40 and got the error message "Unsupported architecture" from the gemm-flops benchmark. L40 and L4 are CUDA-capable NVID…
-
# Summary
I found that on an MTL iGPU, calling the FP16 GEMM in oneMKL (with either oneAPI 2024.0 or 2024.2) crashes the program, and calling it many times causes my machine to freeze direc…
-
### System Info
ubuntu 20.04
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm 0.11.0.dev2024052100
NVIDIA L40S
### Who can help?
…
-
Hello!
I am currently learning CUTLASS and cuBLASDx, and I have a question. `multiblock_gemm.cu` only supports values of K that fit in shared memory (smem). I believe it can be extended to larger K by following the splitK patter…
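For reference, here is a minimal CPU sketch of the general split-K idea, assuming row-major `float` matrices. K is divided into chunks of at most `k_chunk`; each chunk produces its own partial M×N product (on a GPU, roughly one thread block per chunk, with the chunk sized so its A/B tiles fit in shared memory), and a final reduction sums the partials. The function `gemm_split_k` and the parameter `k_chunk` are hypothetical and not part of the cuBLASDx API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: C = A * B, with A (M x K) and B (K x N) in row-major order.
// Each split handles k in [k_begin, k_end) and writes a separate partial result;
// the reduction at the end sums all partials into C.
std::vector<float> gemm_split_k(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t M, std::size_t N, std::size_t K,
                                std::size_t k_chunk) {
    assert(k_chunk > 0);
    const std::size_t splits = (K + k_chunk - 1) / k_chunk;
    // One M x N partial-result buffer per split (the split-K "workspace").
    std::vector<float> partial(splits * M * N, 0.0f);

    for (std::size_t s = 0; s < splits; ++s) {
        const std::size_t k_begin = s * k_chunk;
        const std::size_t k_end = std::min(K, k_begin + k_chunk);
        float* P = &partial[s * M * N];
        for (std::size_t i = 0; i < M; ++i)
            for (std::size_t j = 0; j < N; ++j) {
                float acc = 0.0f;
                for (std::size_t k = k_begin; k < k_end; ++k)
                    acc += A[i * K + k] * B[k * N + j];
                P[i * N + j] = acc;
            }
    }

    // Reduction step: sum the partial products over all splits.
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t s = 0; s < splits; ++s)
        for (std::size_t idx = 0; idx < M * N; ++idx)
            C[idx] += partial[s * M * N + idx];
    return C;
}
```

On a GPU the reduction could be a separate kernel over the workspace or `atomicAdd` directly into C; the workspace form is deterministic, while atomics save memory at the cost of a run-to-run varying summation order.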
-
Using the `-Oz` cache file (`Oz.cmake`):
$ cmake -DCMAKE_C_COMPILER=/usr/local/llvm-project/build/bin/clang-19 -C ../cmake/caches/Oz.cmake .. && make -j32 -k
```log
[ 59%] Building CXX object SingleSource/UnitTests…
```
-
Currently, some quantized Hugging Face models store zero-points directly in the int4 datatype, such as [Qwen/Qwen2-7B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-7B-Instruct-GPTQ-Int4) and [Qwen/Qwen2…
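As an illustration of what handling such zero-points involves, here is a sketch that unpacks int4 values from 32-bit words and applies asymmetric dequantization `w = scale * (q - zero)`. It assumes a GPTQ-style layout with eight unsigned 4-bit values per `uint32_t`, element `i` stored in bits `[4*i, 4*i + 4)` (low nibble first). The layout and the names `unpack_int4` and `dequantize` are assumptions for illustration; the actual packing order and any zero-point offset convention should be checked against the checkpoint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: unpack `count` unsigned 4-bit values from 32-bit words.
// Assumed layout: 8 values per word, element i in bits [4*i, 4*i + 4).
std::vector<std::uint8_t> unpack_int4(const std::vector<std::uint32_t>& packed,
                                      std::size_t count) {
    assert(count <= packed.size() * 8);
    std::vector<std::uint8_t> out(count);
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint32_t word = packed[i / 8];
        const unsigned shift = 4u * static_cast<unsigned>(i % 8);
        out[i] = static_cast<std::uint8_t>((word >> shift) & 0xFu);
    }
    return out;
}

// Asymmetric dequantization of one 4-bit weight: w = scale * (q - zero).
float dequantize(std::uint8_t q, std::uint8_t zero, float scale) {
    return scale * static_cast<float>(static_cast<int>(q) - static_cast<int>(zero));
}
```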
-
From the 22 Feb 2024 performance model review of Distilgpt2:
There are several cases of dot+reshape+pointwise:
```
@47 = gpu::code_object[code_object=5688,symbol_name=mlir_reshape_dot,global=67…
```
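To illustrate why dot+reshape+pointwise is a natural fusion candidate: on row-major data a reshape moves nothing, it only changes how indices are interpreted, so the pointwise op can be applied to each dot output as it is produced (for example, in a GEMM epilogue) as long as broadcast operands are indexed through the flat offset. The sketch below assumes row-major storage and uses a hypothetical bias-add + ReLU over the reshaped `(M*N/S) x S` view as the pointwise op. It only checks that the fused and unfused orders agree and does not describe how the compiler here actually fuses these ops.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: dot (M x K times K x N) -> reshape to (M*N/S) x S -> relu(x + bias[col]).
std::vector<float> unfused(const std::vector<float>& A, const std::vector<float>& B,
                           const std::vector<float>& bias,  // length S
                           std::size_t M, std::size_t N, std::size_t K, std::size_t S) {
    assert(S > 0 && (M * N) % S == 0 && bias.size() == S);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t k = 0; k < K; ++k)
                C[i * N + j] += A[i * K + k] * B[k * N + j];
    // The reshape is a no-op on the buffer; only the column index changes.
    std::vector<float> Y(M * N);
    for (std::size_t flat = 0; flat < M * N; ++flat)
        Y[flat] = std::max(0.0f, C[flat] + bias[flat % S]);
    return Y;
}

// Fused: apply the pointwise op right after each dot output is computed,
// mapping (i, j) to the reshaped column through the flat index.
std::vector<float> fused(const std::vector<float>& A, const std::vector<float>& B,
                         const std::vector<float>& bias,
                         std::size_t M, std::size_t N, std::size_t K, std::size_t S) {
    assert(S > 0 && (M * N) % S == 0 && bias.size() == S);
    std::vector<float> Y(M * N);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            const std::size_t flat = i * N + j;
            Y[flat] = std::max(0.0f, acc + bias[flat % S]);
        }
    return Y;
}
```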
-
**Back2Back GEMM** is an important kernel and the core of **flash attention**, so it is worth analyzing its dataflow and using that analysis to generate the kernel.
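A CPU sketch of the back-to-back dataflow, assuming row-major `float` matrices and no nonlinearity between the two GEMMs: for each block of rows of A, the first GEMM produces only that block's rows of the intermediate `A * B`, which the second GEMM consumes immediately, so the full M×N intermediate is never materialized (in a GPU kernel this slice would live in registers or shared memory). In this sketch each slice spans the full intermediate width N, which is what lets the second GEMM finish its rows without revisiting the first. The function `b2b_gemm` and the parameter `row_block` are hypothetical. Flash attention follows the same pattern but inserts an online softmax between the two GEMMs, which is omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: D = (A * B) * C with A: M x K, B: K x N, C: N x P (row-major).
// Only a row_block x N slice of the intermediate (A * B) exists at any time.
std::vector<float> b2b_gemm(const std::vector<float>& A, const std::vector<float>& B,
                            const std::vector<float>& C,
                            std::size_t M, std::size_t K, std::size_t N, std::size_t P,
                            std::size_t row_block) {
    assert(row_block > 0);
    std::vector<float> D(M * P, 0.0f);
    std::vector<float> tile(row_block * N);  // stays on-chip in a real kernel
    for (std::size_t r0 = 0; r0 < M; r0 += row_block) {
        const std::size_t rows = std::min(row_block, M - r0);
        // GEMM 0: tile = A[r0 : r0 + rows, :] * B
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t j = 0; j < N; ++j) {
                float acc = 0.0f;
                for (std::size_t k = 0; k < K; ++k)
                    acc += A[(r0 + i) * K + k] * B[k * N + j];
                tile[i * N + j] = acc;
            }
        // GEMM 1: D[r0 : r0 + rows, :] = tile * C, consuming the tile immediately.
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t p = 0; p < P; ++p) {
                float acc = 0.0f;
                for (std::size_t j = 0; j < N; ++j)
                    acc += tile[i * N + j] * C[j * P + p];
                D[(r0 + i) * P + p] = acc;
            }
    }
    return D;
}
```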