-
/root/anaconda3/envs/chatglm3_v2/lib/python3.10/site-packages/awq/modules/linear/exllama.py:12: UserWarning: AutoAWQ could not load ExLlama kernels extension. Details: libcudart.so.12: cannot open sha…
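This warning usually means the CUDA 12 runtime shared library is not on the loader path, so the kernels extension cannot be dlopen'd. A minimal diagnostic, mirroring what the extension loader effectively does (the soname default is illustrative):

```python
import ctypes


def cuda_runtime_loadable(soname: str = "libcudart.so.12") -> bool:
    """Try to dlopen the CUDA runtime the way a native extension would.

    Returns False instead of raising when the library cannot be found,
    which is the situation behind the AutoAWQ/ExLlama warning above.
    """
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False
```

If this returns False, checking `LD_LIBRARY_PATH` and the installed CUDA toolkit version is the usual next step.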
-
Model:
```mermaid
graph TD;
Input1["Input
src1: fp32"]
Quantise1["NEQuantizationLayer
q_src1: QASYMM8_SIGNED"]
Input2["Input
src2: fp32"]
Quantise2["NEQuantization…
-
**What is your question?**
https://github.com/NVIDIA/cutlass/blob/f7b19de32c5d1f3cedfc735c2849f12b537522ee/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized.hpp#L477-L554
I underst…
-
The default lookahead in GEMM is 2, which is too small when running on multiple nodes, especially at large scale. It should be tied to the process-grid dimensions.
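One way to express that dependence is a heuristic that grows the lookahead with the process-grid extent, so panel factorization and broadcast can stay ahead of the trailing update. This is a hypothetical sketch, not any library's actual policy:

```python
def gemm_lookahead(grid_rows: int, grid_cols: int, default: int = 2) -> int:
    """Hypothetical heuristic: scale the panel lookahead with the larger
    process-grid dimension, falling back to the current default of 2 on
    small grids. The divisor is an assumption for illustration only."""
    return max(default, max(grid_rows, grid_cols) // 2)
```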
-
FlashDecoding++ paper: https://arxiv.org/abs/2311.01282
- Q3 Collaboration Plan of Infra and IaaS Labs: https://bytedance.us.larkoffice.com/docx/HKXfdRh1noMrbAxcgL2ureGasdQ
- FlashDecoding++ Sum…
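A core idea in FlashDecoding++ is "asynchronized softmax": subtracting a fixed, precomputed statistic instead of the per-chunk running max, so partial exponent sums from different chunks can be combined without rescaling. A sketch of why any fixed shift gives the same result (the choice of `phi` here is illustrative; the paper selects it from the expected logit range):

```python
import math


def softmax_unified_max(xs, phi):
    """Softmax with a fixed shift phi instead of the per-row max.
    Mathematically identical to standard softmax, since the shift
    cancels in the normalization."""
    exps = [math.exp(x - phi) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_reference(xs):
    """Standard numerically-stable softmax (subtracts the row max)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

The catch, which the paper addresses, is overflow: if the logits stray far from `phi`, `exp(x - phi)` can overflow, so a fallback path is still needed.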
-
### Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```
### 🐛 Describe the bug
When N=64, we don't have 4*8=32 c_…
-
I am attempting to emit PyTorch code, but unfortunately it does not work for fp8, bf16, or int8. I have tried to patch the converter type dict: https://github.com/OrenLeung/cutlass/commit/6d619c964eb8b…
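The general shape of such a patch is extending the emitter's dtype-conversion table with the missing low-precision entries. A sketch under stated assumptions: the table name, the key strings, and the torch dtype names below are all illustrative, not the actual identifiers in the CUTLASS emitter.

```python
# Hypothetical existing conversion table in a PyTorch code emitter.
CUTLASS_TO_TORCH_DTYPE = {
    "f32": "torch.float32",
    "f16": "torch.float16",
}

# Hypothetical missing entries for the dtypes the emitter rejects.
EXTRA_DTYPES = {
    "bf16": "torch.bfloat16",
    "s8": "torch.int8",
    "e4m3": "torch.float8_e4m3fn",
}


def patch_converter(table: dict, extras: dict) -> dict:
    """Add missing dtype mappings without overwriting existing entries."""
    for key, value in extras.items():
        table.setdefault(key, value)
    return table
```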
-
Hello, I am having trouble compiling composable_kernel for my AMD GPU architecture (gfx1010)
```
cmake …
```
-
`PYTHONPATH="." GPU=1 IMAGE=0 python -m pytest test/test_ops.py -k test_gemm_fp16` passed
`PYTHONPATH="." GPU=1 IMAGE=2 python -m pytest test/test_ops.py -k test_gemm_fp16` failed with `Exception: …
-
**What is your question?**
![image](https://github.com/user-attachments/assets/98eab07b-1903-425e-9439-5178169c52e4)
As shown here, I see many usages of PipelineState but cannot find its definition. I do find …
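For context, `PipelineState` lives in the CUTLASS pipeline headers (under `include/cutlass/pipeline/`), separate from the collective mainloop files. Its core mechanics can be sketched in Python, as a model of the idea rather than the actual C++ template:

```python
class PipelineState:
    """Model of a CUTLASS-style pipeline state: a circular stage index
    plus a phase bit that flips each time the index wraps around. The
    phase lets a wait on a reused barrier slot distinguish the current
    pass through the stages from the previous one."""

    def __init__(self, stages: int):
        self.stages = stages
        self.index = 0   # which pipeline stage's barrier to use next
        self.phase = 0   # parity bit that distinguishes wrap-arounds

    def advance(self) -> None:
        """Move to the next stage, flipping the phase on wrap-around."""
        self.index += 1
        if self.index == self.stages:
            self.index = 0
            self.phase ^= 1
```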