LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
4.34k stars 310 forks

CUDA: fix MMQ stream-k for --split-mode row #8167 #948

Closed: Nexesenex closed this 2 days ago

Nexesenex commented 5 days ago

Courtesy of Johannes Gaessler, as usual. The PR came after the refactor, but I think it's important to have it merged now for those using --split-mode row.

https://github.com/ggerganov/llama.cpp/pull/8167

LostRuins commented 2 days ago

Will merge the commit from upstream directly