LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Add CMake flag for pipeline parallelism for multi-GPU #940

Closed: Nexesenex closed this 1 week ago

Nexesenex commented 1 week ago

The llama.cpp (LCPP) default is 4, which in my opinion is a bit too much. Setting it to 2 saves some VRAM (0.5-1%?), some compute, and some electricity, at the expense of some potential performance (prompt processing?) that I do not notice in actual usage. 2 is thus my own setting.

https://github.com/ggerganov/llama.cpp/pull/6017
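A sketch of what the requested build-time override could look like, assuming koboldcpp exposed the same CMake cache variable that the linked upstream llama.cpp PR introduced (`LLAMA_SCHED_MAX_COPIES`, which feeds `GGML_SCHED_MAX_COPIES`); the variable name in koboldcpp's build could well differ:

```
# Hypothetical build invocation lowering the pipeline-parallelism input-copy
# count from the upstream default of 4 to 2, as proposed in this issue.
# LLAMA_SCHED_MAX_COPIES is the cache variable from the upstream llama.cpp
# CMake build; whether koboldcpp forwards it is an assumption here.
cmake -B build -DLLAMA_SCHED_MAX_COPIES=2
cmake --build build --config Release
```

With fewer copies in flight, less VRAM is reserved for duplicated input buffers, which matches the small (sub-1%) savings reported above; the trade-off is reduced overlap between GPUs during prompt processing.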