LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Add CMake flag for pipeline parallelism for multi-GPU #940

Closed: Nexesenex closed this 1 week ago

Nexesenex commented 1 week ago

The llama.cpp (LCPP) default is 4, which in my opinion is a bit too much. Setting it to 2 saves some VRAM (0.5-1%?), some compute, and some electricity, at the expense of some potential performance (prompt processing?) that I do not notice in actual usage. 2 is thus my own setting.

https://github.com/ggerganov/llama.cpp/pull/6017
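A sketch of what the requested build-time override could look like, assuming koboldcpp exposed the same CMake cache variable that the linked upstream llama.cpp PR introduced (`LLAMA_SCHED_MAX_COPIES`, which feeds `GGML_SCHED_MAX_COPIES`); the variable name in koboldcpp's build could well differ:

```
# Hypothetical build invocation lowering the pipeline-parallelism input-copy
# count from the upstream default of 4 to 2, as proposed in this issue.
# LLAMA_SCHED_MAX_COPIES is the cache variable from the upstream llama.cpp
# CMake build; whether koboldcpp forwards it is an assumption here.
cmake -B build -DLLAMA_SCHED_MAX_COPIES=2
cmake --build build --config Release
```

With fewer copies in flight, less VRAM is reserved for duplicated input buffers, which matches the small (sub-1%) savings reported above; the trade-off is reduced overlap between GPUs during prompt processing.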