LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Support for MPI? #348

Closed Jakub-S-K closed 1 year ago

Jakub-S-K commented 1 year ago

Is it possible to support MPI? It's implemented upstream, but I didn't see any traces of it working here. Specifically, it would allow splitting a larger model into parts that could be loaded individually onto each worker's GPU, greatly accelerating token generation for larger models. Has anyone tested it with koboldcpp, and is it possible to run?

LostRuins commented 1 year ago

Multi-device inference is probably out of scope for KoboldCpp, but multi-GPU inference is possible in CuBLAS mode; you can configure the split with --tensor_split.
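
For illustration, a minimal sketch of such a multi-GPU launch (the model filename and the 60/40 split ratio below are placeholder values, not recommendations):

```sh
# Split layers across two GPUs at roughly a 60/40 ratio.
# The values passed to --tensor_split are relative weights, not percentages.
# "model.gguf" is a placeholder filename.
python koboldcpp.py --usecublas --gpulayers 99 --tensor_split 6 4 --model model.gguf
```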

Jakub-S-K commented 1 year ago

Sure, then I'll try working directly with llama.cpp.
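
For reference, upstream llama.cpp's MPI mode at the time was enabled at build time and launched via mpirun, roughly as described in its README of that era (the hostfile, rank count, and model path below are placeholders):

```sh
# Build with MPI support enabled (per the upstream llama.cpp README at the time).
make CC=mpicc CXX=mpicxx LLAMA_MPI=1

# Launch across the hosts listed in "hostfile"; each rank serves a slice
# of the model's layers. The rank count and model path are placeholders.
mpirun -hostfile hostfile -n 3 ./main -m ./models/7B/ggml-model-q4_0.gguf -n 128
```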