LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

running a 70b model #858

Open mercurial-moon opened 1 month ago

mercurial-moon commented 1 month ago

Hi, are there any special settings for running large models (70B+ parameters) on a PC that is low on memory and VRAM?

PC memory - 32GB
VRAM - 12GB
Model quantization - 5-bit k-quants (K_M suffix)
Model parameters - 70B

I tried it with the regular (non-CUDA) KoboldCpp build, and it showed close to 99% memory usage and high disk usage. The model file is saved on an SSD. After generating a few tokens (10-20) it just froze.

I'm sure the output would be slow, maybe < 0.5 tokens/sec, but I'm just wondering if there is a way to get it to work by tweaking some settings in KoboldCpp.

LostRuins commented 1 month ago

You will struggle to load such a big model in 32GB of RAM. Ideally, you'd want at least 64GB to do a partial offload for it, to avoid hitting swap.

First, try switching to a 70B Q3_K_S quant.
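As a rough back-of-envelope check of why the quant level matters here (the bits-per-weight figures below are approximations of typical GGUF file sizes, not exact numbers):

```python
# Rough sizing sketch for a 70B model at different k-quant levels.
# Bits-per-weight values are assumed approximations of typical GGUF files;
# real files vary by a few GB.

PARAMS = 70e9  # 70B parameters

BITS_PER_WEIGHT = {
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q3_K_S": 3.5,
}

for quant, bpw in BITS_PER_WEIGHT.items():
    weights_gb = PARAMS * bpw / 8 / 1e9
    # add a rough margin for KV cache, scratch buffers and the OS;
    # the real overhead depends on context size and backend
    total_gb = weights_gb + 4
    print(f"{quant}: ~{weights_gb:.0f} GB weights, ~{total_gb:.0f} GB total")
```

On those assumptions, a Q5_K_M 70B needs roughly 50 GB for the weights alone, while Q3_K_S is closer to 31 GB, which is at least in the same ballpark as 32GB RAM plus 12GB VRAM.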

Then you can try disabling mmap, and offload as many layers to the GPU as you can before it goes OOM.
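For picking a starting layer count to offload, here is a minimal sketch, assuming a Llama-2-style 70B with 80 transformer layers and an evenly split weight size; the 2 GB VRAM headroom is a guess, so start lower and raise it until just before it OOMs:

```python
# Rough estimate of how many layers fit on a 12GB GPU.
# Assumptions: ~31 GB of Q3_K_S weights, 80 layers (Llama-2 70B),
# and ~2 GB of VRAM kept free for KV cache / compute buffers.

MODEL_GB = 31
N_LAYERS = 80
VRAM_GB = 12
VRAM_RESERVE_GB = 2

per_layer_gb = MODEL_GB / N_LAYERS
layers_on_gpu = int((VRAM_GB - VRAM_RESERVE_GB) / per_layer_gb)
print(f"~{per_layer_gb:.2f} GB per layer, try --gpulayers {min(layers_on_gpu, N_LAYERS)}")
```

This is only a starting point; the right number also depends on your context size and how much VRAM the desktop itself is already using.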