LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with a KoboldAI UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Very low gpu usage #773

Closed Jake36921 closed 3 months ago

Jake36921 commented 3 months ago

The gpu is barely being used during generation.

[Screenshots attached: 2024-04-11 112649, 2024-04-11 110935, 2024-04-11 110947]

Arguments: Start koboldcpp --threads 4 --launch --usecublas 0 0 --gpulayers 31 --contextsize 8192 --usemlock --blasbatchsize 1024 --multiuser --port 5002

Model: Merged-RP-Stew-V2-34B.i1-IQ3_S.gguf

System: CPU: 5600G, RAM: 32 GB, GPU: 3060 12 GB

liuyunrui123 commented 3 months ago

The GPU usage figure shown in Task Manager reflects only the 3D engine; it does not show the load from AI compute (CUDA) work. You can infer GPU load instead by watching for changes in VRAM usage and GPU temperature.
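As a rough way to check this outside Task Manager, here is a minimal sketch that reads utilization, VRAM usage, and temperature from `nvidia-smi`, the command-line tool that ships with the NVIDIA driver. The function names `parse_gpu_line` and `query_gpus` are made up for this example, and it assumes `nvidia-smi` is on your PATH and reports plain numbers for every field (some cards report `[N/A]` for certain fields, which this sketch does not handle).

```python
import subprocess

# Fields requested from nvidia-smi; all four are standard --query-gpu properties.
QUERY_FIELDS = ["utilization.gpu", "memory.used", "memory.total", "temperature.gpu"]


def parse_gpu_line(line):
    """Parse one CSV line from nvidia-smi (csv,noheader,nounits format)."""
    util, mem_used, mem_total, temp = [field.strip() for field in line.split(",")]
    return {
        "util_percent": int(util),       # compute utilization in percent
        "mem_used_mib": int(mem_used),   # VRAM in use, MiB
        "mem_total_mib": int(mem_total), # total VRAM, MiB
        "temp_c": int(temp),             # core temperature, degrees Celsius
    }


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU (assumes it is installed)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling `query_gpus()` a few times while text is generating should show whether utilization and temperature rise even when Task Manager's 3D graph stays flat.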

liuyunrui123 commented 3 months ago

Additionally, your GPU memory usage exceeds the card's 12 GB of VRAM, which can make computation particularly slow and make it look like the GPU is not working.
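To get a feel for whether a given `--gpulayers` value is likely to fit, here is a back-of-the-envelope sketch. It assumes the model's weights are spread evenly across layers and lumps the context cache and other buffers into one overhead number; both are simplifications, and the helper names and example figures are hypothetical, not measurements of this model.

```python
def estimate_offload_gib(model_file_gib, total_layers, gpu_layers, overhead_gib=0.0):
    """Rough VRAM estimate: even share of the model file per offloaded layer,
    plus a caller-supplied overhead for context cache and scratch buffers."""
    per_layer_gib = model_file_gib / total_layers
    return per_layer_gib * gpu_layers + overhead_gib


def fits_in_vram(estimate_gib, vram_gib):
    """True if the estimate stays within the card's VRAM."""
    return estimate_gib <= vram_gib


# Placeholder numbers for illustration only (not taken from this issue):
# a 12 GiB model file with 60 layers, 30 layers offloaded, 1.5 GiB overhead.
example = estimate_offload_gib(12.0, 60, 30, overhead_gib=1.5)
print(round(example, 2), fits_in_vram(example, 12.0))
```

If the estimate lands close to or above the card's VRAM, lowering `--gpulayers` or `--contextsize` is the obvious thing to try.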

LostRuins commented 3 months ago

@Jake36921 did you select the Use CuBLAS option and offload GPU layers? What model are you using?

Jake36921 commented 3 months ago

> @Jake36921 did you select Use CuBLAS option and offload GPU layers? What model are you using?

Yes, I did both, and I was using Merged-RP-Stew-V2-34B.i1-IQ3_S.gguf. Idk if Task Manager is reporting the wrong values or something, but it does report the GPU being used with other models like mythomax-l2-13b.

LostRuins commented 3 months ago

I mean, I can see your GPU VRAM being filled. So it probably is working correctly.

Jake36921 commented 3 months ago

Still weird that Task Manager reports that it's barely being used, but I can see that it works during text generation after checking temps in MSI Afterburner.