Closed Jake36921 closed 3 months ago
The GPU usage that Task Manager displays by default only reflects the 3D engine; it does not show utilization from AI acceleration (compute) workloads. You can infer GPU load by watching for changes in VRAM usage and GPU temperature.
Additionally, your GPU memory usage exceeds the maximum limit of 12GB, which can result in particularly slow calculations and make it look like the GPU is not working.
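If you want numbers rather than inferring load from temperature, here is a minimal Python sketch that polls `nvidia-smi` for compute utilization, VRAM use, and temperature. It assumes an NVIDIA driver with `nvidia-smi` on the PATH. The function names and the 95% "nearly full" threshold are my own placeholders, not anything from koboldcpp.

```python
import subprocess
import time

# Fields passed to nvidia-smi's --query-gpu option.
QUERY_FIELDS = "utilization.gpu,memory.used,memory.total,temperature.gpu"


def parse_gpu_sample(line):
    """Parse one CSV line produced with --format=csv,noheader,nounits."""
    util, used, total, temp = (float(x.strip()) for x in line.split(","))
    return {
        "util_pct": util,
        "vram_used_mib": used,
        "vram_total_mib": total,
        "temp_c": temp,
        # Assumed threshold: treat >= 95% of VRAM in use as "nearly full".
        "vram_nearly_full": used / total >= 0.95,
    }


def query_gpu(index=0):
    """Run nvidia-smi once for the given GPU index and return a parsed sample."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--id={index}",
            f"--query-gpu={QUERY_FIELDS}",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_sample(result.stdout.strip().splitlines()[0])


def watch(samples=10, interval_s=1.0):
    """Print one sample per interval while text generation is running."""
    for _ in range(samples):
        print(query_gpu())
        time.sleep(interval_s)
```

Calling `watch()` during generation should show `util_pct` rising if the GPU is doing the work. If `vram_nearly_full` stays true, try a lower `--gpulayers` value and see whether generation speeds up.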
@Jake36921 did you select Use CuBLAS option and offload GPU layers? What model are you using?
Yes, I did both, and I was using Merged-RP-Stew-V2-34B.i1-IQ3_S.gguf. Idk if Task Manager is reporting the wrong values or something, but it does report the GPU being used with other models like mythomax-l2-13b.
I mean, I can see your GPU VRAM being filled. So it probably is working correctly.
Still weird that Task Manager reports that it's barely being used, but I can see that it works during text generation after checking temps in MSI Afterburner.
The GPU is barely being used during generation.
Arguments: Start koboldcpp --threads 4 --launch --usecublas 0 0 --gpulayers 31 --contextsize 8192 --usemlock --blasbatchsize 1024 --multiuser --port 5002
Model: Merged-RP-Stew-V2-34B.i1-IQ3_S.gguf
System: CPU: 5600G, RAM: 32 GB, GPU: 3060 12 GB