LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
5.14k stars 353 forks

CPU video core #253

Closed Vladonai closed 1 year ago

Vladonai commented 1 year ago

Platform:0 Device:0 - NVIDIA CUDA with NVIDIA GeForce RTX 3050 Platform:1 Device:0 - Intel(R) OpenCL HD Graphics with Intel(R) UHD Graphics 770 64Gb RAM

Does it make sense to use the video core of the processor as well? And if so, what is the best way to do it?

gustrd commented 1 year ago

I saw a slight improvement when using the integrated GPU with OpenCL, but the gain from an RTX will be greater.

I also got better results with CUDA, but it's not compatible with integrated graphics.

LostRuins commented 1 year ago

Koboldcpp currently supports only one GPU, so definitely use your most powerful graphics card. For your system, that seems to be the RTX 3050: --useclblast 0 0
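Concretely, selecting the RTX by its OpenCL platform and device indices from the listing at the top of the thread might look like this (a sketch only; the model filename is taken from later in the thread and is just an example):

```shell
# Platform 0, device 0 = the NVIDIA CUDA OpenCL platform per the listing above
koboldcpp.exe --useclblast 0 0 --model model-33B.ggmlv3.q3_K_L.bin
```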

Vladonai commented 1 year ago

Can you advise me whether I could use more optimal values?

koboldcpp.exe --useclblast 0 0 --smartcontext --contextsize 2048 --blasbatchsize 512 --gpulayers 24 --usemirostat 2 5.0 0.1 --threads 10 --highpriority --stream --model model-33B.ggmlv3.q3_K_L.bin

Questions:
- My CPU has 6 physical cores (12 logical cores). What is the optimal value for --threads?
- --blasthreads currently equals --threads. Should I change it for my graphics card (RTX 3050, 8 GB)?
- --blasbatchsize is currently 512. Given my configuration, is this an optimal value? Or is it model-specific?

Any advice is welcome. Basically, the 33B is already running pretty fast, but I'd like to squeeze the most out of it.

7erminalVelociraptor commented 1 year ago

> Can you advise me if I can put more optimal values?
>
> koboldcpp.exe --useclblast 0 0 --smartcontext --contextsize 2048 --blasbatchsize 512 --gpulayers 24 --usemirostat 2 5.0 0.1 --threads 10 --highpriority --stream --model model-33B.ggmlv3.q3_K_L.bin
>
> Questions: My CPU has 6 physical cores (12 logical cores). What is the optimal value for --threads? --blasthreads now equals --threads. Should I change it for my graphics card? (RTX 3050, 8Gb) --blasbatchsize is now 512. Given my configuration, is this an optimal value? Or is it model specific?
>
> Any advice is welcome. Basically, the 33B is already running pretty fast, but I'd like to squeeze the most out of it.

General consensus is that threads = physical cores works best, so that would be 6 in your case. For some reason, setting it higher than the physical core count increases CPU usage without improving performance. If you offload a lot of layers to the GPU, some people report better times with only one CPU thread, so you may want to experiment and compare.

No need to touch --blasthreads. --blasbatchsize is fine at 512; you can experiment with it, but the consensus is that 256 and 512 give the best speed.

Personally, I wouldn't touch --smartcontext unless you are dealing with really bad generation times: while it does speed things up, it cuts your maximum context in half, and frontends like SillyTavern that do their own magic with context formatting stop functioning well because of it.

gustrd commented 1 year ago

I had good results with blasbatchsize at 1024. You should experiment.

changtimwu commented 1 year ago

Surprising to see an LLM implementation that can make use of Intel UHD graphics. I believe you would see a more significant performance gain on recent generations of Intel integrated graphics (i.e. Iris Xe, available in ADL-P or RPL-P chips).

h3ndrik commented 1 year ago

With my Intel Skylake Xeon, using CLBlast on the iGPU makes it way slower. Just double-check what you're doing.