BritishTeapot closed this issue 4 months ago
I'm not sure, but at first glance your GPU has only 2GB of VRAM, which isn't really enough. I don't think you'd get much benefit from offloading even if it worked.
I can't say for certain whether this helps, but on my older laptop with an Nvidia MX graphics card (2GB VRAM) I was able to offload some layers with CLBlast just fine (though I've since switched to CUDA, which has proven to be the better option).
It's possible that the challenges you're encountering are specific to AMD hardware, but it might be worthwhile to investigate this upstream (llama.cpp).
By the way, if prompt processing is already being accelerated, the gain from offloading layers will likely be marginal.
Just pulled the latest changes and recompiled, and the issue is gone. It also did make things noticeably faster, but only for context processing; generation speed actually got slower :D. Closing the issue.
Koboldcpp 1.64 (concedo)
Hardware:
iMac 21.5" 2017: Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz, AMD Radeon Pro 555 Compute Engine (2GB VRAM), 32GB RAM
Issue:
Using --useclblast and --gpulayers together always results in a "not enough space in the buffer" error. It persists across different models and any layer count.
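For reference, a sketch of the kind of invocation that triggers this, based on koboldcpp's documented --useclblast [platform_id] [device_id] and --gpulayers flags. The model filename, platform/device indices, and layer count here are placeholders, not the exact values from my runs:

```shell
# Hypothetical example invocation (placeholder model path and values):
# --useclblast takes an OpenCL platform id and device id (0 0 is common
# for a single-GPU machine); --gpulayers sets how many layers to offload.
python koboldcpp.py model.gguf --useclblast 0 0 --gpulayers 10
```

Any nonzero --gpulayers value produces the error for me; with --gpulayers 0 the CLBlast prompt-processing acceleration itself works.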
Example output: