LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Can't use CLBlast without sudo #423

Closed PedroVNasc closed 1 year ago

PedroVNasc commented 1 year ago

Expected Behavior

I was trying to use CLBlast in koboldcpp after verifying that it works with llama.cpp.

Current Behavior

It throws ggml_opencl: clGetPlatformIDs(NPLAT, platform_ids, &n_platforms) error -1001 at ggml-opencl.cpp:968 no matter which platform or device I choose.

In llama.cpp I was able to select my GPU using GGML_OPENCL_PLATFORM=Clover GGML_OPENCL_DEVICE=1 ./main ...
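For context, what those variables drive is just name/index matching against the enumerated platforms. A rough illustration of the idea (a sketch only, not the exact llama.cpp code; the file name is made up):

/* select.c - illustrative sketch of GGML_OPENCL_PLATFORM-style selection:
   pick the first platform whose name contains the requested string.
   Build: gcc select.c -lOpenCL -o select */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint n = 0;
    if (clGetPlatformIDs(8, platforms, &n) != CL_SUCCESS || n == 0) {
        printf("no OpenCL platforms found\n");
        return 1;
    }
    const char *want = getenv("GGML_OPENCL_PLATFORM"); /* e.g. "Clover" */
    cl_uint chosen = 0;
    for (cl_uint i = 0; want != NULL && i < n; i++) {
        char name[128];
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof name, name, NULL);
        if (strstr(name, want) != NULL) { chosen = i; break; }
    }
    printf("using platform index %u\n", chosen);
    return 0;
}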

Environment and Context

I'm using an Ubuntu-like system (Pop!_OS) on a laptop with an AMD APU and an AMD dGPU, which isn't too great but is still better than the CPU.

Failure Information (for bugs)

I messed a bit with the code to find out what was happening and it seems koboldcpp can't find any OpenCL devices.

clinfo -l output:

Platform #0: Clover
 +-- Device #0: ICELAND (iceland, LLVM 15.0.7, DRM 3.52, 6.4.6-76060406-generic)
 `-- Device #1: AMD Radeon Vega 8 Graphics (raven, LLVM 15.0.7, DRM 3.52, 6.4.6-76060406-generic)

Steps to Reproduce

  1. Own a laptop with an AMD APU
  2. Try to use koboldcpp with CLBlast

Failure Logs

(base) pedrohenrique@pop-os:~/Gitclone/koboldcpp$ python koboldcpp.py --model models/airoboros-7b-gpt4-1.4.ggmlv3.q4_0.bin --useclblast 0 0
***
Welcome to KoboldCpp - Version 1.42.1
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.
Initializing dynamic library: koboldcpp_clblast.so
==========
Namespace(model='models/airoboros-7b-gpt4-1.4.ggmlv3.q4_0.bin', model_param='models/airoboros-7b-gpt4-1.4.ggmlv3.q4_0.bin', port=5001, port_param=5001, host='', launch=False, lora=None, config=None, threads=3, blasthreads=3, psutil_set_threads=False, highpriority=False, contextsize=2048, blasbatchsize=512, ropeconfig=[0.0, 10000.0], stream=False, smartcontext=False, unbantokens=False, bantokens=None, usemirostat=None, forceversion=0, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, noblas=False, useclblast=[0, 0], usecublas=None, gpulayers=0, tensor_split=None)
==========
Loading model: /home/pedrohenrique/Gitclone/koboldcpp/models/airoboros-7b-gpt4-1.4.ggmlv3.q4_0.bin
[Threads: 3, BlasThreads: 3, SmartContext: False]

---
Identified as LLAMA model: (ver 5)
Attempting to Load...
---
Using automatic RoPE scaling (scale:1.000, base:10000.0)
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
llama.cpp: loading model from /home/pedrohenrique/Gitclone/koboldcpp/models/airoboros-7b-gpt4-1.4.ggmlv3.q4_0.bin
llama_v3_model_load_internal: format     = ggjt v3 (latest)
llama_v3_model_load_internal: n_vocab    = 32000
llama_v3_model_load_internal: n_ctx      = 2048
llama_v3_model_load_internal: n_embd     = 4096
llama_v3_model_load_internal: n_mult     = 256
llama_v3_model_load_internal: n_head     = 32
llama_v3_model_load_internal: n_head_kv  = 32
llama_v3_model_load_internal: n_layer    = 32
llama_v3_model_load_internal: n_rot      = 128
llama_v3_model_load_internal: n_gqa      = 1
llama_v3_model_load_internal: rnorm_eps  = 5.0e-06
llama_v3_model_load_internal: n_ff       = 11008
llama_v3_model_load_internal: freq_base  = 10000.0
llama_v3_model_load_internal: freq_scale = 1
llama_v3_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_v3_model_load_internal: model size = 7B
llama_v3_model_load_internal: ggml ctx size =    0.09 MB
ggml_opencl: clGetPlatformIDs(NPLAT, platform_ids, &n_platforms) error -1001 at ggml-opencl.cpp:968
You may be out of VRAM. Please check if you have enough.

If it hadn't worked with llama.cpp either, I wouldn't have bothered to open an issue, since it could have been a problem with my laptop. But seeing that it works well with llama.cpp, I just don't understand why it doesn't with koboldcpp.

LostRuins commented 1 year ago

Did you try --useclblast 0 1? That seems to be the index of your device.

PedroVNasc commented 1 year ago

Yes, I tried 0 1, 1 0 and 1 1. All of them result in this output.

LostRuins commented 1 year ago

Then I suspect you may have linked against an incorrect or incomplete CLBlast library. Which ones did you install? On Arch Linux, install cblas, openblas, and clblast; on Debian, install libclblast-dev and libopenblas-dev.

PedroVNasc commented 1 year ago

I had installed the Debian ones previously, but they didn't work either. Then I tried to compile CLBlast manually, following some instructions in an issue thread of llama.cpp.

The latter made llama.cpp work, but not kobold. Still, I will remove the current CLBlast install and try again with the Debian one, just in case I did something wrong.

PedroVNasc commented 1 year ago

I just tried that, and also tried the new version you just released, but it didn't work; same error as before.

PedroVNasc commented 1 year ago

It got even stranger.

I compiled and ran, on its own, just the bit of code where no OpenCL devices are found, to try to understand what is happening. And it returns the correct devices:

Platform:0 Device:0  - Clover with ICELAND (iceland, LLVM 15.0.7, DRM 3.52, 6.4.6-76060406-generic)
Platform:0 Device:1  - Clover with AMD Radeon Vega 8 Graphics (raven, LLVM 15.0.7, DRM 3.52, 6.4.6-76060406-generic)
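For reference, a minimal standalone enumeration check along these lines (a sketch of the kind of test I mean, not the exact ggml-opencl.cpp code) would be:

/* probe.c - enumerate OpenCL platforms and devices.
   Build: gcc probe.c -lOpenCL -o probe */
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[16];
    cl_uint n_platforms = 0;
    cl_int err = clGetPlatformIDs(16, platforms, &n_platforms);
    if (err != CL_SUCCESS) {
        /* -1001 is CL_PLATFORM_NOT_FOUND_KHR: the ICD loader found
           no vendor platforms at all. */
        printf("clGetPlatformIDs failed: %d\n", err);
        return 1;
    }
    for (cl_uint p = 0; p < n_platforms; p++) {
        char pname[128];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof pname, pname, NULL);
        cl_device_id devices[16];
        cl_uint n_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16, devices, &n_devices);
        for (cl_uint d = 0; d < n_devices; d++) {
            char dname[128];
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof dname, dname, NULL);
            printf("Platform:%u Device:%u  - %s with %s\n", p, d, pname, dname);
        }
    }
    return 0;
}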

So the enumeration itself isn't the problem. Then, out of curiosity, I decided to run kobold with sudo, and it started to work.

For some reason, kobold can't find any OpenCL devices unless I run it with sudo.

PedroVNasc commented 1 year ago

After sudo worked, I suspected it had to do with the environment kobold runs in. I have Anaconda installed, so I tried deactivating it to see if kobold would work, and it did.

It seems the Python environment can mess with the way OpenCL works. I'm not sure if you want to "fix" it, or if it even can or should be fixed, but maybe a warning should be added about it.
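My guess (unverified) is that the conda environment's library path can shadow the system libOpenCL with one that doesn't see the vendor ICD files, which would explain the -1001 (CL_PLATFORM_NOT_FOUND_KHR). A small check like the following (a sketch; the file name is made up) prints which libOpenCL a process actually resolves; running it inside and outside the conda environment should show whether they differ:

/* which_opencl.c - print the path of the libOpenCL the dynamic linker
   resolves (glibc-specific; run inside and outside the conda env).
   Build: gcc which_opencl.c -ldl -o which_opencl */
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>
#include <link.h>

int main(void) {
    void *h = dlopen("libOpenCL.so.1", RTLD_NOW);
    if (!h) {
        printf("dlopen failed: %s\n", dlerror());
        return 1;
    }
    struct link_map *lm = NULL;
    if (dlinfo(h, RTLD_DI_LINKMAP, &lm) == 0 && lm != NULL) {
        printf("libOpenCL resolved to: %s\n", lm->l_name);
    }
    dlclose(h);
    return 0;
}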

PedroVNasc commented 1 year ago

So, I tried koboldcpp again, and although CLBlast is working (prompt processing is indeed faster), whenever it starts using BLAS for processing it breaks, outputting gibberish or throwing a segmentation fault.

I haven't tried llama.cpp yet to see if the same happens, but I expect it to behave the same way.

LostRuins commented 1 year ago

I don't use Linux myself, so unfortunately it will be hard for me to troubleshoot. Have you tried a different model?

PedroVNasc commented 1 year ago

Yes, I tried llama, airoboros, and pygmalion. I will try to investigate a bit further, but just getting CLBlast working has already made processing much faster.

PedroVNasc commented 1 year ago

I haven't learned much yet and won't be able to study this issue anytime soon. Since I'm currently able to use CLBlast, which is much faster than BLAS anyway, I'll close this issue.

If I find anything later, I'll open another issue or a PR.