Closed — m-arbaro closed this issue 1 week ago
Which model and GPUs were you using? Do you get correct results with -b 512 -ub 512? Do you get correct results when compiling with GGML_CUDA_FORCE_CUBLAS?
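The two diagnostics asked about above can be sketched as shell commands. This is a hedged sketch, not the reporter's exact invocation: the model path, prompt, and layer count are placeholders, and the build commands assume the standard CMake workflow for llama.cpp with its real GGML_CUDA and GGML_CUDA_FORCE_CUBLAS options.

```shell
# Rebuild with cuBLAS forced for all matrix multiplications
# (GGML_CUDA_FORCE_CUBLAS bypasses the custom MMQ kernels)
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FORCE_CUBLAS=ON
cmake --build build --config Release -j

# Re-run with the suggested batch sizes, keeping row splitting enabled
# (model path, prompt, and -ngl value are placeholders)
./build/bin/llama-cli -m model.gguf -ngl 99 -sm row -b 512 -ub 512 -p "Hello"
```

If the output is coherent only in the cuBLAS-forced build, that points at the quantized mat-mul kernels rather than cuBLAS itself.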
Hello Johannes, thank you for your guidance. I am using a Tesla P40.
Do you get correct results with -b 512 -ub 512? No.
Do you get correct results when compiling with GGML_CUDA_FORCE_CUBLAS? Yes, it works fine with this option. Thank you.
Which model are you using?
Both Llama 3 Instruct and Llama 3.1 Instruct (on the latest builds that support it), Q8_0 quantization.
I can confirm this observation. Meta-Llama-3.1-70B-Instruct-IQ2_M works fine without "row_split", but when using "row_split" it only produces gibberish (in my case the output is just a repeating string like "////////////,////,///" and so on).
Model source: https://huggingface.co/lmstudio-community/Meta-Llama-3.1-70B-Instruct-GGUF/tree/main
System: Dual RTX 3090 setup, Windows, https://github.com/oobabooga/text-generation-webui, v.1.13
Settings screenshot below.
Does this issue still occur on the latest master commit?
Yes. I have 3x P40 and 1x 4060 Ti. Output using -rowsplit is a single repeating word. Removing -rowsplit works fine (albeit slower). For smaller models that fit entirely on the P40s, -rowsplit works fine. Given that, and the lack of additional complaints/bug reports, I'm curious whether something broke with row splitting across non-homogeneous NVIDIA architectures.
Same issue: 3x RTX 2080 Ti, Mistral-Large-Instruct-2407.i1-Q2_K.gguf, build b3678. Building with GGML_CUDA_FORCE_CUBLAS=false breaks -sm row, and GGML_CUDA_FORCE_CUBLAS=true fixes it.
Please confirm whether or not this fix works: https://github.com/ggerganov/llama.cpp/pull/9413
What happened?
Since commit b3188, llama-cli produces incoherent output on a multi-GPU system with CUDA and row tensor splitting. Layer tensor splitting works fine but is almost twice as slow. The GPUs are 3x Nvidia Tesla plus a 3090. All subsequent commits seem to be affected.
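A minimal reproduction of the comparison described above can be sketched as follows. This is an assumed invocation, not the reporter's command line: the model path, prompt, and token count are placeholders, while -sm (split mode), -ngl, and -n are real llama-cli options.

```shell
# Row tensor splitting: incoherent output since b3188 per this report
./llama-cli -m model.gguf -ngl 99 -sm row -n 64 -p "Test prompt"

# Layer tensor splitting: coherent output, but reported as nearly half the speed
./llama-cli -m model.gguf -ngl 99 -sm layer -n 64 -p "Test prompt"
```

Bisecting between the last known-good build and b3188 with both commands would confirm the regressing commit.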
Name and Version
llama-cli version b3188, built on Debian 12.
What operating system are you seeing the problem on?
Linux
Relevant log output
No response