I use llama-cpp-python on a non-GPU system and on an AMD 6650M GPU system, both on Linux (Pop!_OS 22.04). This report is for the AMD GPU system. The non-GPU system produces correct output. The AMD GPU system, running the same code but offloading to the AMD 6650M GPU, produces garbage output, either entirely or in part. The garbage includes excessive runs of \t (tab), \n (newline), or # characters.
EDITED NOTE: I just saw that 0.2.32 became available on 2024-01-23. I tried the steps below with 0.2.32 and got the same results.
$ python3 --version
Python 3.10.12
$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
$ g++ --version
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Failure Information (for bugs)
Output varies and usually includes \t, \n, or # characters (hundreds of them in some cases); see the sample output under Steps to Reproduce below.
Steps to Reproduce
1. Compile llama-cpp-python:
CMAKE_ARGS="-DLLAMA_HIPBLAS=ON -DLLAMA_CLBLAST=on -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS=gfx1030" FORCE_CMAKE=1 pip install llama-cpp-python==0.2.31 --upgrade --force-reinstall --no-cache-dir
NOTE: I tried various versions back to 0.2.25; all show this behavior.
2. Set n_gpu_layers in my sample code (e.g., -1, 0, or 30).
3. Execute my sample code.
4. Look at the results. E.g., setting n_gpu_layers to -1 with a Mistral model (GGUF) yields the following; note the finish_reason of 'length':
5.'text': '\t\tExpected Output:\n \t\tAaron v. Brown, 123 PA 78,91 (Pa. 1970)\n \t\tDavid v. Frank, 456 PA 99,101 (Pa. 1971)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n', 'index': 0, 'logprobs': None, 'finish_reason': 'length'}], .
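The failure signature (long runs of \t, \n, and # filler) is easy to check for programmatically. Below is a small sketch of a hypothetical helper (not part of my script; the name and threshold are my own) that flags a completion dominated by those characters:

```python
# Hypothetical helper (not from the original script): flag a completion
# whose text is mostly the filler characters seen in the bad GPU runs.
def looks_garbled(text: str, threshold: float = 0.5) -> bool:
    """Return True when more than `threshold` of `text` is \t, \n, or '#'."""
    if not text:
        return False
    filler = sum(text.count(ch) for ch in "\t\n#")
    return filler / len(text) > threshold

good = "Aaron v. Brown, 123 PA 78, 91 (Pa. 1970)"
bad = good + "\n" * 300
print(looks_garbled(good), looks_garbled(bad))  # → False True
```

Running this over completion['choices'][0]['text'] makes it easy to compare CPU and GPU runs in bulk.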
Expected Behavior
The AMD-enabled GPU build should produce valid results without garbage.
Current Behavior
I compile llama-cpp-python using:
CMAKE_ARGS="-DLLAMA_HIPBLAS=ON -DLLAMA_CLBLAST=on -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS=gfx1030" FORCE_CMAKE=1 pip install llama-cpp-python==0.2.31 --upgrade --force-reinstall --no-cache-dir
I then run a simple script to test on my system. The script in part is:
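The script excerpt did not survive this paste. As a rough reconstruction only (the model path, prompt, and max_tokens below are my assumptions, not the exact script), the call shape is:

```python
# Hypothetical reconstruction of the lost test-script excerpt; guarded so it
# runs anywhere. Model path, prompt, and max_tokens are assumptions.
try:
    from llama_cpp import Llama
except ImportError:
    Llama = None  # llama-cpp-python not installed in this environment

if Llama is not None:
    llm = Llama(
        model_path="mistral-7b-v0.1.Q4_K_M.gguf",  # assumed local path
        n_gpu_layers=-1,  # offload every layer to the GPU (the failing case)
    )
    out = llm("List two Pennsylvania court citations:", max_tokens=128)
    # On the CPU-only system this prints sensible text; on the 6650M the
    # text is dominated by \t, \n, and # characters.
    print(out["choices"][0]["text"], out["choices"][0]["finish_reason"])
```

Setting n_gpu_layers to 0 in the same script reproduces the clean CPU-only behavior.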
The result varies depending on the GGUF model. Using mistral-7b-v0.1.Q4_K_M.gguf, I get the sample output shown above under Steps to Reproduce.

Environment and Context
$ uname -a
Linux truly-omen 6.6.6-76060606-generic #202312111032~1702306143~22.04~d28ffec SMP PREEMPT_DYNAMIC Mon D x86_64 x86_64 x86_64 GNU/Linux