Closed · kunger97 closed this issue 5 months ago
This is interesting. Running llama.cpp on an Intel(R) Xeon(R) Platinum 8480 (4th generation) along with Arc-class GPU(s). I'm still researching the best approach to adding a SYCL backend to GGML, which may help with this sort of hardware setup.
Clarify what "freezes" means: do you see any CPU/GPU usage?
Try adding `-t 8 -tb 8` to the command line.
Sorry, I'm not a native English speaker, so 'freezes' may not be the right term. After running the command (I also tried the suggested `-t 8 -tb 8`), the program output stops at 'llm_load_tensors: offloaded 41/41 layers to GPU' for a long time (possibly more than 30 minutes). At that point, htop shows two threads (processes) running, with one occupying 100% of a CPU core. On the GPU side, GPU 0 is using approximately 615 MiB of VRAM, but the GPU frequency is 0.
If you remove `-i`, does the program finish successfully?
I attempted to run the following command: `./main -m ~/gguf/Sakura-13B-LNovel-v0.9.0-Q4_K_M.gguf -p "Hello" -ngl 99 -t 8 -tb 8`, but there seems to be no change compared to the previous run. The output still stops after 'llm_load_tensors: offloaded 41/41 layers to GPU.' (I waited about 20 minutes.) It appears the model hasn't completely loaded into VRAM.
Program freezes when loading a model with the SYCL backend.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
The model loads and generates output normally.
Current Behavior
Running
./main -m ~/Sakura-13B-LNovel-v0.9.0-Q4_K_M.gguf -i --color -p "Hello" -ngl 99 -n 32 -c 2048 -b 512
and then the program freezes.
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
Linux node-14 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
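For completeness, the environment details above can be gathered with a short script. `sycl-ls` ships with Intel's oneAPI toolkit and lists the SYCL devices visible to the runtime; the guard below is there because the tool may not be installed on every machine:

```shell
#!/bin/sh
# Collect basic environment info for the issue report
uname -a    # kernel version and architecture

# List SYCL devices visible to the runtime (requires oneAPI to be installed)
if command -v sycl-ls >/dev/null 2>&1; then
    sycl-ls
else
    echo "sycl-ls not found (is the oneAPI environment sourced?)"
fi
```

If `sycl-ls` shows no GPU-level devices, the freeze may simply be the runtime failing to find the Arc GPU rather than a llama.cpp bug.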
Failure Information (for bugs)
The program freezes; it does not crash.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
./main -m ~/Sakura-13B-LNovel-v0.9.0-Q4_K_M.gguf -i --color -p "Hello" -ngl 99 -n 32 -c 2048 -b 512
llm_load_tensors: offloaded 41/41 layers to GPU
then freezes.
Failure Logs
Please include any relevant log snippets or files. If it works under one configuration but not under another, please provide logs for both configurations and their corresponding outputs so it is easy to see where behavior changes.
Also, please try to avoid using screenshots if at all possible. Instead, copy/paste the console output and use GitHub's markdown to cleanly format your logs for easy readability.
Example environment info:
Program log
Other info
XPU info while the program is running
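GPU-side activity on Intel hardware can be sampled with `xpu-smi`, Intel's GPU management tool. The metric IDs and dump count below are illustrative (metric 0 is GPU utilization; see `xpu-smi dump --help` for the full list), and the guard handles machines where the tool is absent:

```shell
#!/bin/sh
# Sample Intel GPU activity while ./main is running
# (device 0 assumed; metric IDs are examples, check `xpu-smi dump --help`)
if command -v xpu-smi >/dev/null 2>&1; then
    xpu-smi dump -d 0 -m 0 -n 5
else
    echo "xpu-smi not found"
fi
```

A utilization stuck at 0% while one CPU core is pinned at 100% (as described above) suggests the hang is on the host side, before any kernels are submitted to the GPU.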
I am willing to provide any other necessary information; please feel free to ask.