Closed: hzgdeerHo closed this issue 1 month ago
Can you try with --ctx-size 16384 instead of --ctx-size 16128? (I'm not sure if it fixes the problem or not.)
It does not work with --ctx-size 16384, but it works if I set --ctx-size 32000. I think it is related to the truncated process being enabled. How can I disable the truncated process? Thanks!
I'm not sure what you mean by "truncated process".
Keep in mind that the actual context size per slot will be --ctx-size divided by --parallel, so for example with 16384 and --parallel 4 you get 16384 / 4 = 4096 tokens per slot. It is therefore normal to have to increase the ctx size if you set a high value for --parallel.
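As a sketch (reusing the same model path and flags as the commands below, and assuming you want roughly 8192 tokens per slot), you would size --ctx-size as --parallel × tokens-per-slot:

CUDA_VISIBLE_DEVICES=0 ./llama-server -m /home/ubuntu/.cache/huggingface/hub/models--MaziyarPanahi--Meta-Llama-3.1-8B-Instruct-GGUF/snapshots/1f301d86d760b435a11a56de3863bc0121bfb98f/Meta-Llama-3.1-8B-Instruct.Q8_0.gguf --gpu-layers 33 -cb --parallel 4 --ctx-size 32768 --flash-attn --batch-size 512 --chat-template llama3 --port 8866 --host 0.0.0.0

Here 32768 / 4 = 8192 tokens are available to each of the 4 slots, so long prompts are less likely to be truncated.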
THANKS!
This issue was closed because it has been inactive for 14 days since being marked as stale.
What happened?
Command that works normally:
CUDA_VISIBLE_DEVICES=0 ./llama-server -m /home/ubuntu/.cache/huggingface/hub/models--MaziyarPanahi--Meta-Llama-3.1-8B-Instruct-GGUF/snapshots/1f301d86d760b435a11a56de3863bc0121bfb98f/Meta-Llama-3.1-8B-Instruct.Q8_0.gguf --gpu-layers 33 -cb --ctx-size 16128 --flash-attn --batch-size 512 --chat-template llama3 --port 8866 --host 0.0.0.0
Command that does NOT work normally:
CUDA_VISIBLE_DEVICES=0 ./llama-server -m /home/ubuntu/.cache/huggingface/hub/models--MaziyarPanahi--Meta-Llama-3.1-8B-Instruct-GGUF/snapshots/1f301d86d760b435a11a56de3863bc0121bfb98f/Meta-Llama-3.1-8B-Instruct.Q8_0.gguf --gpu-layers 33 -cb --parallel 4 --ctx-size 16128 --flash-attn --batch-size 512 --chat-template llama3 --port 8866 --host 0.0.0.0
ubuntu@VM-0-16-ubuntu:~$ nvidia-smi
Thu Aug  8 21:22:25 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-32GB           Off | 00000000:00:08.0 Off |                    0 |
| N/A   34C    P0              39W / 300W |  10194MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     35134      C   ./llama-server                             10192MiB |
+---------------------------------------------------------------------------------------+
Name and Version
ubuntu@VM-0-16-ubuntu:~/llama.cpp$ ./llama-cli --version
version: 3549 (afd27f01)
built with cc (Ubuntu 9.5.0-1ubuntu1~22.04) 9.5.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output