I've tried to reproduce this using https://huggingface.co/second-state/Meta-Llama-3.1-8B-Instruct-GGUF/blob/f24f6a255a71bd6211dccbc9bb67f83c51b7edab/Meta-Llama-3.1-8B-Instruct-f16.gguf on an A100-SXM and have not been able to: it runs indefinitely for me without any crash. My A100 is 80 GB rather than 40 GB, but I'd be surprised if that difference mattered.

I'm assuming that model is close enough to what you are testing with; if not, can you please share (or point to) your gguf file? Also, do you have any other A100 cards to test on, to rule out an issue with that specific card?

It might also be worth building for an older architecture, e.g. `-DCMAKE_CUDA_ARCHITECTURES="75"` (which will run via PTX JIT compilation), to check whether the issue is related to building for 80.
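For reference, such a rebuild might look like this (a minimal sketch; `-DGGML_CUDA=ON` is the usual CUDA switch in recent llama.cpp trees, and everything else is assumed to be at defaults):

```sh
# Rebuild llama.cpp targeting compute capability 7.5, so the A100 (8.0)
# runs the kernels via PTX JIT compilation instead of native SASS.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="75"
cmake --build build --config Release -j
```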
Thanks, I think I made a mistake when installing CUDA. Sorry for the noise. I will report back if there is still an issue after I reinstall. Closing for now.
What happened?
I am currently running some tests on A100 and `llama.cpp` crashes when CUDA graphs are enabled. Here are the repro steps (see the sketch below); build info is in the Name and Version section. The model is F16 LLaMA 3.1. The command crashes consistently, though at different times after the start. It stops crashing if I add `GGML_CUDA_DISABLE_GRAPHS=1`.

@agray3 Do you have ideas what the issue might be? Do you observe the same crash with A100?
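A minimal sketch of such a run with the workaround applied, assuming the standard `llama-cli` binary and an example model path (`GGML_CUDA_DISABLE_GRAPHS` is the real ggml environment variable; the rest of the command is illustrative, not the original invocation):

```sh
# With CUDA graphs enabled (the default), generation crashes at varying
# points after startup; setting GGML_CUDA_DISABLE_GRAPHS=1 avoids it.
# The model path and generation options below are assumptions.
GGML_CUDA_DISABLE_GRAPHS=1 ./build/bin/llama-cli \
    -m Meta-Llama-3.1-8B-Instruct-f16.gguf \
    -ngl 99 -n 256 -p "Hello"
```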
Name and Version
version: 3870 (841713e1) built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
No response