Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.
https://llamafile.ai

Bug: Segmentation fault re-running after installing NVIDIA CUDA. #560

Open 4kbyte opened 2 months ago


Contact Details

No response

What happened?

I ran llamafile without any flags, and it loaded the web UI with CPU inference. When I added the GPU flags (--gpu nvidia -ngl 999), I got a segfault. I installed CUDA (and verified the install with a "Hello, world!" program), re-ran the same command, and got the same segfault.

I removed ~/.llamafile, re-ran with the GPU flags, and everything worked.
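The workaround above boils down to deleting llamafile's cache directory so the GPU module is extracted and linked fresh. A minimal sketch of that step, run against a throwaway directory so nothing real is touched (the `v/0.8.11/ggml-cuda.so` path is taken from the log below; the temp-dir setup is illustrative only):

```shell
# Stand-in for ~/.llamafile, built in a temp dir so the sketch is safe to run.
CACHE="$(mktemp -d)/.llamafile"
mkdir -p "$CACHE/v/0.8.11"
touch "$CACHE/v/0.8.11/ggml-cuda.so"   # stand-in for the possibly stale cached DSO

# The fix applied in this report: remove the cache wholesale.
rm -rf "$CACHE"
[ ! -e "$CACHE" ] && echo "cache cleared"
```

On a real system the equivalent would be `rm -rf ~/.llamafile` followed by re-running the llamafile with the GPU flags.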


PS: I looked through the issue tracker and couldn't find a duplicate, though the problem may be mentioned in a comment thread.

Version

$ ./llava-v1.5-7b-q4.llamafile --version
llamafile v0.8.11

What operating system are you seeing the problem on?

Linux

Relevant log output

$ ./llava-v1.5-7b-q4.llamafile --gpu nvidia -ngl 999
import_cuda_impl: initializing gpu module...
get_nvcc_path: note: nvcc not found on $PATH
get_nvcc_path: note: $CUDA_PATH/bin/nvcc does not exist
get_nvcc_path: note: /opt/cuda/bin/nvcc does not exist
get_nvcc_path: note: /usr/local/cuda/bin/nvcc does not exist
link_cuda_dso: note: dynamically linking /home/lab/.llamafile/v/0.8.11/ggml-cuda.so
ggml_cuda_link: welcome to CUDA SDK with tinyBLAS
link_cuda_dso: GPU support loaded
...
./llava-v1.5-7b-q4.llamafile -m llava-v1.5-7b-Q4_K.gguf --mmproj llava-v1.5-7b-mmproj-Q4_0.gguf --gpu nvidia -ngl 999 
Segmentation fault

$ ./llava-v1.5-7b-q4.llamafile --gpu nvidia -ngl 999  # after installing nvcc, before removing ~/.llamafile
import_cuda_impl: initializing gpu module...
link_cuda_dso: note: dynamically linking /home/lab/.llamafile/v/0.8.11/ggml-cuda.so
ggml_cuda_link: welcome to CUDA SDK with tinyBLAS
link_cuda_dso: GPU support loaded
...
./llava-v1.5-7b-q4.llamafile -m llava-v1.5-7b-Q4_K.gguf --mmproj llava-v1.5-7b-mmproj-Q4_0.gguf --gpu nvidia -ngl 999 
Segmentation fault
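The `get_nvcc_path` notes in the first log show the locations probed for nvcc: `$PATH`, `$CUDA_PATH/bin`, `/opt/cuda/bin`, and `/usr/local/cuda/bin`. A hedged sketch of that search order (the `find_nvcc` helper is hypothetical, written only to mirror the log; it is not llamafile's actual code):

```shell
# Mimic the nvcc lookup order suggested by the log messages above.
find_nvcc() {
  # 1. $PATH
  command -v nvcc 2>/dev/null && return 0
  # 2.-4. fixed locations, in the order the log prints them
  for dir in "${CUDA_PATH:-/nonexistent}/bin" /opt/cuda/bin /usr/local/cuda/bin; do
    [ -x "$dir/nvcc" ] && { echo "$dir/nvcc"; return 0; }
  done
  return 1
}

find_nvcc || echo "nvcc not found"
```

Note that in the second run (after installing CUDA) those `get_nvcc_path` notes disappear, yet the cached `ggml-cuda.so` is still linked and still segfaults, which is consistent with a stale cache being the culprit.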