On the same Fedora 41 machine with an AMD 7800 XT, llamafile uses GPU acceleration under one user account but falls back to CPU inference under another. The llamafile engine, script, and weights are identical for both users.
The script I use to launch llamafile is fairly straightforward:
#!/bin/bash
# Folder that holds the GGUF model weights
GGUF_FOLDER=/mnt/LINDATA/LLM/GGUF
if ! [ -d "$GGUF_FOLDER" ] ; then
echo "GGUF folder does not exist"
exit 1
else
# Stop if the folder exists but cannot be entered
cd "$GGUF_FOLDER" || exit 1
HSA_OVERRIDE_GFX_VERSION=11.0.0 /usr/local/bin/llamafile -m qwen2.5-14b-instruct-q5_k_m.gguf --server --nobrowser --log-disable -ngl 999 --nocompile
fi
However, running the same script as user1 activates the GPU, while user2 always falls back to CPU. I tried deleting the .llamafile directory from user2's home directory, but that did not fix anything.
I always log out of user1 completely before logging in as user2, to avoid a potential device lock. Both users are members of the video group.
I am a bit clueless about which configuration difference between the two users could cause this.
I'd appreciate any help!
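To narrow down the difference between the two accounts, a check along these lines could be run once as user1 and once as user2, and the outputs compared. This is only a sketch: it assumes the GPU path needs read/write access to /dev/kfd and the /dev/dri/renderD* nodes, and that on some distributions those nodes belong to the render group rather than video. Neither assumption has been confirmed for this machine. The function name check_node is made up for illustration.

```shell
#!/bin/bash
# Report whether the current user can open a given device node.
# Assumption: GPU offload needs read/write access to these nodes.
check_node() {
    local node="$1"
    if [ ! -e "$node" ]; then
        echo "$node: missing"
    elif [ -r "$node" ] && [ -w "$node" ]; then
        echo "$node: read/write ok"
    else
        echo "$node: no access"
    fi
}

# Show who is running the check and which groups are active in this session.
echo "user: $(id -un)"
echo "groups: $(id -Gn)"

# /dev/kfd is the ROCm compute node; renderD* are the DRM render nodes.
for node in /dev/kfd /dev/dri/renderD*; do
    check_node "$node"
done
```

If a node shows "no access" for user2 only, comparing the two group lists against the output of `ls -l /dev/kfd /dev/dri` would show which group membership is missing.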
Version
llamafile 0.8.15
What operating system are you seeing the problem on?
Linux
Relevant log output