Open sysuls1 opened 4 months ago
How should I address this issue so that I can use flash_attn properly?
How is no one addressing this? LMStudio doesn't work for me at all anymore. It's bricked.
Feel free to work on it if you need it. We welcome contributions.
```
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
```

```shell
CUDA_VISIBLE_DEVICES=0 ./llama-server --host 0.0.0.0 --port 8008 -m /home/kemove/model/gemma-2-27b-it-Q5_K_S.gguf -ngl 99 -t 4 -np 4 -ns 4 -c 512 -fa
```
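For context, once `llama-server` is running as above, it exposes an OpenAI-compatible HTTP API, so flash attention can be exercised through an ordinary chat-completion request. A minimal sketch of such a request body (the host/port come from the command above; the `max_tokens` value is just an illustrative choice kept under the `-c 512` context limit):

```python
import json

# Build a request body for llama-server's OpenAI-compatible
# /v1/chat/completions endpoint.
payload = {
    "messages": [
        {"role": "user", "content": "Hello"}
    ],
    # Keep the completion well under the -c 512 context set on the server.
    "max_tokens": 64,
    "temperature": 0.7,
}
body = json.dumps(payload)
print(body)

# POST this to http://<server-host>:8008/v1/chat/completions with
# Content-Type: application/json (e.g. via curl or urllib.request).
```

Whether flash attention is actually active shows up in the server's startup log, not in the API response, so checking the log output is still the way to confirm `-fa` took effect.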