LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
4.83k stars 342 forks

AMD GPU VRAM idling at 0mhz tanking performance on Vulkan and ROCm #978

Open YW5555 opened 2 months ago

YW5555 commented 2 months ago

The VRAM on my AMD 6900XT drops to 0 MHz after idling for a while, and this tanks the performance of koboldcpp. The idling VRAM doesn't affect other workloads like gaming, because there the VRAM frequency pops back to 2000 MHz. Generating with koboldcpp, however, fails to 'wake up' the VRAM, so my generation speed falls to ~7 tk/s instead of the usual ~50 tk/s. My workaround is to turn off Freesync in the AMD driver, which forces the VRAM to stay at its maximum 2000 MHz clock rate all the time.

This issue has been happening for the past few months across different combinations of GGUF models, Windows updates, AMD drivers, and koboldcpp versions (both main with Vulkan and the ROCm fork). All GGUF layers and the context are completely offloaded to VRAM (~11GB).

Current hardware:
OS: Windows 11 Pro 23H2
GPU: Powercolor AMD 6900XT Reference
GPU Driver: 24.5.1
CPU: AMD 5800X

LostRuins commented 2 months ago

Are you sure it's not a threading issue instead? If you keep koboldcpp in the foreground with --foreground, does it still happen? I'm not sure GPU driver power throttling behavior can be controlled from the program. It could be due to your power options, e.g. battery mode. Perhaps you need to configure it in the AMD control panel.
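
For anyone trying to tell whether a change (such as --foreground or the Freesync setting) makes a difference, it may help to compare measured throughput against the usual rate instead of judging by eye. The following is a minimal sketch, not part of koboldcpp: the names `Sample`, `tokens_per_second`, and `is_throttled` are hypothetical, the 50 tk/s baseline is taken from the report above, and the 0.5 threshold is an arbitrary assumption. The token count and elapsed time would have to be read manually from the generation output.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One generation run (hypothetical helper, not a koboldcpp type)."""
    tokens: int      # number of tokens generated in the run
    seconds: float   # wall-clock duration of the run


def tokens_per_second(sample: Sample) -> float:
    """Return the throughput of a run in tokens per second."""
    if sample.seconds <= 0:
        raise ValueError("seconds must be positive")
    return sample.tokens / sample.seconds


def is_throttled(sample: Sample,
                 baseline_tps: float = 50.0,   # usual rate from the report
                 threshold: float = 0.5) -> bool:  # assumed cutoff fraction
    """Flag a run whose throughput is below a fraction of the baseline."""
    return tokens_per_second(sample) < baseline_tps * threshold


if __name__ == "__main__":
    slow_run = Sample(tokens=70, seconds=10.0)    # roughly the ~7 tk/s case
    normal_run = Sample(tokens=500, seconds=10.0)  # roughly the ~50 tk/s case
    print(is_throttled(slow_run), is_throttled(normal_run))
```

Recording a few runs with and without each setting, right after the GPU has sat idle, would show whether the slowdown tracks the setting or something else.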