Thanks so much for the server, it's a great compact, fast thing.
I noticed that it does not switch the video cards to the P0 maximum-performance mode before inference, and does not switch them back to P8 afterwards. This is a bit suboptimal for server work.
Could you use the Python command described in the link below during inference?
llama.cpp supports it (with a patch), and right now, after inference, it is necessary to switch modes manually to work with TTS.
As described here: https://github.com/sasha0552/nvidia-pstate
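For illustration, here is a minimal sketch of what the requested behavior could look like: raise the P-state to P0 around each inference call and drop back to a low-power state afterwards. The `set_pstate_high`/`set_pstate_low` names are my assumption based on the linked nvidia-pstate package and may differ from its actual API; the context manager itself is just a hypothetical wrapper.

```python
# Sketch: force P0 (max performance) for the duration of an inference
# request, then return to a low-power state (P8) when it finishes.
from contextlib import contextmanager

@contextmanager
def max_performance(enter=None, leave=None):
    """Run the enclosed block at P0, restoring the idle P-state afterwards."""
    if enter is None or leave is None:
        # Assumed API of the linked sasha0552/nvidia-pstate package;
        # verify the names against the package you install.
        from nvidia_pstate import set_pstate_high, set_pstate_low
        enter, leave = set_pstate_high, set_pstate_low
    enter()           # switch the GPU(s) to P0 before inference
    try:
        yield
    finally:
        leave()       # drop back to P8 so the cards idle between requests

# Usage inside a request handler:
# with max_performance():
#     run_inference(prompt)
```

The `enter`/`leave` parameters are only there so the wrapper can be exercised without a GPU; in the server they would default to the real P-state calls.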