ggerganov / llama.cpp — LLM inference in C/C++ (MIT License, 60.95k stars, 8.7k forks)
llama : NvAPI performance state change support #8116
Open

sasha0552 opened this issue 3 days ago

sasha0552 commented 3 days ago
Related: #8084
Reference implementation
TODO:
[x] Implement performance state switching functions
[ ] Call the performance state switching functions from a common code path before inference starts and after it ends
[ ] Switch only if Pascal GPU(s) are present
[x] Compile only if CUDA is enabled
[ ] Enable by default if CUDA is enabled, otherwise disable
[ ] Log performance state changes and library loading status
[ ] Synchronize pstate changes between n instances of llama.cpp on a single GPU
[ ] Clean up temporary/debug code
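One way the "common function before/after inference" item could be shaped is a reference-counted RAII guard, so that nested or concurrent inference calls within one process raise the performance state only on the first entry and restore it only on the last exit. The sketch below is not from the PR: `g_set_pstate` stands in for whatever NvAPI-backed switching function the PR implements, and the Pascal check is stubbed as a flag (a real build could test `cudaDeviceProp.major == 6` per device).

```cpp
#include <atomic>
#include <cstdio>
#include <functional>

// Hypothetical hook: a real build would call into the NvAPI-based
// switching code here. Injectable so the guard logic can be
// exercised without a GPU.
static std::function<void(int)> g_set_pstate = [](int pstate) {
    std::printf("pstate -> P%d\n", pstate);  // log pstate changes
};

// Whether any Pascal (compute capability 6.x) GPU is present.
// Stubbed as a flag here; a real implementation would query the
// CUDA device properties of each device.
static bool g_have_pascal = true;

static std::atomic<int> g_active_infers{0};

// RAII guard: force the high-performance state (P0) when the first
// inference in this process begins, and drop back to the low-power
// idle state (P8) when the last one ends.
struct pstate_guard {
    pstate_guard() {
        if (g_have_pascal && g_active_infers.fetch_add(1) == 0) {
            g_set_pstate(0);   // first active inference: go fast
        }
    }
    ~pstate_guard() {
        if (g_have_pascal && g_active_infers.fetch_sub(1) == 1) {
            g_set_pstate(8);   // last active inference: idle down
        }
    }
};
```

Inference entry points would then just start with a `pstate_guard guard;`, which also keeps the switching calls out of the hot loop itself.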
[x] I have read the contributing guidelines
Self-reported review complexity:
[ ] Low
[x] Medium
[ ] High
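For the open item about synchronizing pstate changes between n llama.cpp instances on one GPU, one cheap cross-process primitive is `flock(2)` on a per-GPU lock file: every inferring instance holds a shared lock, and an instance may lower the pstate only if it can briefly grab an exclusive lock (i.e. nobody else is inferring). This is a sketch under that assumption, not the PR's approach; the lock-file path is made up, and there is an inherent race between the exclusive-lock probe and the actual pstate call, so a real implementation might instead keep a counter guarded by the exclusive lock.

```cpp
#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical per-GPU lock file, e.g. "/tmp/llama-pstate-gpu0.lock".
int pstate_lock_open(const char * path) {
    return open(path, O_CREAT | O_RDWR, 0666);
}

// Called when this instance starts inferring on the GPU:
// take a shared lock (shared locks never block each other).
bool infer_begin(int fd) {
    return flock(fd, LOCK_SH) == 0;
}

// Called when this instance stops inferring. Returns true if no
// other instance still holds a shared lock, i.e. it appears safe
// to drop the GPU back to a low-power pstate.
bool infer_end_may_lower(int fd) {
    if (flock(fd, LOCK_UN) != 0) return false;
    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        flock(fd, LOCK_UN);  // nobody else active
        return true;
    }
    return false;            // another instance is still inferring
}
```

Since `flock` locks belong to the open file description, two instances (or two `open()` calls) behave as independent lock holders, which is exactly the property needed here.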