I've noticed that GPU utilization stays low during model inference, peaking at only about 80%, but I want to raise it to 99%. How can I adjust the parameters?
```
+-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   66C    P0   205W / 250W |  14807MiB / 40960MiB |     78%      Default |
```
Originally posted by @xiangxinhello in https://github.com/abetlen/llama-cpp-python/issues/1669#issuecomment-2277577719
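
For context, here is a minimal sketch of the llama-cpp-python constructor parameters that most directly influence GPU utilization. It assumes a CUDA-enabled build of the library; the model path is a hypothetical placeholder, and the specific values shown are starting points rather than recommendations:

```python
from llama_cpp import Llama

# "./models/model.gguf" is a placeholder; substitute your own GGUF file.
llm = Llama(
    model_path="./models/model.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU (any CPU layers cap GPU-Util)
    n_batch=512,      # larger batch keeps the GPU busier during prompt processing
    n_ctx=2048,       # context window; raising it mainly increases memory use
)

output = llm("Q: What is the capital of France? A:", max_tokens=32)
print(output["choices"][0]["text"])
```

Note that single-request autoregressive decoding is typically memory-bandwidth bound, so a sustained 99% GPU-Util may not be reachable with one stream no matter how the parameters are tuned; serving several concurrent requests (for example through the bundled server) is usually what pushes utilization higher.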