Krast76 opened this issue 2 months ago
@cdesiniotis Do we need this feature? If so, I can contribute.
@Krast76 when using the gpu-operator, the driver is installed at `/run/nvidia/driver` on the host. So you can change the "power supply" limit by running `sudo chroot /run/nvidia/driver nvidia-smi -pl ${POWER_LIMIT}` on the host, or by exec'ing into the driver daemonset pod and running `nvidia-smi -pl ${POWER_LIMIT}`. Does that help?
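For example, either of the following should do it. The namespace and daemonset name in the `kubectl exec` variant are only the typical gpu-operator defaults, not something confirmed here, so adjust them to your cluster:

```bash
# On the host: chroot into the driver container's root filesystem and set the limit (in watts).
sudo chroot /run/nvidia/driver nvidia-smi -pl 250

# Or from outside the node: exec into the driver daemonset pod.
# "gpu-operator" / "nvidia-driver-daemonset" are the usual names and may differ;
# check with: kubectl get ds -A | grep nvidia-driver
kubectl exec -n gpu-operator ds/nvidia-driver-daemonset -- nvidia-smi -pl 250
```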
Could we provide a command that the user can run in the driver daemonset after the driver install succeeds, to make this more universal?
This is what I did with static nodes. Since I have autoscaling nodes, I can't set the power limit by hand. To handle that case I wrote a quick and "dirty" DaemonSet: https://github.com/Krast76/k8s-nvidia-power-limiter. I've been running it as a DaemonSet since September and it works like a charm.
Like I said, it's currently quick and dirty code; if I find the time I'll add documentation and an example of how to run it, and perhaps refactor the code (better logging, etc.).
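The core of the approach is just running nvidia-smi against the driver install on each node. Roughly, the container entrypoint of such a DaemonSet boils down to something like this (a sketch of the idea, not the actual code from the repo above; `POWER_LIMIT` is an env var you would set in the pod spec):

```sh
#!/bin/sh
# Illustrative power-limiter entrypoint. Assumes the gpu-operator driver root is
# mounted at /run/nvidia/driver and POWER_LIMIT (in watts) comes from the manifest.
set -eu
chroot /run/nvidia/driver nvidia-smi -pm 1                                  # keep the driver loaded (persistence mode)
chroot /run/nvidia/driver nvidia-smi -pl "${POWER_LIMIT:?set POWER_LIMIT}"  # apply the power cap
sleep infinity                                                              # keep the pod Running
```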
1. Issue or feature description
Because I have a "power supply" limit, I must set power limits on the GPUs. Before Kubernetes, I used to do this with nvidia-smi:
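Something like this (200 W is just an example value):

```bash
sudo nvidia-smi -pm 1     # enable persistence mode
sudo nvidia-smi -pl 200   # cap the board power limit at 200 W (requires root)
```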
In a Kubernetes environment where nodes are created and destroyed many times per day, I would like this to be managed by the gpu-operator.
I took a look at the kernel documentation but found nothing that lets me manage this through kernel parameters.
Thanks