dempti opened this issue 3 years ago
Works for me. Here are the steps:

1. `k8s.amazonaws.com/accelerator: vgpu` (node label)
2. `k8s.io/cluster-autoscaler/node-template/label/k8s.amazonaws.com/accelerator: vgpu` (ASG tag)
3. `k8s.io/cluster-autoscaler/node-template/resources/k8s.amazonaws.com/vgpu: "2"` (ASG tag)
4. `--install-nvidia-plugin=false`
The newly created nodes will be properly labeled for the vgpu plugin, and the autoscaler will know that this node group can provide the necessary resources when a pod requests them (see the sketch below).
Source (under Scaling from zero): https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html#ca-view-logs
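For anyone applying these with eksctl, here is a minimal sketch of a nodegroup covering steps 1-3. The cluster name, region, instance type, and nodegroup name are placeholders, and `"2"` assumes two shares per GPU as configured for your vgpu plugin:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster            # placeholder cluster name
  region: us-west-2           # placeholder region
nodeGroups:
  - name: vgpu-workers        # placeholder nodegroup name
    instanceType: g4dn.xlarge # placeholder GPU instance type
    minSize: 0                # allow scale-to-zero
    maxSize: 4
    # Step 1: label the nodes so the vgpu device plugin schedules onto them
    labels:
      k8s.amazonaws.com/accelerator: vgpu
    # Steps 2-3: ASG tags the cluster-autoscaler reads when the group is at zero
    tags:
      k8s.io/cluster-autoscaler/node-template/label/k8s.amazonaws.com/accelerator: vgpu
      k8s.io/cluster-autoscaler/node-template/resources/k8s.amazonaws.com/vgpu: "2"
```

The `tags` are what let the autoscaler build a node template while the group still has zero nodes; the `labels` only take effect once a node actually joins.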
@alexpirogovski I believe the nvidia plugin isn't installed by default and we need to install it separately as a DaemonSet. In that case, is step 4 necessary? I was not able to get it to work with just the other 3 additions you suggested.
@admiral-srinjoy you can follow this issue for a solution: https://github.com/kubernetes/autoscaler/issues/4315
@admiral-srinjoy AFAIR the nvidia plugin and the aws-virtual-gpu-device-plugin are mutually exclusive.
Thanks @dempti, this helps.
GPU sharing works perfectly fine, but when trying to scale pods based on their GPU share requests, cluster-autoscaler is unable to scale the instances to match and fails with the following errors.
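For context, the pods being scaled request the shared GPU roughly like this (a sketch; the deployment name, image, and per-pod share count are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-share-demo           # illustrative name
spec:
  replicas: 4                    # more replicas than available shares forces a scale-up
  selector:
    matchLabels:
      app: gpu-share-demo
  template:
    metadata:
      labels:
        app: gpu-share-demo
    spec:
      containers:
        - name: worker
          image: nvidia/cuda:11.4.3-base-ubuntu20.04  # placeholder image
          command: ["sleep", "infinity"]
          resources:
            limits:
              k8s.amazonaws.com/vgpu: 1  # one GPU share per pod
```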