dataiku / dss-plugin-eks-clusters

Apache License 2.0

Backport nvidia change loc dss11 #65

Closed thtrunck closed 4 months ago

thtrunck commented 4 months ago

[sc-179642] https://app.shortcut.com/dataiku/story/179642/eks-aks-nvidia-device-plugin-changed-location

thtrunck commented 4 months ago

Be careful: 1.1.2 also includes the change from 1.1.1 (adding beta when kubectl version > 1.23). https://app.shortcut.com/dataiku/story/127914/eks-apiversion-incompatibility-with-kubectl-1-26

I checked that I can create an EKS GPU cluster (and I see the daemonset). Everything works out of the box from the DCS image (kubectl 1.23 / authenticator 0.5.0). If I upgrade kubectl to 1.24 I get the expected error, but upgrading aws-iam-authenticator to 0.5.9 makes it work.

We do a lot of version comparisons using strings, which is pretty bad: eksctl is more recent than 0.32.0, so we should use the GPU setting from eksctl, but we don't because "0.145" < "0.32". We have the same issue with the latest aws-iam-authenticator. But the out-of-the-box versions work and we aren't making things worse, so that's OK for the hotfix to unlock GPU on DSS11.
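
To illustrate the string-comparison problem, here is a minimal sketch, not the plugin's actual code. `parse_version` is a hypothetical helper, and it assumes versions look like dotted integers with an optional leading `v` (anything after the numeric part, such as a pre-release suffix, is ignored).

```python
import re


def parse_version(version):
    """Hypothetical helper: turn '0.145.0' or 'v0.5.9' into a tuple of ints."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", version.strip())
    if match is None:
        raise ValueError("Unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.group(1).split("."))


# String comparison is character by character: '1' < '3', so "0.145" sorts
# before "0.32" even though 0.145 is the more recent release.
assert "0.145" < "0.32"

# Comparing tuples of integers gives the intended ordering.
assert parse_version("0.145.0") > parse_version("0.32.0")
assert parse_version("0.5.9") > parse_version("0.5.0")
```

Something along these lines could replace the string comparisons in a follow-up, outside the scope of this hotfix.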