Closed by ptailor1193 10 months ago
As noted in the 1.28 launch notes: the 535 series drivers are not compatible with the older chipsets used in the p2 instance family. This change is necessary to support the latest-and-greatest hardware in the p5 instance family. Instances in the p3 and p4 families will not be impacted by this change.
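To illustrate the compatibility rule above, here is a minimal sketch of a pre-flight check one might run before picking an AMI. The names `instance_family` and `supports_535_driver` are hypothetical and not part of any AWS tooling. Only the p2, p3, p4, and p5 families are mentioned in this thread, so any other family is reported as unknown rather than guessed.

```python
import re
from typing import Optional

# Families named in this thread. Everything else is deliberately left unknown.
INCOMPATIBLE_FAMILIES = {"p2"}            # older chipsets, not supported by 535
COMPATIBLE_FAMILIES = {"p3", "p4", "p5"}  # unaffected (p3, p4) or newly supported (p5)


def instance_family(instance_type: str) -> str:
    """Return the family prefix of an instance type, e.g. 'p4d.24xlarge' -> 'p4'."""
    match = re.match(r"[a-z]+\d+", instance_type.lower())
    if match is None:
        raise ValueError(f"unrecognized instance type: {instance_type!r}")
    return match.group(0)


def supports_535_driver(instance_type: str) -> Optional[bool]:
    """True/False for families discussed in the thread, None when unknown."""
    family = instance_family(instance_type)
    if family in INCOMPATIBLE_FAMILIES:
        return False
    if family in COMPATIBLE_FAMILIES:
        return True
    return None  # assumption: no claim is made about families not named above


if __name__ == "__main__":
    for itype in ("p2.xlarge", "p3dn.24xlarge", "p4d.24xlarge", "g5.xlarge"):
        print(itype, supports_535_driver(itype))
```

Returning `None` for unlisted families keeps the check from asserting anything this thread does not state.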
Hiya!
Will you also be backporting the 5.10 Linux Kernel with this?
Can I ask how many EKS versions you're going to go back as well please?
> Will you also be backporting the 5.10 Linux Kernel with this?
Yep! The older NVIDIA drivers are the only thing keeping us on 5.4.
> Can I ask how many EKS versions you're going to go back as well please?
We intend to make this change in 1.25+.
Awesome, thanks very much! :D
Ah sorry, one more question: is there an ETA/schedule at all for the 1.25 version?
Is the GPU AMI build process planned to be exposed more in this repo with this change or is that not changing?
Hello again! :)
@cartermckinnon do you know when this might happen, or is there any update on this, please?
@cartermckinnon I didn't see an eks-ami release on October 10th. Has the 1.27 backport been released?
Any timeline for 1.26?
> I didn't see an eks-ami release on October 10th
A recent change in the kernel (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9011e49d54dcc7653ebb8a1e05b5badb5ecfa9f9) makes our current combination of NVIDIA and EFA drivers incompatible. We expect to have a path forward shortly, but we have to pause our backports in the meantime.
> Is the GPU AMI build process planned to be exposed more in this repo with this change or is that not changing?
Yes, we plan to upstream the NVIDIA-related scripts.
The next AMI release will extend the 535-series NVIDIA driver and CUDA 12 to Kubernetes versions 1.25 and above.
NVIDIA 535 series drivers have now been backported to the EKS optimized Accelerated AMIs for Kubernetes versions 1.25 and above.
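For anyone scripting against this cutoff, the "1.25 and above" rule can be sketched as below. The helper names are hypothetical. The comparison parses versions into integer tuples because a plain string comparison would order "1.9" after "1.25". Version strings with a non-numeric minor component, such as a trailing "+", are not handled here.

```python
from typing import Tuple

# Lowest Kubernetes version that received the 535-series backport, per this thread.
MIN_BACKPORT_VERSION = (1, 25)


def parse_k8s_version(version: str) -> Tuple[int, int]:
    """Parse '1.27', 'v1.27', or '1.27.3' into (major, minor)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def ami_has_535_driver(version: str) -> bool:
    """True if the Accelerated AMI for this Kubernetes version is 1.25 or newer."""
    return parse_k8s_version(version) >= MIN_BACKPORT_VERSION


if __name__ == "__main__":
    for v in ("1.24", "1.25", "v1.28.1"):
        print(v, ami_has_535_driver(v))
```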
With Kubernetes version 1.28 or later, the EKS optimized Accelerated AMIs support NVIDIA 535 series or later drivers out of the box. We plan to backport these drivers to older Kubernetes versions, starting with 1.27 on October 10th, 2023.