Open laserprec opened 2 years ago
Hi laserprec, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.
I might be just a bot, but I'm told my suggestions are normally quite good, as such:

1. If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
2. Please abide by the AKS repo Guidelines and Code of Conduct.
3. If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
4. Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
5. Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
6. If you have a question, do take a look at our AKS FAQ. We place the most common ones there!
Triage required from @Azure/aks-pm
Action required from @Azure/aks-pm
Issue needing attention of @Azure/aks-leads
@immuzz, @justindavies would you be able to assist?
| Author: | laserprec |
|---|---|
| Assignees: | - |
| Labels: | `feature-request`, `triage`, `windows`, `action-required`, `Needs Attention :wave:` |
| Milestone: | - |
I have a customer who's also interested in this feature. Are there any updates on this? They are doing cloud 3D rendering.
I have a guide on how you can manually configure GPU acceleration for Windows AKS nodes at https://github.com/marosset/aks-windows-gpu-acceleration
This does require installing the NVIDIA driver extension against the VMSS that backs the AKS Windows node pool, which is not an ideal solution. It would be great if AKS could configure Windows nodes with the appropriate drivers!
@allyford I'd like to highlight a blocker that will need to be resolved in order to provide proper bin packing and scaling functionality for GPU accelerated Windows workloads on AKS. As discussed in https://github.com/microsoft/Windows-Containers/issues/333, the DirectX Graphics Kernel is not currently Silo-aware, and will expose all GPUs that are present on the host system to any container that requests a GPU. This results in incorrect behaviour when attempting to allocate individual GPUs to containers on Kubernetes worker nodes that have more than one GPU, and currently limits practical use to single-GPU VM types for worker nodes.
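To make the blocker above concrete, here is a toy model in Python of the mismatch between what a scheduler allocates and what a container can actually see when the graphics kernel is not silo-aware. It is only a sketch: `Node`, `allocate`, `visible_to`, and `overexposed` are hypothetical names invented for illustration, and nothing here calls a real Kubernetes, AKS, or DirectX API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker node with a fixed set of physical GPUs."""
    gpus: list[str]
    allocations: dict[str, list[str]] = field(default_factory=dict)

    def allocate(self, container: str, count: int = 1) -> list[str]:
        """What the scheduler intends: hand out `count` unassigned GPUs."""
        used = {g for assigned in self.allocations.values() for g in assigned}
        free = [g for g in self.gpus if g not in used]
        if len(free) < count:
            raise RuntimeError("not enough free GPUs on this node")
        self.allocations[container] = free[:count]
        return self.allocations[container]

    def visible_to(self, container: str) -> list[str]:
        """What the container sees under the assumed non-silo-aware behaviour:
        every GPU on the host, as soon as it has requested any GPU at all."""
        if container not in self.allocations:
            return []
        return list(self.gpus)


def overexposed(node: Node) -> dict[str, list[str]]:
    """GPUs each container can see but was never allocated."""
    return {
        container: [g for g in node.visible_to(container) if g not in assigned]
        for container, assigned in node.allocations.items()
    }


# Multi-GPU node: each container was given one GPU but can see both.
multi = Node(gpus=["gpu0", "gpu1"])
multi.allocate("render-a")
multi.allocate("render-b")
multi_result = overexposed(multi)

# Single-GPU node: allocation and visibility agree, so nothing leaks.
single = Node(gpus=["gpu0"])
single.allocate("render-a")
single_result = overexposed(single)
```

In this model the multi-GPU node reports one unintended GPU per container, while the single-GPU node reports none, which matches the observation that practical use is currently limited to single-GPU VM types.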
@fady-azmy-msft was previously handling the issue for tracking this blocker, and has directed me to continue the discussion here with the AKS team in this thread instead. All of the relevant technical details are available in both the Windows Containers issue thread and the blog post "Bringing full GPU support to Windows containers in Kubernetes", the latter of which also discusses the broader implications for deploying and scaling Windows GPU workloads on Kubernetes. Please let me know if there's any additional information that I can provide, or anything that I can do to help.
@adamrehn Will this also be a problem with a single-GPU machine in a scenario where that GPU is used by multiple pods?
It won't really affect that scenario. In that case you're either already exposing the one GPU to multiple pods, or you're using an external system to assign the GPU to only one pod. In the latter case the other pods shouldn't be activating GPU acceleration, so they won't get access to the GPU.
+1 to what Paul said above. Everything works as expected for worker nodes with a single GPU, including exposing that GPU to multiple containers (e.g. when enabling the multitenancy option of the Kubernetes Device Plugins for DirectX).
AKS has just released Windows GPU on AKS in public preview. Please take a look at our documentation to test it out! If you have any feedback, please let us know.
We will be adding driver type selection for Windows GPU usage. This means that you'll be able to specify GRID or CUDA. Preview release expected in Sept 2024.
GPU driver type selection is on track for the Sept preview. See #4505 for updates.
Feature ETA: This feature is currently in Public Preview. GA is planned for early 2025.
We have a dedicated GPU workload that can only be run on Windows. It would be great if we could leverage AKS for such a workload. We are aware of the GPU-enabled Linux nodes, so we are curious whether support for GPU-enabled Windows nodes is on the feature roadmap, and if so, we would love to know if you have an estimated timeline for its availability.
Thank you!