MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Clarification for NVIDIA GRID / vGPU version needed #121599

Closed: es94129 closed this issue 2 months ago

es94129 commented 5 months ago

The doc page has a newly added note, which is unclear:

For Azure NVads A10 v5 VMs, we recommend customers always be on the latest driver version. The latest NVIDIA major driver branch (n) is only backward compatible with the previous major branch (n-1). For example, vGPU 17.x is backward compatible with vGPU 16.x only. Any VMs still running n-2 or lower may see driver failures when the latest driver branch is rolled out to Azure hosts.

  1. The note is located under this section (https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup#nvidia-grid-drivers). Is vGPU part of the GRID driver that Microsoft provides at this link (https://github.com/Azure/azhpc-extensions/blob/master/NvidiaGPU/resources.json), or is it something pre-installed on Azure NVads A10 v5 VMs?
  2. If vGPU is software that we install by downloading the GRID driver, how would Azure's support for new vGPU versions affect existing VMs that run older vGPU versions (e.g., 15.x)?


vikancha-MSFT commented 5 months ago

@es94129 vGPU is NVIDIA's new branding for the GRID driver. Customers need to install the driver on the VMs; it is not pre-installed. Azure will be updating the platform to support the vGPU 17.0 drivers from NVIDIA. As a result, backward compatibility will be maintained only for v16.x guest drivers, for both Windows and Linux users. v15.x drivers may work but will not be supported after the platform update. We recommend that users move to v16.x guest drivers to avoid disruptions to their working environment after the update.
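The n / n-1 compatibility rule from the quoted doc note and the reply above can be sketched as a small version check. This is a minimal, hypothetical illustration, not an Azure or NVIDIA tool: the names `Support`, `parse_major`, and `guest_driver_support` are made up here, the guest version strings in the usage block are illustrative only, and treating a guest branch newer than the host as "unknown" is an assumption, since the note does not cover that case.

```python
from enum import Enum


class Support(Enum):
    SUPPORTED = "supported"        # guest branch n or n-1
    UNSUPPORTED = "unsupported"    # guest branch n-2 or lower; may work, no guarantee
    UNKNOWN = "unknown"            # guest newer than host; not covered by the note (assumption)


def parse_major(version: str) -> int:
    """Return the major branch number from a version string such as '16.5'."""
    return int(version.strip().split(".", 1)[0])


def guest_driver_support(host_major: int, guest_major: int) -> Support:
    """Classify a guest vGPU (GRID) driver branch against the host branch.

    Encodes only the rule stated in this issue: host branch n is
    backward compatible with guest branches n and n-1.
    """
    if guest_major > host_major:
        return Support.UNKNOWN
    if guest_major >= host_major - 1:
        return Support.SUPPORTED
    return Support.UNSUPPORTED


if __name__ == "__main__":
    host = 17  # platform branch after the announced update
    for guest in ("17.0", "16.5", "15.4"):  # illustrative guest versions
        status = guest_driver_support(host, parse_major(guest))
        print(f"host {host}.x, guest {guest}: {status.value}")
```

Under this rule, once hosts move to 17.x, a 16.x guest driver stays in the supported window, while a 15.x guest falls outside it, which matches the "may work but not supported" wording above.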

vikancha-MSFT commented 5 months ago

@es94129 Also, we just sent out an update to all customers asking them to update the driver to 16.x by the end of May. We will discuss with NVIDIA how to expand the support matrix for future releases.

es94129 commented 5 months ago

Thanks @vikancha-MSFT for the context! For more clarity, is the upcoming Azure platform update (a) a hardware/software change to all NV v5 VMs, or (b) just the release of a new vGPU version for customers to download?

mattmcinnes commented 2 months ago

Hi @es94129, thanks for your feedback!

This issue is being resolved with NVIDIA, as some of their GRID drivers are not publicly available without the requisite subscription. I've reached out to the team handling vGPU, and they should update the doc soon!

please-close