With the introduction of https://github.com/canonical/lxd/pull/13562, we can pass an NVIDIA GPU through to an LXD container using CDI notation. This approach unifies dGPU and iGPU passthrough. Meanwhile, `nvidia-container-cli` is still shipped with LXD for traditional dGPU passthrough (using either a DRM card ID or a GPU PCIe address). However, NVIDIA is deprecating it and will not develop it further, so `nvidia-container-cli` needs to be removed. Here are some considerations:
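For reference, a fully qualified CDI device name has the form `vendor/class=name`, for example `nvidia.com/gpu=0`. The minimal sketch below only splits such a name into its three parts. `parse_cdi_name` and `CDIName` are hypothetical helpers, not LXD's own parser, and the `nvidia.com/igpu` class used for the iGPU example is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CDIName:
    vendor: str  # e.g. "nvidia.com"
    cls: str     # device class, e.g. "gpu" (dGPU) or, by assumption, "igpu"
    name: str    # device name, e.g. "0", "all" or a GPU UUID


def parse_cdi_name(device: str) -> CDIName:
    """Split a fully qualified CDI device name of the form vendor/class=name."""
    kind, sep, name = device.partition("=")
    if not sep or not name:
        raise ValueError(f"missing '=name' in {device!r}")
    vendor, sep, cls = kind.partition("/")
    if not sep or not vendor or not cls:
        raise ValueError(f"missing 'vendor/class' in {device!r}")
    return CDIName(vendor=vendor, cls=cls, name=name)


if __name__ == "__main__":
    for dev in ("nvidia.com/gpu=0", "nvidia.com/igpu=0", "nvidia.com/gpu=all"):
        print(parse_cdi_name(dev))
```

Because dGPUs and iGPUs differ only in the class part of the name, one code path can handle both, which is the unification the CDI approach brings.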
We need a replacement tool to list the GPU resources of a host. Currently, this is done with `nvidia-container-cli info --csv`, and the results are exposed at `GET /1.0/resources` under the `.gpu.cards` field. Could we introduce a tool like `deviceQuery` (see here) that also lists resources AND supports both dGPU and iGPU resource listing?
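To make concrete what a replacement tool would have to feed, here is a minimal sketch that reads the `.gpu.cards` field of a `/1.0/resources` payload (for instance as printed by `lxc query /1.0/resources`). The sample payload is abbreviated and made up, and the per-card keys (`pci_address`, `driver`, `nvidia.uuid`) are assumptions about the response shape; `list_gpu_cards` is a hypothetical helper.

```python
import json

# Abbreviated, made-up payload shaped like the metadata of GET /1.0/resources.
# Only ".gpu.cards" comes from the issue text; the per-card keys are assumptions.
SAMPLE = """
{
  "gpu": {
    "cards": [
      {"driver": "nvidia", "pci_address": "0000:01:00.0",
       "nvidia": {"model": "Example GPU", "uuid": "GPU-1111"}},
      {"driver": "i915", "pci_address": "0000:00:02.0"}
    ],
    "total": 2
  }
}
"""


def list_gpu_cards(payload: str) -> list[dict]:
    """Return a short summary of each entry under .gpu.cards."""
    doc = json.loads(payload)
    # Accept either the raw REST envelope or just its "metadata" field.
    resources = doc.get("metadata", doc)
    cards = resources.get("gpu", {}).get("cards", [])
    summary = []
    for card in cards:
        # NVIDIA-specific details: the part a replacement tool would populate.
        nvidia = card.get("nvidia") or {}
        summary.append({
            "pci_address": card.get("pci_address"),
            "driver": card.get("driver"),
            "nvidia_uuid": nvidia.get("uuid"),
        })
    return summary


if __name__ == "__main__":
    for entry in list_gpu_cards(SAMPLE):
        print(entry)
```

Whatever tool replaces `nvidia-container-cli info --csv`, keeping the same `.gpu.cards` shape would let API consumers like this one continue to work unchanged.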
If we remove `nvidia-container-cli`, we no longer need to pass a PCIe address parameter when adding a GPU device, since the detection logic is handled by an NVIDIA library rather than by LXD. What are the implications in terms of API breaking changes for users? Should we keep this device parameter and 'resolve' it to a CDI identifier, or remove it completely?
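As one illustration of the "keep and resolve" option, the sketch below maps a PCIe address to a CDI device name by looking the card up in resource data. It assumes the CDI spec names NVIDIA devices by GPU UUID (`nvidia.com/gpu=<uuid>`) and that the card records carry `pci_address` and `nvidia.uuid` keys; `normalize_pci`, `resolve_pci_to_cdi` and the sample `CARDS` list are all hypothetical.

```python
# Hypothetical card records, shaped like entries under .gpu.cards (assumed keys).
CARDS = [
    {"driver": "nvidia", "pci_address": "0000:01:00.0",
     "nvidia": {"uuid": "GPU-1111"}},
    {"driver": "i915", "pci_address": "0000:00:02.0"},
]


def normalize_pci(addr: str) -> str:
    """Lower-case a PCI address and add the default 0000 domain if it is missing."""
    addr = addr.strip().lower()
    # "01:00.0" has one colon; "0000:01:00.0" has two.
    return addr if addr.count(":") == 2 else f"0000:{addr}"


def resolve_pci_to_cdi(pci: str, cards: list[dict]) -> str:
    """Map a legacy PCIe address option to a CDI device name (hypothetical helper)."""
    wanted = normalize_pci(pci)
    for card in cards:
        address = card.get("pci_address")
        if not address or normalize_pci(address) != wanted:
            continue
        uuid = (card.get("nvidia") or {}).get("uuid")
        if not uuid:
            raise LookupError(f"{pci}: no NVIDIA UUID known for this card")
        # Assumes the CDI spec exposes UUID-named devices (vendor/class=name).
        return f"nvidia.com/gpu={uuid}"
    raise LookupError(f"{pci}: no GPU with this PCI address")


if __name__ == "__main__":
    print(resolve_pci_to_cdi("01:00.0", CARDS))
```

Keeping the parameter this way would leave existing device configurations valid, at the cost of LXD still needing per-card data that links PCIe addresses to GPU identities.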