visheshtanksale closed 5 months ago
For a GPU configured for passthrough, the device plugin does not update the GPU count on the node when the GPU falls off the bus.
To reproduce, follow these steps:
Remove the GPU from the bus:
echo "1" > /sys/bus/pci/devices/<gpu_pci_id>/remove
Validate that the GPU is no longer visible from the host using lspci:
lspci -nnk -d 10de:
The number of GPUs exposed on the k8s node doesn't change.
Watching for IOMMU groups under /dev/vfio generates an fsnotify event when the GPU falls off the bus.
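As an illustration of how such an event might be handled, here is a minimal sketch that covers only the handling side. It assumes a watcher (for example the fsnotify package) has already delivered the path of a removed entry under /dev/vfio. The names `iommuGroupFromPath`, `markRemoved`, and the `health` map are hypothetical and are not taken from the plugin's code.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// iommuGroupFromPath extracts the IOMMU group number from a path such as
// /dev/vfio/12. It returns false for anything else, including the
// /dev/vfio/vfio control node, which is not tied to a single group.
func iommuGroupFromPath(vfioDir, path string) (int, bool) {
	if filepath.Dir(path) != filepath.Clean(vfioDir) {
		return 0, false
	}
	group, err := strconv.Atoi(filepath.Base(path))
	if err != nil || group < 0 {
		return 0, false
	}
	return group, true
}

// markRemoved marks the group named by a removal event path as unhealthy.
// health is a hypothetical map from IOMMU group number to healthy state.
// It returns true only if a tracked group was updated.
func markRemoved(health map[int]bool, vfioDir, removedPath string) bool {
	group, ok := iommuGroupFromPath(vfioDir, removedPath)
	if !ok {
		return false
	}
	if _, tracked := health[group]; !tracked {
		return false
	}
	health[group] = false
	return true
}

func main() {
	// Assumed state: two passthrough GPUs in IOMMU groups 12 and 13.
	health := map[int]bool{12: true, 13: true}
	// Paths as a watcher might report them in removal events.
	for _, p := range []string{"/dev/vfio/12", "/dev/vfio/vfio"} {
		fmt.Println(p, markRemoved(health, "/dev/vfio", p))
	}
	fmt.Println(health)
}
```

Keeping the path parsing separate from the watcher means the mapping from event to IOMMU group can be tested without real VFIO devices.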
@rthallisey Please let me know how it looks
Looks fine. Thanks