Closed amshinde closed 6 years ago
There does not seem to be a straightforward way to do this, as the device node and sysfs tree layouts differ from one kind of device to another. We would likely need to define a separate method for each kind of device.
For example, this is how the audio and graphics cards look on a machine that I was testing:
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3 Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 04)
For the graphics device, the device nodes appear as:
$ ls -la /dev/dri/
total 0
crw-rw----+ 1 root video 226, 0 Nov 5 22:27 card0
crw-rw----+ 1 root video 226, 128 Nov 5 22:27 renderD128
Navigating sysfs based on major and minor number gives:
$ ls -la /sys/dev/char/226\:0/device
lrwxrwxrwx 1 root root 0 Nov 6 11:23 /sys/dev/char/226:0/device -> ../../../0000:00:02.0
$ ls -la /sys/dev/char/226\:128/device
lrwxrwxrwx 1 root root 0 Nov 5 22:27 /sys/dev/char/226:128/device -> ../../../0000:00:02.0
For the audio device, things were a bit different:
$ ls -la /dev/snd
total 0
drwxr-xr-x 2 root root 80 Nov 6 23:25 by-path
crw-rw----+ 1 root audio 116, 7 Nov 6 23:25 controlC0
crw-rw----+ 1 root audio 116, 2 Nov 5 22:27 controlC1
crw-rw----+ 1 root audio 116, 11 Nov 6 23:25 hwC0D0
crw-rw----+ 1 root audio 116, 6 Nov 5 22:27 hwC1D2
crw-rw----+ 1 root audio 116, 8 Nov 6 23:25 pcmC0D3p
crw-rw----+ 1 root audio 116, 9 Nov 6 23:25 pcmC0D7p
crw-rw----+ 1 root audio 116, 10 Nov 6 23:25 pcmC0D8p
crw-rw----+ 1 root audio 116, 4 Nov 5 22:27 pcmC1D0c
crw-rw----+ 1 root audio 116, 3 Nov 5 22:27 pcmC1D0p
crw-rw----+ 1 root audio 116, 5 Nov 5 22:27 pcmC1D2c
crw-rw----+ 1 root audio 116, 1 Nov 5 22:27 seq
crw-rw----+ 1 root audio 116, 33 Nov 5 22:27 timer
In this case, the device symlink under sysfs does not point directly at the PCI device:
$ ls -la /sys/dev/char/116\:6/device
lrwxrwxrwx 1 root root 0 Nov 7 05:54 /sys/dev/char/116:6/device -> ../../card1
Following one more device symlink yields the PCI device:
$ readlink /sys/dev/char/116\:6/device/device
../../../0000:00:1b.0
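The traversal above can be sketched as a small loop that keeps following `device` symlinks until the resolved directory name looks like a PCI address. This is only an illustration, not what the runtime does: the function name `find_pci_parent`, the `sysfs_root` parameter, and the `max_hops` limit are all assumptions made for the example.

```python
import os
import re

# A full PCI address: domain:bus:device.function, e.g. 0000:00:1b.0
PCI_ADDR_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def find_pci_parent(major, minor, sysfs_root="/sys", max_hops=4):
    """Follow 'device' symlinks from /sys/dev/char/<major>:<minor>.

    Returns the first PCI address reached, or None if the chain of
    'device' links ends (or max_hops is exceeded) before one is found.
    max_hops is an arbitrary safety limit chosen for this sketch.
    """
    path = os.path.join(sysfs_root, "dev", "char", "%d:%d" % (major, minor))
    for _ in range(max_hops):
        path = os.path.join(path, "device")
        if not os.path.islink(path):
            return None
        name = os.path.basename(os.path.realpath(path))
        if PCI_ADDR_RE.match(name):
            return name
    return None
```

On the machine shown above, `find_pci_parent(226, 0)` would stop after one hop and `find_pci_parent(116, 6)` after two.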
In summary, the structure of the device nodes varies from device to device.
However, instead of relying on the device symlinks under /sys/dev/char/$major:$minor/, I realized that the target of the /sys/dev/char/$major:$minor link itself contains the PCI information:
$ ls -la /sys/dev/char/
lrwxrwxrwx 1 root root 0 Nov 6 11:50 116:1 -> ../../devices/virtual/sound/seq
lrwxrwxrwx 1 root root 0 Nov 7 05:45 116:10 -> ../../devices/pci0000:00/0000:00:03.0/sound/card0/pcmC0D8p
lrwxrwxrwx 1 root root 0 Nov 7 05:45 116:11 -> ../../devices/pci0000:00/0000:00:03.0/sound/card0/hwC0D0
lrwxrwxrwx 1 root root 0 Nov 6 11:50 116:2 -> ../../devices/pci0000:00/0000:00:1b.0/sound/card1/controlC1
lrwxrwxrwx 1 root root 0 Nov 6 11:50 116:3 -> ../../devices/pci0000:00/0000:00:1b.0/sound/card1/pcmC1D0p
lrwxrwxrwx 1 root root 0 Nov 6 11:50 116:33 -> ../../devices/virtual/sound/timer
lrwxrwxrwx 1 root root 0 Nov 6 11:50 116:4 -> ../../devices/pci0000:00/0000:00:1b.0/sound/card1/pcmC1D0c
lrwxrwxrwx 1 root root 0 Nov 6 11:50 116:5 -> ../../devices/pci0000:00/0000:00:1b.0/sound/card1/pcmC1D2c
lrwxrwxrwx 1 root root 0 Nov 6 11:50 116:6 -> ../../devices/pci0000:00/0000:00:1b.0/sound/card1/hwC1D2
lrwxrwxrwx 1 root root 0 Nov 7 05:45 116:7 -> ../../devices/pci0000:00/0000:00:03.0/sound/card0/controlC0
lrwxrwxrwx 1 root root 0 Nov 7 05:45 116:8 -> ../../devices/pci0000:00/0000:00:03.0/sound/card0/pcmC0D3p
lrwxrwxrwx 1 root root 0 Nov 7 05:45 116:9 -> ../../devices/pci0000:00/0000:00:03.0/sound/card0/pcmC0D7p
graphics:
lrwxrwxrwx 1 root root 0 Nov 6 11:50 226:0 -> ../../devices/pci0000:00/0000:00:02.0/drm/card0
lrwxrwxrwx 1 root root 0 Nov 6 11:50 226:128 -> ../../devices/pci0000:00/0000:00:02.0/drm/renderD128
We can consider looking at the PCI information in the above symlink targets to decide whether a device is a PCI device and can be passed through with VFIO.
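A minimal sketch of that idea, assuming the link target is available as a string: split it into path components and pick out the ones shaped like a PCI address. The helper names `pci_address_from_target` and `pci_address_for_node` are made up for this example. Note that a PCI component in the path only shows the device sits below a PCI device; a USB device, for instance, would show the address of its host controller, so a real check would likely need more than this.

```python
import os
import re

# A full PCI address: domain:bus:device.function, e.g. 0000:00:1b.0
PCI_ADDR_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def pci_address_from_target(target):
    """Return the last PCI address in a /sys/dev/char link target, or None.

    Virtual devices (e.g. ../../devices/virtual/sound/seq) have no PCI
    component and yield None. The root complex component "pci0000:00"
    does not match the pattern and is skipped.
    """
    matches = [part for part in target.split("/") if PCI_ADDR_RE.match(part)]
    # Behind a bridge there may be several PCI components; the last one
    # is the one closest to the device node.
    return matches[-1] if matches else None


def pci_address_for_node(major, minor, sysfs_root="/sys"):
    """Resolve /sys/dev/char/<major>:<minor> and extract the PCI address."""
    link = os.path.join(sysfs_root, "dev", "char", "%d:%d" % (major, minor))
    return pci_address_from_target(os.path.realpath(link))
```

With the listing above, the target for 116:6 gives `0000:00:1b.0`, the target for 226:0 gives `0000:00:02.0`, and the target for 116:1 gives `None`.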
@amshinde @mcastelino -- in the initial comment for this, you talk about needing to verify that it's a PCI device and in its own IOMMU group. Do we really need to do this checking? The existing hypervisor should fail if it isn't, and we could just make sure that the errors are propagated back appropriately?
In general, is there more action required on this issue wrt Kata?
After some discussion with @amshinde: we need to find a way to identify the device inside the VM. We need a single identifier that lets us reliably find the device we have passed through. This way, we can make it show up at the expected path /dev/mydev for a user who passes --device /dev/vfio/16:/dev/mydev.
This issue was moved to kata-containers/runtime#155
Refer to https://github.com/clearcontainers/runtime/issues/821. We currently have support for passing VFIO device groups with --device. The user is expected to bind the device to vfio-pci beforehand. We need to add support for the runtime to check whether a device passed is a PCI device and then pass it to the container VM using PCI passthrough/VFIO. We need to make sure that the device is the only device in its IOMMU group. The runtime would then unbind the device from its current kernel driver and assign the device to vfio-pci, passing it to the VM with PCI passthrough. When the container exits, the runtime would then need to bind the device back to its host driver.
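As a rough sketch of the checks described here, the IOMMU group membership can be read from `/sys/bus/pci/devices/<addr>/iommu_group/devices`, and rebinding can be expressed as a sequence of sysfs writes. The function names below are hypothetical, and the rebind sequence (unbind, then `driver_override`, then `drivers_probe`) is one common way to do it on recent kernels, not something this issue prescribes. The plan is only computed, never written, so the snippet is safe to run.

```python
import os


def iommu_group_devices(pci_addr, sysfs_root="/sys"):
    """List the PCI addresses in the same IOMMU group as pci_addr.

    Returns an empty list if the device has no iommu_group entry
    (for example when the IOMMU is disabled or the device is absent).
    """
    group_dir = os.path.join(
        sysfs_root, "bus", "pci", "devices", pci_addr, "iommu_group", "devices"
    )
    try:
        return sorted(os.listdir(group_dir))
    except FileNotFoundError:
        return []


def is_alone_in_iommu_group(pci_addr, sysfs_root="/sys"):
    """True only if pci_addr is the single member of its IOMMU group."""
    return iommu_group_devices(pci_addr, sysfs_root) == [pci_addr]


def vfio_bind_plan(pci_addr, sysfs_root="/sys"):
    """Return the (path, value) sysfs writes that would hand the device
    to vfio-pci. Nothing is written here; a caller with root privileges
    would perform the writes in order."""
    dev_dir = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr)
    return [
        (os.path.join(dev_dir, "driver", "unbind"), pci_addr),
        (os.path.join(dev_dir, "driver_override"), "vfio-pci"),
        (os.path.join(sysfs_root, "bus", "pci", "drivers_probe"), pci_addr),
    ]
```

Restoring the host driver on container exit would be the mirror image: unbind from vfio-pci, clear `driver_override`, and probe again.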