google-coral / libedgetpu

Source code for the userspace level runtime driver for Coral.ai devices.
Apache License 2.0

M.2 TPU device violates PCI specification #48

Open lamw opened 11 months ago

lamw commented 11 months ago

Description

Customers that attempt to pass the M.2 TPU through to a virtual machine using the VMware ESXi hypervisor have found that the Apex driver fails to initialize.

# dmesg
<snip>
[    3.780139] apex 0000:02:03.0: enabling device (0000 -> 0002)
[    3.785860] apex 0000:02:03.0: Page table init timed out
[    3.786103] apex 0000:02:03.0: MSI-X table init timed out

Upon initial investigation from VMware Engineering, the following was concluded:

Unfortunately, the device in question violates the PCI specification by mapping the PBA, the MSI-X vector table (VT), and other registers into the same 4KB page (the PBA is at 0x46068 and the VT at 0x46800, but there are a number of other registers in the 0x46XXX range). The PCIe spec 6.0, page 1020, has this to say:

<quote>
If a Base Address Register or entry in the Enhanced Allocation capability that maps address space for the MSI-X Table or
MSI-X PBA also maps other usable address space that is not associated with MSI-X structures, locations (e.g., for CSRs)
used in the other address space must not share any naturally aligned 4-KB address range with one where either MSI-X
structure resides. This allows system software where applicable to use different processor attributes for MSI-X structures
and the other address space. (Some processor architectures do not support having different processor attributes
associated with the same naturally aligned 4-KB physical address range.) The MSI-X Table and MSI-X PBA are permitted
to co-reside within a naturally aligned 4-KB address range, though they must not overlap with each other.
</quote>

So having CSRs in the same page as the MSI-X VT violates the spec, and under ESXi those CSRs become unreachable (writes are ignored, reads return zeroes). Because of this, the device driver cannot correctly initialize the device.
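To make the page arithmetic concrete, here is a minimal sketch that checks whether a register offset falls into the same naturally aligned 4KB page as the MSI-X structures. The PBA and VT offsets are the ones quoted above; the CSR offset, the function names, and the 1-byte default length are assumptions for illustration only, since the thread does not list the actual CSR offsets or structure sizes.

```python
PAGE_SIZE = 4096  # the "naturally aligned 4-KB address range" from the quoted spec text


def pages_touched(offset, length=1):
    """Return the indices of the 4KB pages covered by [offset, offset + length)."""
    if length < 1:
        raise ValueError("length must be at least 1")
    first = offset // PAGE_SIZE
    last = (offset + length - 1) // PAGE_SIZE
    return set(range(first, last + 1))


def shares_msix_page(csr_offset, msix_offsets):
    """True if the CSR lands in any 4KB page that also holds an MSI-X structure."""
    msix_pages = set()
    for off in msix_offsets:
        msix_pages |= pages_touched(off)
    return bool(pages_touched(csr_offset) & msix_pages)


MSIX_PBA = 0x46068          # from the report above
MSIX_TABLE = 0x46800        # from the report above
HYPOTHETICAL_CSR = 0x46010  # assumption: stands in for "other registers in 0x46XXX"

# Both MSI-X structures sit in page 0x46 (0x46000-0x46FFF), and so does the CSR.
print(shares_msix_page(HYPOTHETICAL_CSR, [MSIX_PBA, MSIX_TABLE]))
```

Under these assumptions the check reports a conflict for any register in the 0x46000-0x46FFF range, while a register at 0x47000 or above would sit in a different page.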

If firmware can modify the device's behavior so that the VT/PBA arrays do not share the same 4KB page with other registers, the device will work with ESXi's passthrough. Alternatively, if firmware can hide the MSI-X capability from PCI configuration space, that would fix the issue as well.

I'm not sure whether this has already been reported, but if Google/Coral can either fix the device's behavior to conform to the PCI specification or hide the MSI-X capability, passthrough of the M.2 TPU should work under ESXi, which is a popular hypervisor platform for development purposes.

Issue Type: Build/Install
Operating System: Ubuntu
Coral Device: M.2 Accelerator A+E
Other Devices: No response
Programming Language: No response
Relevant Log Output: No response

goldserve commented 9 months ago

Yes, please do look into addressing this!

ManuelPerrot commented 9 months ago

Very interested to have this fixed as well. Looks like Xen could have the same issue: https://xcp-ng.org/forum/topic/6304/google-coral-tpu-pcie-passthrough-woes/20

k1n6b0b commented 9 months ago

Adding another vote to fix this here!! There are a ton of threads/requests for this but they're all over.

https://github.com/google-coral/edgetpu/issues/343

https://github.com/google-coral/edgetpu/issues/729

https://github.com/blakeblackshear/frigate/issues/6331

https://github.com/blakeblackshear/frigate/issues/94

https://github.com/blakeblackshear/frigate/issues/305

grembling22 commented 8 months ago

+1 for a fix

c-po commented 8 months ago

+1

tbozik commented 8 months ago

+1 for a fix not only m.2 but mini pcie as well

kentkravitz commented 8 months ago

+1 fix please.

TokugawaHeavyIndustries commented 8 months ago

+1 for fix, commenting to follow. Note this also affects the Mini-PCIe model (as expected)

syncnj commented 8 months ago

+1

kentkravitz commented 8 months ago

Can anyone think of any other possible workarounds for this problem? It seems like ESXi could also use a quirks mode for PCIe cards that need some tweaking.

kuantek commented 8 months ago

+1 for a fix please

Brandon314 commented 7 months ago

+1 for a fix please

gknepper commented 7 months ago

+1 for a fix please

vobelic commented 6 months ago

+1 for the fix

fama-lama commented 6 months ago

+1

zaolin commented 6 months ago

Try disabling MSI on the bridge, if possible, as a temporary fix: echo 0 > /sys/bus/pci/devices/$bridge/msi_bus. To me it looks like there is a lot of hacky stuff in the kernel driver: https://github.com/google/gasket-driver/blob/09385d485812088e04a98a6e1227bf92663e0b59/src/gasket_interrupt.c#L245
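For anyone who wants to script the sysfs write above, a rough sketch follows. It relies on the Linux `msi_bus` attribute, where (per the kernel's sysfs-bus-pci ABI documentation) writing 0 disallows MSI and MSI-X for future drivers of the device, or of all child devices if it is a bridge, and writing 1 allows them again; the driver has to be reloaded for the change to take effect. The function name `set_msi_bus`, the `sysfs_root` parameter, and the bridge address in the usage comment are hypothetical, and whether the Apex driver then initializes inside a VM is not confirmed anywhere in this thread.

```python
from pathlib import Path


def set_msi_bus(bridge, allow, sysfs_root="/sys/bus/pci/devices"):
    """Write 1 (allow) or 0 (disallow) to a PCI device's msi_bus attribute.

    bridge is the PCI address of the bridge, e.g. "0000:00:1c.0" (hypothetical).
    sysfs_root is an assumed parameter so the path can be redirected for testing.
    Needs root privileges when run against the real sysfs tree.
    """
    attr = Path(sysfs_root) / bridge / "msi_bus"
    attr.write_text("1" if allow else "0")
    return attr


# Example (run as root, then reload the apex driver):
#   set_msi_bus("0000:00:1c.0", allow=False)
```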

bridge-four commented 5 months ago

+1 vote for fix!

alexsahka commented 5 months ago

+1 vote for fix!

Claudio1L commented 5 months ago

+1 :-(

thefl0yd commented 5 months ago

This is not likely to ever get fixed now, with Broadcom deprecating free ESXi. I'm aware this is a TPU issue, but the ESXi userbase is just going to keep shrinking at this point.

Sanman96 commented 4 months ago

@thefl0yd I do not believe this is the case. I have a need to deploy the m.2 in multiple enterprise VMware deployments via passthru.

+1 For a fix

SunvidWong commented 1 month ago

+1 vote for fix!