google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

M.2 PCIe Accelerator A+E key Failed to load delegate from libedgetpu.so.1 #870

Open PinxuanHuang opened 1 month ago

PinxuanHuang commented 1 month ago

Description

I have been running into this issue for a long time. I pass the device /dev/apex_0 into my Docker container with --device, run the container with --privileged, and disable the --security-opt confinement (SELinux, AppArmor), but I still cannot get past this error. Has anyone else encountered the same issue? Am I missing a setting, or is something misconfigured?

Thanks.

docker-ce 24.0.1

linux 4.19.294
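
For anyone trying to reproduce the container setup described above, a first sanity check is whether the device node is visible and accessible from inside the container. The snippet below is a minimal sketch under that assumption; `check_device_node` is a hypothetical helper name, not part of pycoral or libedgetpu. It only rules out a missing or unreadable node. The log in this report shows that /dev/apex_0 was opened and mapped successfully, so a passing result here would not explain the later "Could not partition page table" failure.

```python
import os
import stat


def check_device_node(path="/dev/apex_0"):
    """Report basic visibility and access rights for a device node.

    Hypothetical helper for illustration only; it does not talk to the
    Edge TPU driver, it just inspects the file system entry.
    """
    report = {
        "path": path,
        "exists": False,
        "is_char_device": False,
        "readable": False,
        "writable": False,
    }
    try:
        st = os.stat(path)
    except OSError:
        # Node is missing or cannot be stat'ed from this process.
        return report

    report["exists"] = True
    report["is_char_device"] = stat.S_ISCHR(st.st_mode)
    report["readable"] = os.access(path, os.R_OK)
    report["writable"] = os.access(path, os.W_OK)
    return report


if __name__ == "__main__":
    for key, value in check_device_node().items():
        print(f"{key}: {value}")
```

Running it once on the host and once inside the container makes it easy to compare the two views of the device.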

### Issue Type

Bug

### Operating System

Linux

### Coral Device

M.2 Accelerator A+E

### Other Devices

_No response_

### Programming Language

Python 3.7

### Relevant Log Output

```shell
I tflite/edgetpu_manager_direct.cc:453] No matching device is already opened for shared ownership.
I driver/usb/local_usb_device.cc:944] EnumerateDevices: vendor:0x1a6e, product:0x89a
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[2] port[0]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[1] port[2]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[1] port[0]
I driver/usb/local_usb_device.cc:944] EnumerateDevices: vendor:0x18d1, product:0x9302
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[2] port[0]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[1] port[2]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[1] port[0]
I tflite/edgetpu_context_direct.cc:106] USB always DFU: False (default)
I tflite/edgetpu_context_direct.cc:128] USB bulk-in queue capacity: default
I tflite/edgetpu_context_direct.cc:67] Performance expectation: Max (default)
I ./driver/mmio/host_queue.h:266] Starting in normal mode
I driver/kernel/kernel_registers.cc:83] Opening /dev/apex_0. read_only=0
I driver/kernel/kernel_registers.cc:97] mmap_offset=0x0000000000040000, mmap_size=4096
I driver/kernel/kernel_registers.cc:108] Got map addr at 0x0x7f807ef000
I driver/kernel/kernel_registers.cc:97] mmap_offset=0x0000000000044000, mmap_size=4096
I driver/kernel/kernel_registers.cc:108] Got map addr at 0x0x7f807ee000
I driver/kernel/kernel_registers.cc:97] mmap_offset=0x0000000000048000, mmap_size=4096
I driver/kernel/kernel_registers.cc:108] Got map addr at 0x0x7f807e7000
W driver/beagle/beagle_kernel_top_level_handler.cc:131] Could not set performance expectation : 4 (Inappropriate ioctl for device)
I driver/kernel/kernel_registers.cc:211] Read: offset = 0x00000000000486f0, value: = 0x0000000000000000
I driver/kernel/kernel_registers.cc:190] Write: offset = 0x00000000000487a8, value = 0x0000000000000000
I driver/kernel/kernel_registers.cc:122] Closing /dev/apex_0. mmap_offset=0x0000000000040000, mmap_size=4096, read_only=0
I driver/kernel/kernel_registers.cc:122] Closing /dev/apex_0. mmap_offset=0x0000000000044000, mmap_size=4096, read_only=0
I driver/kernel/kernel_registers.cc:122] Closing /dev/apex_0. mmap_offset=0x0000000000048000, mmap_size=4096, read_only=0
I tflite/edgetpu_context_direct.cc:401] Failed to open device [Apex (PCIe)] at [/dev/apex_0]: Failed precondition: Could not partition page table. : 5 (Operation not permitted)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
    delegate = Delegate(library, options)
  File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
    raise ValueError(capture.message)
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "examples/classify_image.py", line 123, in <module>
    main()
  File "examples/classify_image.py", line 73, in main
    interpreter = make_interpreter(*args.model.split('@'))
  File "/usr/lib/python3/dist-packages/pycoral/utils/edgetpu.py", line 87, in make_interpreter
    delegates = [load_edgetpu_delegate({'device': device} if device else {})]
  File "/usr/lib/python3/dist-packages/pycoral/utils/edgetpu.py", line 52, in load_edgetpu_delegate
    return tflite.load_delegate(_EDGETPU_SHARED_LIB, options or {})
  File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 163, in load_delegate
    library, str(e)))
ValueError: Failed to load delegate from libedgetpu.so.1

# ===== lspci =====
0000:01:00.0 Class 0880: Device 1ac1:089a (prog-if ff)
    Subsystem: Device 1ac1:089a
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
    Capabilities: [108 v1] Latency Tolerance Reporting
        Max snoop latency: 0ns
        Max no snoop latency: 0ns
    Capabilities: [110 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                   T_CommonMode=0us LTR1.2_Threshold=0ns
        L1SubCtl2: T_PwrOn=10us
    Capabilities: [200 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr- PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UncorrIntErr+ BlockedTLP- AtomicOpBlocked- TLPBlockedErr- PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol- UncorrIntErr+ BlockedTLP- AtomicOpBlocked- TLPBlockedErr- PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr+ HeaderOF-
        AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Kernel driver in use: apex
# ==========

# ====== modinfo ======
filename: kernel/drivers/staging/gasket/apex.ko
license: GPL v2
author: John Joseph
description: Google Apex driver
version: 1.0
alias: pci:v00001AC1d0000089Asv*sd*bc*sc*i*
srcversion: B67ABF7B8580FA3770BF46B
depends: gasket
vermagic: 4.19.294 SMP preempt mod_unload aarch64

filename: kernel/drivers/staging/gasket/gasket.ko
license: GPL v2
author: Rob Springer
description: Google Gasket driver framework
version: 1.1.2
srcversion: A57AFDFDB6FC971A2E9917C
depends:
vermagic: 4.19.294 SMP preempt mod_unload aarch64
#=============

linux 4.19.294
```
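
Most of the log above is informational, so it can help to filter it down to the entries that matter. The following is a rough sketch, assuming the log has been saved with one entry per line in the `LEVEL file:line] message` layout seen in this report; `find_problems` is a hypothetical helper, not part of libedgetpu. It keeps every entry that is not at level `I`, plus `I` entries whose message contains "Failed".

```python
import re

# One log entry: severity letter, source file, line number, message.
LOG_LINE = re.compile(
    r"^(?P<level>[IWEF]) (?P<source>\S+?):(?P<line>\d+)\] (?P<message>.*)$"
)


def find_problems(log_text):
    """Return (level, source, message) tuples for suspicious log entries."""
    problems = []
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if not match:
            # Traceback lines, lspci output, etc. are ignored.
            continue
        level = match.group("level")
        message = match.group("message")
        if level != "I" or "Failed" in message:
            problems.append((level, match.group("source"), message))
    return problems


SAMPLE = """\
I driver/kernel/kernel_registers.cc:83] Opening /dev/apex_0. read_only=0
W driver/beagle/beagle_kernel_top_level_handler.cc:131] Could not set performance expectation : 4 (Inappropriate ioctl for device)
I tflite/edgetpu_context_direct.cc:401] Failed to open device [Apex (PCIe)] at [/dev/apex_0]: Failed precondition: Could not partition page table. : 5 (Operation not permitted)
"""

if __name__ == "__main__":
    for level, source, message in find_problems(SAMPLE):
        print(f"[{level}] {source}: {message}")
```

Applied to this report, the filter leaves two entries: the warning about the performance-expectation ioctl and the "Could not partition page table" failure.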