google-coral / pycoral

Python API for ML inferencing and transfer-learning on Coral devices
https://coral.ai
Apache License 2.0

Production project - Unable to get past No module named 'pycoral.utils' #110

Closed stanleyoz closed 1 year ago

stanleyoz commented 1 year ago

Description

We have a tested, working industrial safety monitoring solution on an RPi and now need to port our TFLite model and inference solution to an industrial i.MX8 gateway. After chasing the various 'hints' on Stack Overflow etc., we are now stuck at a PyCoral library error ...

(py38_venv) root@nodeG5:/tf# python predict_tpu.py
Traceback (most recent call last):
  File "predict_tpu.py", line 4, in <module>
    from pycoral.utils import edgetpu
ModuleNotFoundError: No module named 'pycoral.utils'

Python 3.8.12 GCC 10.2.1 Debian 11 on I.MX8

We have also tried ...

sudo apt-get update
sudo apt-get install python3-pycoral
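Debian's `python3-*` packages are normally installed for the system Python, while the failing script runs inside `py38_venv`, so it can be worth confirming which `pycoral`, if any, the active interpreter actually resolves. A minimal sketch using only the standard library; `module_origin` is a hypothetical helper, not part of PyCoral:

```python
import importlib.util
import sys


def module_origin(top_level_name):
    """Return where a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(top_level_name)
    if spec is None:
        return None
    # Namespace packages have no origin, so fall back to their search locations.
    return spec.origin or str(list(spec.submodule_search_locations or []))


print("interpreter:", sys.executable)
print("pycoral resolves to:", module_origin("pycoral"))
```

Run it with the same interpreter that runs `predict_tpu.py`. A path outside the venv's `site-packages`, or `None`, would suggest the package the script sees is not the one that was just installed.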

This is a potential large scale installation (if we can order enough EDGE TPUs! hhh) but we are really short of time to get this error ironed out and install a few units onsite (Lithium mine site). Please advise team TPU.

Issue Type: Build/Install
Operating System: Linux
Coral Device: No response
Other Devices: No response
Programming Language: Python 3.8
Relevant Log Output: No response
hjonnala commented 1 year ago

Could you please share the logs for the below commands:

stanleyoz commented 1 year ago

thanks, super 911 response mate ....

(py38_venv) root@nodeG5:~/py38# python3 -m pip install --extra-index-url https://google-coral.github.io/py-repo/ pycoral~=2.0
Looking in indexes: https://pypi.org/simple, https://google-coral.github.io/py-repo/
Collecting pycoral~=2.0
  Downloading https://github.com/google-coral/pycoral/releases/download/v2.0.0/pycoral-2.0.0-cp38-cp38-linux_aarch64.whl (352 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 352.7/352.7 kB 1.9 MB/s eta 0:00:00
Collecting Pillow>=4.0.0 (from pycoral~=2.0)
  Downloading Pillow-9.5.0-cp38-cp38-manylinux_2_28_aarch64.whl (3.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 1.7 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.16.0 in ./py38_venv/lib/python3.8/site-packages (from pycoral~=2.0) (1.24.3)
Collecting tflite-runtime==2.5.0.post1 (from pycoral~=2.0)
  Downloading https://github.com/google-coral/pycoral/releases/download/v2.0.0/tflite_runtime-2.5.0.post1-cp38-cp38-linux_aarch64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 2.3 MB/s eta 0:00:00
Installing collected packages: tflite-runtime, Pillow, pycoral
  Attempting uninstall: pycoral
    Found existing installation: pycoral 0.1.0
    Uninstalling pycoral-0.1.0:
      Successfully uninstalled pycoral-0.1.0
Successfully installed Pillow-9.5.0 pycoral-2.0.0 tflite-runtime-2.5.0.post1
(py38_venv) root@nodeG5:~/py38# python3 -c "import tflite_runtime as tflite; print('tflite runtime vesrion:', tflite.version);import pycoral; print('pycoral version:', pycoral.version)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'tflite_runtime' has no attribute 'version'

hjonnala commented 1 year ago

Installing collected packages: tflite-runtime, Pillow, pycoral
  Attempting uninstall: pycoral
    Found existing installation: pycoral 0.1.0
    Uninstalling pycoral-0.1.0:
      Successfully uninstalled pycoral-0.1.0
Successfully installed Pillow-9.5.0 pycoral-2.0.0 tflite-runtime-2.5.0.post1

Now you have the correct versions of pycoral and the tflite runtime. Is your script working now?

python3.10 -c "import tflite_runtime as tflite; print('tflite runtime vesrion:', tflite.version);import pycoral; print('pycoral version:', pycoral.version)"
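The attribute error reported earlier in the thread suggests that reading `.version` off the module is not reliable. One alternative is to ask the packaging metadata instead. This is only a sketch, assuming Python 3.8 or newer and the distribution names that appear in the pip output (`pycoral`, `tflite-runtime`); `installed_version` is a hypothetical helper:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("pycoral", "tflite-runtime"):
    print(name, "->", installed_version(name))
```

This reports what pip recorded for the active environment, so it also works for packages that do not expose a version attribute.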

stanleyoz commented 1 year ago

Wow! We got past the pycoral.utils hurdle, BUT we now have

ValueError: Failed to load delegate from libedgetpu.so.1

We did the "Setup Device" part and

root@nodeG5:~# lspci
00:00.0 PCI bridge: Synopsys, Inc. DWC_usb3 / PCIe bridge (rev 01)
01:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU

hjonnala commented 1 year ago

please install the edgetpu runtime

stanleyoz commented 1 year ago

I believe we did that earlier; retried anyway:

ValueError: Failed to load delegate from libedgetpu.so.1

(py38_venv) root@nodeG5:/tf# echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list

curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

sudo apt-get update
deb https://packages.cloud.google.com/apt coral-edgetpu-stable main
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
100 1210 100 1210 0 0 2737 0 --:--:-- --:--:-- --:--:-- 2743
OK
Hit:1 http://security.debian.org/debian-security bullseye-security InRelease
Hit:2 http://deb.debian.org/debian bullseye InRelease
Hit:3 https://packages.microsoft.com/debian/11/prod bullseye InRelease
Hit:4 https://packages.cloud.google.com/apt coral-cloud-stable InRelease
Hit:5 https://packages.cloud.google.com/apt coral-edgetpu-stable InRelease
Hit:6 http://deb.debian.org/debian bullseye-updates InRelease
Hit:7 http://deb.debian.org/debian bullseye-backports InRelease
Hit:8 https://deb.nodesource.com/node_12.x bullseye InRelease
Reading package lists... Done
(py38_venv) root@nodeG5:/tf# sudo apt-get install libedgetpu1-std
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libedgetpu1-std is already the newest version (16.0).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

The test script still reports that the .so is missing :(

(py38_venv) root@nodeG5:/tf# python predict_tpu.py
Traceback (most recent call last):
  File "/root/py38/py38_venv/lib/python3.8/site-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
    delegate = Delegate(library, options)
  File "/root/py38/py38_venv/lib/python3.8/site-packages/tflite_runtime/interpreter.py", line 119, in __init__
    raise ValueError(capture.message)
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "predict_tpu.py", line 14, in <module>
    experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
  File "/root/py38/py38_venv/lib/python3.8/site-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
    raise ValueError('Failed to load delegate from {}\n{}'.format(
ValueError: Failed to load delegate from libedgetpu.so.1
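Since apt reports `libedgetpu1-std` as already installed, it may help to separate "the shared library cannot be found" from "the library loads but no device can be opened". A rough sketch using only the standard library; `can_dlopen` is a hypothetical helper, and a successful load says nothing about whether an Edge TPU is actually reachable:

```python
import ctypes


def can_dlopen(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


print("libedgetpu.so.1 loadable:", can_dlopen("libedgetpu.so.1"))
```

If this prints `True` while `load_delegate` still fails, the problem is more likely on the device or driver side than a missing file.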

hjonnala commented 1 year ago

Please add the lines below to the import section of your test script and share the logs:

from pycoral.pybind._pywrap_coral import SetVerbosity as set_verbosity
set_verbosity(10)
stanleyoz commented 1 year ago

Traceback (most recent call last):
  File "/root/py38/py38_venv/lib/python3.8/site-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
    delegate = Delegate(library, options)
  File "/root/py38/py38_venv/lib/python3.8/site-packages/tflite_runtime/interpreter.py", line 119, in __init__
    raise ValueError(capture.message)
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "predict_tpu.py", line 14, in <module>
    experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
  File "/root/py38/py38_venv/lib/python3.8/site-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
    raise ValueError('Failed to load delegate from {}\n{}'.format(
ValueError: Failed to load delegate from libedgetpu.so.1

(py38_venv) root@nodeG5:/tf# nano predict_tpu.py
(py38_venv) root@nodeG5:/tf# nano predict_tpu.py
(py38_venv) root@nodeG5:/tf# python predict_tpu.py
I tflite/edgetpu_manager_direct.cc:453] No matching device is already opened for shared ownership.
I driver/driver_factory_default.cc:31] Failed to open /sys/class/apex: No such file or directory
I driver/usb/local_usb_device.cc:944] EnumerateDevices: vendor:0x1a6e, product:0x89a
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[4] port[0]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[3] port[3]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[3] port[1]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[3] port[0]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[2] port[0]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[1] port[0]
I driver/usb/local_usb_device.cc:944] EnumerateDevices: vendor:0x18d1, product:0x9302
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[4] port[0]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[3] port[3]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[3] port[1]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[3] port[0]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[2] port[0]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[1] port[0]
I tflite/edgetpu_manager_direct.cc:471] No device of type Apex (PCIe) is available.
I tflite/edgetpu_manager_direct.cc:471] No device of type Apex (USB) is available.
I tflite/edgetpu_manager_direct.cc:471] No device of type Apex (Reference) is available.
I tflite/edgetpu_manager_direct.cc:502] Failed allocating Edge TPU device for shared ownership.
Traceback (most recent call last):
  File "/root/py38/py38_venv/lib/python3.8/site-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
    delegate = Delegate(library, options)
  File "/root/py38/py38_venv/lib/python3.8/site-packages/tflite_runtime/interpreter.py", line 119, in __init__
    raise ValueError(capture.message)
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "predict_tpu.py", line 17, in <module>
    experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
  File "/root/py38/py38_venv/lib/python3.8/site-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
    raise ValueError('Failed to load delegate from {}\n{}'.format(
ValueError: Failed to load delegate from libedgetpu.so.1
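The verbose log is easier to compare across boards when it is reduced to the few lines that matter. A small sketch that pulls those out of captured log text; the helper names are hypothetical and the patterns are taken directly from the log lines in this thread:

```python
import re

# Matches lines such as:
# "I tflite/edgetpu_manager_direct.cc:471] No device of type Apex (PCIe) is available."
_NO_DEVICE = re.compile(r"No device of type Apex \((\w+)\) is available")


def unavailable_device_types(log_text):
    """Return the Apex device types the runtime reported as unavailable, in log order."""
    return _NO_DEVICE.findall(log_text)


def apex_sysfs_missing(log_text):
    """Return True if the log shows the runtime could not open /sys/class/apex."""
    return "Failed to open /sys/class/apex" in log_text


sample = """\
I driver/driver_factory_default.cc:31] Failed to open /sys/class/apex: No such file or directory
I tflite/edgetpu_manager_direct.cc:471] No device of type Apex (PCIe) is available.
I tflite/edgetpu_manager_direct.cc:471] No device of type Apex (USB) is available.
I tflite/edgetpu_manager_direct.cc:471] No device of type Apex (Reference) is available.
"""
print(unavailable_device_types(sample))
print(apex_sysfs_missing(sample))
```

For the log posted here, the PCIe entry together with the missing `/sys/class/apex` is the relevant part, since the card is attached over PCIe.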

hjonnala commented 1 year ago

Please share your host machine details. Have you installed the gasket-dkms package?

Please share the gasket-dkms installation logs and the output of the commands below:

sudo dmesg |grep apex
sudo lspci -vvv | grep MSI-X
sudo lspci -vvv  
stanleyoz commented 1 year ago

Did run all the installations,

root@nodeG5:~# sudo apt-get install gasket-dkms libedgetpu1-std
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
gasket-dkms is already the newest version (1.0-18).
libedgetpu1-std is already the newest version (16.0).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Logs ...

root@nodeG5:~# sudo dmesg |grep apex root@nodeG5:~# sudo lspci -vvv | grep MSI-X Capabilities: [d0] MSI-X: Enable- Count=128 Masked- root@nodeG5:~# sudo lspci -vvv 00:00.0 PCI bridge: Synopsys, Inc. DWC_usb3 / PCIe bridge (rev 01) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 236 Region 0: Memory at 18000000 (32-bit, non-prefetchable) [size=1M] Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0 I/O behind bridge: 0000f000-00000fff [disabled] Memory behind bridge: fff00000-000fffff [disabled] Prefetchable memory behind bridge: 18100000-182fffff [size=2M] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- Expansion ROM at 18300000 [virtual] [disabled] [size=64K] BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit- Address: fc00a000 Data: 0000 Masking: 00000000 Pending: 00000000 Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00 DevCap: MaxPayload 128 bytes, PhantFunc 0 ExtTag- RBE+ DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend- LnkCap: Port #0, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <8us ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s (downgraded), Width x1 (ok) TrErr- Train- SlotClk+ 
DLActive+ BWMgmt+ ABWMgmt+ RootCap: CRSVisible+ RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+ RootSta: PME ReqID 0000, PMEStatus- PMEPending- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP+ LTR- 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- ARIFwd- AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled, ARIFwd- AtomicOpsCtl: ReqEn- EgressBlck- LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS- LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1- EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 RootCmd: CERptEn- NFERptEn- FERptEn- RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd- FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0 ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000 Capabilities: [148 v1] Secondary PCI Express LnkCtl3: LnkEquIntrruptEn- PerformEqu- LaneErrStat: 0 Capabilities: [158 
v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=10us PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- T_CommonMode=10us L1SubCtl2: T_PwrOn=10us Kernel driver in use: pcieport

01:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU (prog-if ff) Subsystem: Global Unichip Corp. Coral Edge TPU Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx- Interrupt: pin A routed to IRQ 0 Region 0: Memory at 18200000 (64-bit, prefetchable) [disabled] [size=16K] Region 2: Memory at 18100000 (64-bit, prefetchable) [disabled] [size=1M] Capabilities: [80] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s (ok), Width x1 (ok) TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+ 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled, AtomicOpsCtl: ReqEn- LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS- LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1- EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- Retimer- 
2Retimers- CrosslinkRes: unsupported Capabilities: [d0] MSI-X: Enable- Count=128 Masked- Vector table: BAR=2 offset=00046800 PBA: BAR=2 offset=00046068 Capabilities: [e0] MSI: Enable- Count=1/32 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [f8] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?> Capabilities: [108 v1] Latency Tolerance Reporting Max snoop latency: 0ns Max no snoop latency: 0ns Capabilities: [110 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=10us PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=25600ns L1SubCtl2: T_PwrOn=10us Capabilities: [200 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000

stanleyoz commented 1 year ago

It's an i.MX8 single-board computer running Debian 11; the venv is Python 3.8.12 and the base is 3.9.x

hjonnala commented 1 year ago

root@nodeG5:~# sudo dmesg |grep apex

Issue 1: The apex driver is not loading.
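The empty `dmesg` output and the earlier "Failed to open /sys/class/apex" log line point the same way. A small sketch of a check that could be run on the gateway: it assumes the gasket-dkms package provides kernel modules named `gasket` and `apex`, and that a working PCIe device shows up as `/dev/apex_0` (the node name used in Coral's setup guide). `loaded_modules` and `apex_report` are hypothetical helpers:

```python
import os


def loaded_modules(proc_modules_text):
    """Parse the text of /proc/modules into a set of module names."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def apex_report(proc_modules_text, path_exists=os.path.exists):
    """Summarize whether the driver modules and device nodes are present."""
    mods = loaded_modules(proc_modules_text)
    return {
        "gasket module loaded": "gasket" in mods,
        "apex module loaded": "apex" in mods,
        "/sys/class/apex exists": path_exists("/sys/class/apex"),
        "/dev/apex_0 exists": path_exists("/dev/apex_0"),
    }


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    for key, value in apex_report(text).items():
        print(f"{key}: {value}")
```

If the modules are not listed even though `gasket-dkms` is installed, the DKMS build for the running kernel is one place to look, which is why the installation logs were requested.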

EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Capabilities: [d0] MSI-X: Enable- Count=128 Masked-

Issue 2: MSI-X is not enabled for your hardware. https://www.kernel.org/doc/html/latest/PCI/msi-howto.html#:~:text=Using%20'lspci%20%2Dv'%20(,%E2%80%9C%2D%E2%80%9D%20(disabled).
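As the linked kernel document describes, `lspci` marks an enabled capability with `+` and a disabled one with `-` after `Enable`. A short sketch that extracts this from `lspci -vvv` text so several boards can be checked the same way; `msix_capabilities` is a hypothetical helper, and the second sample line is made up purely for contrast:

```python
import re

_MSIX = re.compile(r"MSI-X: Enable([+-]) Count=(\d+)")


def msix_capabilities(lspci_text):
    """Return (enabled, vector_count) for every MSI-X capability line found."""
    return [(flag == "+", int(count)) for flag, count in _MSIX.findall(lspci_text)]


# Line taken from the Edge TPU entry posted in this thread.
edge_tpu_line = "Capabilities: [d0] MSI-X: Enable- Count=128 Masked-"
# Illustrative line only, showing what an enabled capability looks like.
enabled_line = "Capabilities: [d0] MSI-X: Enable+ Count=128 Masked-"

print(msix_capabilities(edge_tpu_line))  # [(False, 128)]
print(msix_capabilities(enabled_line))   # [(True, 128)]
```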

stanleyoz commented 1 year ago

Thanks mate. Before I go try hunt down the issue (1) and (2), as they look pretty hard to replicate in production boards ...

Is this script ok?

(py38_venv) root@nodeG5:/tf# cat predict_tpu.py

# Helper libraries
import time
import pycoral as pycoral
from pycoral.utils import edgetpu

import tensorflow as tf

import tflite_runtime.interpreter as tflite
from tensorflow import keras
from pycoral.pybind._pywrap_coral import SetVerbosity as set_verbosity
set_verbosity(10)

# Run inference with TensorFlow Lite with EDGE TPU

# Load TFLite model and allocate tensors.
interpreter = tflite.Interpreter(model_path='/tf/inspect.tflite',
        experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
....

Output was ...

(py38_venv) root@nodeG5:/tf# python predict_tpu.py
I tflite/edgetpu_manager_direct.cc:453] No matching device is already opened for shared ownership.
I driver/driver_factory_default.cc:31] Failed to open /sys/class/apex: No such file or directory
I driver/usb/local_usb_device.cc:944] EnumerateDevices: vendor:0x1a6e, product:0x89a
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[4] port[0]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[3] port[3]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[3] port[1]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[3] port[0]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[2] port[0]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[1] port[0]
I driver/usb/local_usb_device.cc:944] EnumerateDevices: vendor:0x18d1, product:0x9302
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[4] port[0]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[3] port[3]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[3] port[1]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[3] port[0]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[2] port[0]
I driver/usb/local_usb_device.cc:979] EnumerateDevices: checking bus[1] port[0]
I tflite/edgetpu_manager_direct.cc:471] No device of type Apex (PCIe) is available.
I tflite/edgetpu_manager_direct.cc:471] No device of type Apex (USB) is available.
I tflite/edgetpu_manager_direct.cc:471] No device of type Apex (Reference) is available.
I tflite/edgetpu_manager_direct.cc:502] Failed allocating Edge TPU device for shared ownership.
Traceback (most recent call last):
  File "/root/py38/py38_venv/lib/python3.8/site-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
    delegate = Delegate(library, options)
  File "/root/py38/py38_venv/lib/python3.8/site-packages/tflite_runtime/interpreter.py", line 119, in __init__
    raise ValueError(capture.message)
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "predict_tpu.py", line 17, in <module>
    experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
  File "/root/py38/py38_venv/lib/python3.8/site-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
    raise ValueError('Failed to load delegate from {}\n{}'.format(
ValueError: Failed to load delegate from libedgetpu.so.1
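Separately from the driver question, one way to see whether the runtime can enumerate any Edge TPU before building an interpreter is `list_edge_tpus()` from `pycoral.utils.edgetpu`. A hedged sketch; `visible_edge_tpus` is a hypothetical wrapper, and the import is guarded so the snippet also runs where PyCoral is not installed:

```python
def visible_edge_tpus():
    """Return the Edge TPUs PyCoral can enumerate, or None if PyCoral cannot be imported."""
    try:
        from pycoral.utils import edgetpu
    except Exception:  # ImportError, or a loader error from the native extension
        return None
    return list(edgetpu.list_edge_tpus())


tpus = visible_edge_tpus()
if tpus is None:
    print("PyCoral is not importable in this interpreter")
elif not tpus:
    print("PyCoral imported, but no Edge TPU was enumerated")
else:
    for tpu in tpus:
        print(tpu)
```

An empty result would be consistent with the "No device of type Apex (PCIe) is available" lines in the log, and would mean the script itself is not the first thing to fix.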

stanleyoz commented 1 year ago

Also, I missed these logs you requested last night

(py38_venv) root@nodeG5:/tf# sudo dmesg |grep apex
(py38_venv) root@nodeG5:/tf#
(py38_venv) root@nodeG5:/tf# sudo lspci -vvv | grep MSI-X
Capabilities: [d0] MSI-X: Enable- Count=128 Masked-
(py38_venv) root@nodeG5:/tf#
(py38_venv) root@nodeG5:/tf# sudo lspci -vvv
00:00.0 PCI bridge: Synopsys, Inc. DWC_usb3 / PCIe bridge (rev 01) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 236 Region 0: Memory at 18000000 (32-bit, non-prefetchable) [size=1M] Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0 I/O behind bridge: 0000f000-00000fff [disabled] Memory behind bridge: fff00000-000fffff [disabled] Prefetchable memory behind bridge: 18100000-182fffff [size=2M] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- Expansion ROM at 18300000 [virtual] [disabled] [size=64K] BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit- Address: fc00a000 Data: 0000 Masking: 00000000 Pending: 00000000 Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00 DevCap: MaxPayload 128 bytes, PhantFunc 0 ExtTag- RBE+ DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend- LnkCap: Port #0, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <8us ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s (downgraded), Width x1 (ok) TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+ RootCap: CRSVisible+ RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+ RootSta: PME ReqID 0000, PMEStatus- PMEPending- 
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP+ LTR- 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- ARIFwd- AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled, ARIFwd- AtomicOpsCtl: ReqEn- EgressBlck- LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS- LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1- EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 RootCmd: CERptEn- NFERptEn- FERptEn- RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd- FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0 ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000 Capabilities: [148 v1] Secondary PCI Express LnkCtl3: LnkEquIntrruptEn- PerformEqu- LaneErrStat: 0 Capabilities: [158 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=10us PortTPowerOnTime=10us L1SubCtl1: 
PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- T_CommonMode=10us L1SubCtl2: T_PwrOn=10us Kernel driver in use: pcieport

01:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU (prog-if ff) Subsystem: Global Unichip Corp. Coral Edge TPU Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx- Interrupt: pin A routed to IRQ 0 Region 0: Memory at 18200000 (64-bit, prefetchable) [disabled] [size=16K] Region 2: Memory at 18100000 (64-bit, prefetchable) [disabled] [size=1M] Capabilities: [80] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s (ok), Width x1 (ok) TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+ 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled, AtomicOpsCtl: ReqEn- LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS- LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1- EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- Retimer- 
2Retimers- CrosslinkRes: unsupported Capabilities: [d0] MSI-X: Enable- Count=128 Masked- Vector table: BAR=2 offset=00046800 PBA: BAR=2 offset=00046068 Capabilities: [e0] MSI: Enable- Count=1/32 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [f8] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?> Capabilities: [108 v1] Latency Tolerance Reporting Max snoop latency: 0ns Max no snoop latency: 0ns Capabilities: [110 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=10us PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=25600ns L1SubCtl2: T_PwrOn=10us Capabilities: [200 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000

stanleyoz commented 1 year ago

Also

Linux version 5.15.32+g613bd464a2ed (oe-user@oe-host) (aarch64-poky-linux-gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.38.20220313) #1 SMP PREEMPT Tue Jun 7 02:34:46 UTC 2022

Debian 11 Python 3.8.12 (inside venv)

Thanks

hjonnala commented 1 year ago

Please check the sample script for running inference at: https://github.com/hjonnala/snippets/blob/main/coral_inference.py

Before I go try hunt down the issue (1) and (2), as they look pretty hard to replicate in production boards ...

Are you having this issue only with some boards?

For the apex driver loading issue, please check whether secure boot is enabled. If so, please try disabling it. If apex loads properly, you should see output similar to that shown in step 6: https://coral.ai/docs/m2/get-started/#2a-on-linux

stanleyoz commented 1 year ago

We are still stuck at MSI-X being disabled, and therefore the apex_0 driver cannot be installed. We are waiting for an answer from either NXP or our board OEM in Israel.

stanleyoz commented 1 year ago

Seems it's a holiday week in Israel :( .. Anyway, I discovered something: when an Intel WiFi/BT card was installed in the same M.2 PCIe connector, its MSI-X was enabled. I hope it's not too tricky for our OEM firmware engineer to sort this out for us.

So, this MSI-X is likely the problem blocking the installation of the apex_0 driver?

00:00.0 PCI bridge: Synopsys, Inc. DWC_usb3 / PCIe bridge (rev 01) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 236
    Memory at 18000000 (32-bit, non-prefetchable) [size=1M]
    Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
    I/O behind bridge: [disabled]
    Memory behind bridge: 18100000-181fffff [size=1M]
    Prefetchable memory behind bridge: [disabled]
    Expansion ROM at 18200000 [virtual] [disabled] [size=64K]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit-
    Capabilities: [70] Express Root Port (Slot-), MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [148] Secondary PCI Express
    Capabilities: [158] L1 PM Substates
    Kernel driver in use: pcieport

With Intel AX210 WIFI/BT card

01:00.0 Network controller: Intel Corporation Device 2725 (rev 1a)
    Subsystem: Intel Corporation Device 0024
    Flags: bus master, fast devsel, latency 0, IRQ 235
    Memory at 18100000 (64-bit, non-prefetchable) [size=16K]
    Capabilities: [c8] Power Management version 3
    Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [40] Express Endpoint, MSI 00
    Capabilities: [80] MSI-X: Enable+ Count=16 Masked-
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [14c] Latency Tolerance Reporting
    Capabilities: [154] L1 PM Substates
    Kernel driver in use: iwlwifi
    Kernel modules: iwlwifi

root@nodeG5:~# lspci -v
00:00.0 PCI bridge: Synopsys, Inc. DWC_usb3 / PCIe bridge (rev 01) (prog-if 00 [Normal decode])
    Flags: fast devsel, IRQ 236
    Memory at 18000000 (32-bit, non-prefetchable) [disabled] [size=1M]
    Bus: primary=00, secondary=00, subordinate=00, sec-latency=0
    I/O behind bridge: 00000000-00000fff [size=4K]
    Memory behind bridge: 00000000-000fffff [size=1M]
    Prefetchable memory behind bridge: 00000000-000fffff [size=1M]
    Expansion ROM at 18200000 [virtual] [disabled] [size=64K]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit-
    Capabilities: [70] Express Root Port (Slot-), MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [148] Secondary PCI Express
    Capabilities: [158] L1 PM Substates
    Kernel driver in use: pcieport

with EDGE TPU Card ...

01:00.0 Network controller: Intel Corporation Device 2725 (prog-if ff)
    Subsystem: Global Unichip Corp. Device 089a
    Flags: fast devsel, IRQ 235
    Memory at 18100000 (64-bit, non-prefetchable) [virtual] [size=16K]
    Memory at (64-bit, prefetchable) [virtual]
    Capabilities: [80] Express Endpoint, MSI 00
    Capabilities: [d0] MSI-X: Enable- Count=128 Masked-
    Capabilities: [e0] MSI: Enable- Count=1/32 Maskable- 64bit+
    Capabilities: [f8] Power Management version 3
    Capabilities: [100] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
    Capabilities: [108] Latency Tolerance Reporting
    Capabilities: [110] L1 PM Substates
    Capabilities: [200] Advanced Error Reporting
    Kernel driver in use: iwlwifi
    Kernel modules: iwlwifi
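
For comparing the MSI / MSI-X flags across cards without reading the dumps by eye, something like the following could be used. It is only a rough sketch: `msi_state` is a hypothetical helper (not part of pycoral or any Coral tool), and it assumes the capability lines have the same shape as the `lspci -v` output pasted above.

```python
import re

# Matches capability lines such as
# "Capabilities: [d0] MSI-X: Enable- Count=128 Masked-"
_CAP_RE = re.compile(r"\b(MSI-X|MSI):\s+Enable([+-])")


def msi_state(lspci_text):
    """Return e.g. {'MSI': False, 'MSI-X': True} for one device's lspci -v text."""
    state = {}
    for name, flag in _CAP_RE.findall(lspci_text):
        state[name] = (flag == "+")
    return state


# Capability lines copied from the two endpoint dumps in this comment.
edge_tpu = (
    "Capabilities: [80] Express Endpoint, MSI 00 "
    "Capabilities: [d0] MSI-X: Enable- Count=128 Masked- "
    "Capabilities: [e0] MSI: Enable- Count=1/32 Maskable- 64bit+"
)
intel_ax210 = (
    "Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+ "
    "Capabilities: [40] Express Endpoint, MSI 00 "
    "Capabilities: [80] MSI-X: Enable+ Count=16 Masked-"
)

print("Edge TPU:", msi_state(edge_tpu))
print("AX210:   ", msi_state(intel_ax210))
```

The text of `sudo lspci -v -s 01:00.0` could be fed to the same function on the board itself.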

hjonnala commented 1 year ago

So, this MSI-X is likely the problem blocking the installation of the apex_0 driver?

I think, apex is already installed but its not loading. Can you please share the output of below commands:

- groups $USER
- sudo modinfo apex
stanleyoz commented 1 year ago

root@nodeG5:~# groups $USER
root : root apex
root@nodeG5:~# modinfo apex
modinfo: ERROR: Module apex not found.

hjonnala commented 1 year ago

Please uninstall the gasket-dkms package, then share the installation logs.

stanleyoz commented 1 year ago

root@nodeG5:~# apt install gasket-dkms
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following package was automatically installed and is no longer required:
  python3-tflite-runtime
Use 'apt autoremove' to remove it.
The following NEW packages will be installed:
  gasket-dkms
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/48.0 kB of archives.
After this operation, 256 kB of additional disk space will be used.
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LANG = "en_AU.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
Selecting previously unselected package gasket-dkms.
(Reading database ... 81362 files and directories currently installed.)
Preparing to unpack .../gasket-dkms_1.0-18_all.deb ...
Unpacking gasket-dkms (1.0-18) ...
Setting up gasket-dkms (1.0-18) ...
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
Loading new gasket-1.0 DKMS files...
It is likely that 5.15.32+g613bd464a2ed belongs to a chroot's host
Building for 5.10.0-21-arm64 and 5.15.32+g613bd464a2ed
Building initial module for 5.10.0-21-arm64
Done.

gasket.ko: Running module version sanity check. Error! Module version 1.1.4 for gasket.ko is not newer than what is already found in kernel 5.10.0-21-arm64 (1.2). You may override by specifying --force.

apex.ko: Running module version sanity check.

depmod...

DKMS: install completed. Module build for kernel 5.15.32+g613bd464a2ed was skipped since the kernel headers for this kernel does not seem to be installed.

hjonnala commented 1 year ago

Running module version sanity check. Error! Module version 1.1.4 for gasket.ko is not newer than what is already found in kernel 5.10.0-21-arm64 (1.2).

It's not installed properly. Please remove it and try sudo apt-get install gasket-dkms. The logs should look like this: https://github.com/google-coral/edgetpu/issues/723#issuecomment-1428861577

stanleyoz commented 1 year ago

Seems the same ..

root@nodeG5:~# apt-get install gasket-dkms
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following package was automatically installed and is no longer required:
  python3-tflite-runtime
Use 'apt autoremove' to remove it.
The following NEW packages will be installed:
  gasket-dkms
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/48.0 kB of archives.
After this operation, 256 kB of additional disk space will be used.
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LANG = "en_AU.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
Selecting previously unselected package gasket-dkms.
(Reading database ... 81362 files and directories currently installed.)
Preparing to unpack .../gasket-dkms_1.0-18_all.deb ...
Unpacking gasket-dkms (1.0-18) ...
Setting up gasket-dkms (1.0-18) ...
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
Loading new gasket-1.0 DKMS files...
It is likely that 5.15.32+g613bd464a2ed belongs to a chroot's host
Building for 5.10.0-21-arm64 and 5.15.32+g613bd464a2ed
Building initial module for 5.10.0-21-arm64
Done.

gasket.ko: Running module version sanity check. Error! Module version 1.1.4 for gasket.ko is not newer than what is already found in kernel 5.10.0-21-arm64 (1.2). You may override by specifying --force.

apex.ko: Running module version sanity check.

depmod...

DKMS: install completed. Module build for kernel 5.15.32+g613bd464a2ed was skipped since the kernel headers for this kernel does not seem to be installed. root@nodeG5:~#

In any case, I have asked the firmware engineers at our OEM Compulab in Israel to have a look (esp. the MSI-X) and they are waiting for the M.2 Edge TPU to arrive late next week. On my side, my MSI-X is always Enable- (not enabled).

hjonnala commented 1 year ago

Module build for kernel 5.15.32+g613bd464a2ed was skipped since the kernel headers for this kernel does not seem to be installed.

Please install the kernel headers, then uninstall gasket-dkms and install it again.
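
To confirm whether headers for the running kernel are visible before reinstalling, a check along these lines may help. This is a sketch under one assumption: that DKMS looks for the kernel build tree at the conventional `/lib/modules/<release>/build` location. `has_kernel_headers` is a hypothetical helper name.

```python
import os
import platform


def headers_dir(release, modules_root="/lib/modules"):
    """Conventional location of the build tree (headers) for a kernel release."""
    return os.path.join(modules_root, release, "build")


def has_kernel_headers(release, modules_root="/lib/modules"):
    """True if the build tree for `release` exists and contains a Makefile."""
    makefile = os.path.join(headers_dir(release, modules_root), "Makefile")
    return os.path.isfile(makefile)


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    print(running, "headers present:", has_kernel_headers(running))
```

If this prints `False` for the running kernel, the DKMS build for that kernel would be expected to be skipped again, as in the log above.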

stanleyoz commented 1 year ago

I think because the kernel is custom by Compulab, it's not in the repository.

root@nodeG5:~# sudo apt update
Hit:1 http://security.debian.org/debian-security bullseye-security InRelease
Get:2 https://packages.microsoft.com/debian/11/prod bullseye InRelease [3629 B]
Hit:3 http://deb.debian.org/debian bullseye InRelease
Hit:4 https://packages.cloud.google.com/apt coral-cloud-stable InRelease
Get:5 https://packages.microsoft.com/debian/11/prod bullseye/main all Packages [1028 B]
Hit:6 https://packages.cloud.google.com/apt coral-edgetpu-stable InRelease
Get:7 https://packages.microsoft.com/debian/11/prod bullseye/main amd64 Packages [83.3 kB]
Get:8 http://deb.debian.org/debian bullseye-updates InRelease [44.1 kB]
Get:9 http://deb.debian.org/debian bullseye-backports InRelease [49.0 kB]
Get:10 https://packages.microsoft.com/debian/11/prod bullseye/main arm64 Packages [14.6 kB]
Get:11 https://packages.microsoft.com/debian/11/prod bullseye/main armhf Packages [13.4 kB]
Hit:12 https://deb.nodesource.com/node_12.x bullseye InRelease
Fetched 209 kB in 3s (81.5 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
root@nodeG5:~# sudo apt install linux-headers-$(uname -r)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package linux-headers-5.15.32+g613bd464a2ed
E: Couldn't find any package by glob 'linux-headers

hjonnala commented 1 year ago

OK, please try building the package from source and installing it: https://github.com/google/gasket-driver

stanleyoz commented 1 year ago

Hi mate! Thanks to hard work of Team Benjamin at Compulabs to mod the kernel of our IMX8PLUS board firmware, we managed to install the required drivers! Thanks for your help to ID the MSI-X issue :)

root@iot-gate-imx8plus:/coral/pycoral# python3 examples/classify_image.py \
  --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite \
  --labels test_data/inat_bird_labels.txt \
  --input test_data/parrot.jpg
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
14.0ms
3.7ms
3.9ms
4.0ms
4.0ms
-------RESULTS--------
Ara macao (Scarlet Macaw): 0.75781

Now we carry on to port our work files over to our nodeG5 gateway and push ahead to realise the customer's requirements

stanleyoz commented 1 year ago

Hi. OK, parrot example works but when we mod our tensorflow lite script as instructed, we get ..

# Load the TensorFlow Lite model
interpreter = tflite.Interpreter(model_path="mnist.tflite", expertimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])

Traceback (most recent call last):
  File "/home/compulab/tf/gpt_tflite_MNIST_TPU.py", line 9, in
    expertimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
  File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
    raise ValueError('Failed to load delegate from {}\n{}'.format(
ValueError: Failed to load delegate from libedgetpu.so.1

compulab@iot-gate-imx8plus:~/tf$ sudo find / -name "libedgetpu.so.1"
[rw,errors=remount-ro]
/usr/lib/aarch64-linux-gnu/libedgetpu.so.1
/coral/pycoral/libedgetpu_bin/throttled/aarch64/libedgetpu.so.1
/coral/pycoral/libedgetpu_bin/throttled/armv7a/libedgetpu.so.1
/coral/pycoral/libedgetpu_bin/throttled/k8/libedgetpu.so.1
/coral/pycoral/libedgetpu_bin/direct/aarch64/libedgetpu.so.1
/coral/pycoral/libedgetpu_bin/direct/armv7a/libedgetpu.so.1
/coral/pycoral/libedgetpu_bin/direct/k8/libedgetpu.so.1
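
Two things may be worth ruling out here. First, the keyword accepted by `tflite_runtime.interpreter.Interpreter` is `experimental_delegates`, not `expertimental_delegates` as in the script above. Second, `load_delegate` can fail if the current user cannot open the Edge TPU device node. The sketch below covers both. It assumes the PCIe device shows up as `/dev/apex_0`, and `device_accessible` and `make_tpu_interpreter` are hypothetical helper names.

```python
import os


def device_accessible(path="/dev/apex_0"):
    """True if the current user can read and write the Edge TPU device node."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


def make_tpu_interpreter(model_path, delegate_lib="libedgetpu.so.1"):
    """Build a TFLite interpreter that uses the Edge TPU delegate."""
    # Imported lazily so device_accessible() works without tflite_runtime.
    import tflite_runtime.interpreter as tflite

    delegate = tflite.load_delegate(delegate_lib)
    # Note the spelling: experimental_delegates.
    return tflite.Interpreter(model_path=model_path,
                              experimental_delegates=[delegate])


if __name__ == "__main__":
    if not device_accessible():
        print("No read/write access to /dev/apex_0 for this user")
    else:
        interpreter = make_tpu_interpreter("mnist.tflite")
        interpreter.allocate_tensors()
```

If the access check fails for a non-root user, the udev rule and `apex` group setup described in the Coral M.2 getting-started guide would be the place to look.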

stanleyoz commented 1 year ago

OK, when I ran the script in root, it worked but for my simple single digit PNG file inference against trained MNIST database, the TPU invoke() took longer than the tensorflow CPU (IMX8) process, e.g. 2.4msec (TPU) vs. 0.97msec (CPU). Keen to investigate? We wanted a simple benchmark test to show that using the TPU accelerator "option" of our gateway on a popular demo will convince some developers to try the TPU version. I attach a link to the files if you got time to check why TPU is slower to inference. https://drive.google.com/drive/folders/1nAgx5kbogx4Li-fQBCbo8x2rtnwQFg7P?usp=sharing Thanks again mate.
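
Since the classify_image.py output earlier in this thread notes that the first Edge TPU inference includes loading the model, a single timed `invoke()` may not be a fair comparison. A rough timing sketch that discards warm-up runs and reports the median is shown below. `time_invoke_ms` is a hypothetical helper, and this does not by itself explain the TPU/CPU difference.

```python
import statistics
import time


def time_invoke_ms(invoke, warmup=1, runs=50):
    """Median latency of `invoke()` in milliseconds, excluding warm-up calls."""
    for _ in range(warmup):
        invoke()  # e.g. the first call may include model loading
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        invoke()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; with TFLite this would be interpreter.invoke.
    print("median ms:", time_invoke_ms(lambda: sum(range(1000))))
```

The same function could be called once with the CPU interpreter's `invoke` and once with the Edge TPU interpreter's `invoke`.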

hjonnala commented 1 year ago

OK, when I ran the script in root, it worked but for my simple single digit PNG file inference against trained MNIST database, the TPU invoke() took longer than the tensorflow CPU (IMX8) process, e.g. 2.4msec (TPU) vs. 0.97msec (CPU). Keen to investigate? We wanted a simple benchmark test to show that using the TPU accelerator "option" of our gateway on a popular demo will convince some developers to try the TPU version. I attach a link to the files if you got time to check why TPU is slower to inference. https://drive.google.com/drive/folders/1nAgx5kbogx4Li-fQBCbo8x2rtnwQFg7P?usp=sharing Thanks again mate.

Please create a new issue for this if you need any further help, as it is not relevant to this thread. The issue is that none of the operations are running on the TPU. Please go through the links below and fix the tflite conversion issue. Thanks!!

  1. https://coral.ai/docs/edgetpu/models-intro/
  2. https://colab.sandbox.google.com/github/google-coral/tutorials/blob/master/retrain_classification_ptq_tf2.ipynb#scrollTo=kRDabW_u1wnv
  3. https://github.com/google-coral/edgetpu/issues/655#issuecomment-1241258031
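
As a quick sanity check before benchmarking, one heuristic is to look for the Edge TPU custom operator name inside the model file. This is only a sketch: it assumes models produced by the Edge TPU compiler embed the string `edgetpu-custom-op`, and it cannot tell how many operations were actually mapped to the TPU (the compiler log is the authoritative source for that). `looks_edgetpu_compiled` is a hypothetical helper name.

```python
def looks_edgetpu_compiled(model_path):
    """Heuristic: True if the file contains the Edge TPU custom-op name."""
    with open(model_path, "rb") as f:
        return b"edgetpu-custom-op" in f.read()


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(path, "->", looks_edgetpu_compiled(path))
```

A plain `mnist.tflite` that was never passed through the Edge TPU compiler would be expected to return `False` here.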

