google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

ls: cannot access '/dev/apex*': No such file or directory #729

Closed haroldboom closed 1 year ago

haroldboom commented 1 year ago

Description

Hi all,

I am running ESXi 7 U3 with an Ubuntu 22 host and my dual Coral PCI adapter passed through. I followed https://coral.ai/docs/m2/get-started/#2a-on-linux, but when I check whether the PCI drivers are loaded I get the following:

root@frigate:~# ls /dev/apex*
ls: cannot access '/dev/apex*': No such file or directory

root@frigate:~# lspci -nn | grep 089a
04:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
1b:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]

root@frigate:~# lsmod | grep apex
apex 28672 0
gasket 122880 1 apex

root@frigate:~# modprobe apex
root@frigate:~#
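The checks above (device node, PCI enumeration, loaded modules) can be gathered in one pass. The following is a minimal sketch, not part of the Coral tooling: `coral_status` is a hypothetical helper, the PCI IDs are taken from the lspci output in this post, and it assumes the standard Linux sysfs layout in which a PCI device directory gains a `driver` entry only after a driver's probe succeeds.

```python
import glob
import os

# Vendor/device IDs as shown by `lspci -nn` in this thread ([1ac1:089a]).
CORAL_VENDOR, CORAL_DEVICE = "0x1ac1", "0x089a"


def _read(path):
    with open(path) as fh:
        return fh.read().strip()


def coral_status(root="/"):
    """Summarize Edge TPU state under `root` (hypothetical helper; use "/" on a real system)."""
    tpus = {}
    for dev in sorted(glob.glob(os.path.join(root, "sys/bus/pci/devices/*"))):
        try:
            ids = (_read(os.path.join(dev, "vendor")), _read(os.path.join(dev, "device")))
        except OSError:
            continue
        if ids == (CORAL_VENDOR, CORAL_DEVICE):
            # True only if some driver is bound, i.e. its probe did not fail.
            tpus[os.path.basename(dev)] = os.path.exists(os.path.join(dev, "driver"))

    try:
        lines = _read(os.path.join(root, "proc/modules")).splitlines()
        modules = {line.split()[0] for line in lines if line.strip()}
    except OSError:
        modules = set()

    return {
        "tpus_bound": tpus,                      # PCI address -> driver bound?
        "apex_loaded": "apex" in modules,
        "gasket_loaded": "gasket" in modules,
        "device_nodes": sorted(glob.glob(os.path.join(root, "dev/apex*"))),
    }


if __name__ == "__main__":
    print(coral_status())
```

A result with both modules loaded, the TPUs listed as unbound, and no device nodes would match the symptoms here: the driver is installed, but its probe of the hardware did not complete.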

root@frigate:~# modinfo apex
filename: /lib/modules/5.15.0-67-generic/updates/dkms/apex.ko
author: John Joseph jnjoseph@google.com
license: GPL v2
version: 1.2
description: Google Apex driver
srcversion: 700E8BBBE9CC23C6EC17712
alias: pci:v00001AC1d0000089Asv*sd*bc*sc*i*
depends: gasket
retpoline: Y
name: apex
vermagic: 5.15.0-67-generic SMP mod_unload modversions
parm: allow_power_save:int
parm: allow_sw_clock_gating:int
parm: allow_hw_clock_gating:int
parm: bypass_top_level:int
parm: trip_point0_temp:int
parm: trip_point1_temp:int
parm: trip_point2_temp:int
parm: hw_temp_warn1:int
parm: hw_temp_warn2:int
parm: hw_temp_warn1_en:bool
parm: hw_temp_warn2_en:bool
parm: te

root@frigate:~# apt install linux-headers-$(uname -r)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
linux-headers-5.15.0-67-generic is already the newest version (5.15.0-67.74).
0 upgraded, 0 newly installed, 0 to remove and 19 not upgraded.

root@frigate:~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 45 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: 11th Gen Intel(R) Core(TM) i7-11700B @ 3.20GHz
CPU family: 6
Model: 141
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
Stepping: 1
BogoMIPS: 6374.39
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2intersect md_clear flush_l1d arch_capabilities
Virtualization features:
Hypervisor vendor: VMware
Virtualization type: full
Caches (sum of all):
L1d: 384 KiB (8 instances)
L1i: 256 KiB (8 instances)
L2: 10 MiB (8 instances)
L3: 24 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerabilities:
Itlb multihit: KVM: Mitigation: VMX unsupported
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Srbds: Not affected
Tsx async abort: Not affected

lspci output.txt

Hopefully someone can help, thanks! Let me know if you want me to run anything else.

### Issue Type
Support

### Operating System
Ubuntu

### Coral Device
M.2 Accelerator with dual Edge TPU

### Other Devices
_No response_

### Programming Language
_No response_

### Relevant Log Output
_No response_

haroldboom commented 1 year ago

dmesg output is as follows;

[ 16.354557] apex 0000:04:00.0: Page table init timed out
[ 16.354562] apex 0000:04:00.0: MSI-X table init timed out
[ 16.354837] apex: probe of 0000:04:00.0 failed with error -110
[ 16.356338] apex 0000:1b:00.0: Page table init timed out
[ 16.356341] apex 0000:1b:00.0: MSI-X table init timed out
[ 16.356587] apex: probe of 0000:1b:00.0 failed with error -110
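For reading these lines: kernel drivers report probe failures as negated errno values, and on Linux errno 110 is ETIMEDOUT, which agrees with the "timed out" messages. The sketch below is only an illustration of pulling that out of dmesg text; `failed_probes` and the small errno table are hypothetical helpers written for this example, and the sample input is the log from this comment.

```python
import re

# Subset of Linux errno numbers; drivers return them negated from probe().
LINUX_ERRNO = {5: "EIO", 12: "ENOMEM", 16: "EBUSY", 19: "ENODEV", 110: "ETIMEDOUT"}

# Matches e.g. "apex: probe of 0000:04:00.0 failed with error -110"
PROBE_FAIL = re.compile(
    r"apex: probe of (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"failed with error -(?P<errno>\d+)"
)


def failed_probes(dmesg_text):
    """Return {pci_address: errno_name} for every failed apex probe in the text."""
    result = {}
    for match in PROBE_FAIL.finditer(dmesg_text):
        code = int(match.group("errno"))
        result[match.group("addr")] = LINUX_ERRNO.get(code, f"errno {code}")
    return result


SAMPLE = """\
[ 16.354557] apex 0000:04:00.0: Page table init timed out
[ 16.354562] apex 0000:04:00.0: MSI-X table init timed out
[ 16.354837] apex: probe of 0000:04:00.0 failed with error -110
[ 16.356338] apex 0000:1b:00.0: Page table init timed out
[ 16.356341] apex 0000:1b:00.0: MSI-X table init timed out
[ 16.356587] apex: probe of 0000:1b:00.0 failed with error -110
"""

if __name__ == "__main__":
    for addr, name in failed_probes(SAMPLE).items():
        print(addr, name)
```

Both Edge TPUs on the dual card fail the same way, so the problem is unlikely to be specific to one of the two devices.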

hjonnala commented 1 year ago

Hi, there seems to be an issue with ESXi's implementation of MSI-X. Please go through this issue for alternatives to try: https://github.com/google-coral/edgetpu/issues/343#issuecomment-853529569. Thanks!
