google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

Coral TPU pcie/M.2 issues on Debian 12 0 - AMD am4 build #852

Open dmandn opened 5 months ago

dmandn commented 5 months ago

Description

Hi, I'm on Debian 12. I first tried an install with kernel 6.8, then Proxmox 8 with kernel 6.8.x, and I'm now back on Debian 12 (another fresh install) with kernel 6.1.0-21-amd64. I followed your steps to build and install this version of gasket-dkms, and from the basic checks things seem OK on the host, but I cannot get the TPUs to work in or with anything.

When I perform a "dmesg | grep -i apex", I get the below:

    dmesg | grep -i apex
    [ 16.471161] apex 0000:08:00.0: enabling device (0000 -> 0002)
    [ 16.474447] apex 0000:08:00.0: Couldn't initialize interrupts: -22
    [ 16.476806] apex 0000:09:00.0: enabling device (0000 -> 0002)
    [ 16.480201] apex 0000:09:00.0: Couldn't initialize interrupts: -22
    [ 16.482306] apex 0000:0a:00.0: enabling device (0000 -> 0002)
    [ 16.488254] apex 0000:0a:00.0: Couldn't initialize interrupts: -22
    [ 21.504640] apex 0000:09:00.0: Apex performance not throttled due to temperature
    [ 21.504639] apex 0000:0a:00.0: Apex performance not throttled due to temperature
    [ 21.504657] apex 0000:08:00.0: Apex performance not throttled due to temperature
    [ 164.884046] apex 0000:0a:00.0: Couldn't reinit interrupts: -22
    [ 164.885596] apex 0000:08:00.0: Couldn't reinit interrupts: -22
    [ 164.889985] apex 0000:09:00.0: Couldn't reinit interrupts: -22
    [ 196.858032] apex 0000:0a:00.0: Couldn't reinit interrupts: -22
    [ 196.861839] apex 0000:08:00.0: Couldn't reinit interrupts: -22
    [ 196.866137] apex 0000:09:00.0: Couldn't reinit interrupts: -22
    [ 228.844407] apex 0000:0a:00.0: Couldn't reinit interrupts: -22
    [ 228.848057] apex 0000:08:00.0: Couldn't reinit interrupts: -22
    [ 228.852692] apex 0000:09:00.0: Couldn't reinit interrupts: -22
    [ 260.899907] apex 0000:0a:00.0: Couldn't reinit interrupts: -22
    [ 260.903764] apex 0000:08:00.0: Couldn't reinit interrupts: -22
    [ 260.908369] apex 0000:09:00.0: Couldn't reinit interrupts: -22

    ls -l /dev/apex*
    crw-rw---- 1 root apex 120, 0 May 26 09:50 /dev/apex_0
    crw-rw---- 1 root apex 120, 1 May 26 09:50 /dev/apex_1
    crw-rw---- 1 root apex 120, 2 May 26 09:50 /dev/apex_2

    lspci -nn | grep 089a
    08:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
    09:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
    0a:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]

    lsmod | grep apex
    apex 28672 0
    gasket 126976 1 apex

    apt list apex
    Listing... Done
    golang-github-apex-log-dev/stable 1.1.1-3 all

    apt list gasket
    Listing... Error!
    E: input:0-26: error: Expected pattern
       gasket-dkms_1.0-18_all.deb
       ^^^^^^^^^^^^^^^^^^^^^^^^^^

frigate.detectors.plugins.edgetpu_tfl ERROR : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors

I have the same symptoms with this gasket-dkms build as I had with a previous one, except that the permission denied errors are now gone from the dmesg output. I am still unable to clear the "Couldn't initialize interrupts: -22" error, which I assume is part of the issue? Any help would be greatly appreciated, as I have been fighting with these Coral TPUs for a week now.

...Also please advise if this is the wrong place to ask and I will move my post to wherever can be suggested.
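To see at a glance which devices are hitting the interrupt error, the dmesg output above can be reduced to a list of unique PCI addresses. This is only a minimal sketch that assumes the message format shown in this post; the function name `apex_irq_failures` is made up for illustration and is not part of gasket-dkms or any Coral tooling.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print each PCI
# address for which the apex driver reported an interrupt init or reinit
# failure. The patterns are taken from the log lines quoted in this issue.
apex_irq_failures() {
    grep -E "apex [0-9a-f:.]+: Couldn't (re)?init(ialize)? interrupts" |
        sed -E 's/.*apex ([0-9a-f:.]+): .*/\1/' |
        sort -u
}

# Usage on a live system (reading the kernel log may need root):
#   dmesg | apex_irq_failures
```

If every Coral device from `lspci` shows up in that list, the failure is likely common to all of them rather than tied to one slot, though that is an inference and not something the log confirms.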

### Issue Type
Support
### Operating System
Linux
### Coral Device
M.2 Accelerator with dual Edge TPU
### Other Devices
_No response_
### Programming Language
_No response_
### Relevant Log Output
Same text and command output as in the description above.

bdherouville commented 1 month ago

Hi,

I solved a similar issue by doing this:

    echo 1 > /sys/bus/pci/devices/0000\:08\:00.0/remove
    echo 1 > /sys/bus/pci/rescan
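Since the original report lists three TPUs (08:00.0, 09:00.0 and 0a:00.0), the same remove and rescan steps could be applied to each of them in one go. The following is a rough sketch, not a confirmed fix: it assumes the vendor and device IDs `1ac1:089a` from the `lspci -nn` output earlier in the thread, and both the function name `reset_coral_tpus` and the `SYSFS_PCI` override are hypothetical, added only so the logic can be tried against a scratch directory before touching real hardware.

```shell
#!/bin/sh
# Sketch: remove every Coral Edge TPU PCI function, then rescan the bus.
# SYSFS_PCI is a hypothetical override for dry runs; on a real host the
# path defaults to /sys/bus/pci, as used in the commands above.
reset_coral_tpus() {
    pci_root="${SYSFS_PCI:-/sys/bus/pci}"
    for dev in "$pci_root"/devices/*; do
        # Skip entries that do not expose readable ID files.
        [ -r "$dev/vendor" ] && [ -r "$dev/device" ] || continue
        # 0x1ac1 / 0x089a is the ID pair reported by lspci -nn in this issue.
        if [ "$(cat "$dev/vendor")" = "0x1ac1" ] &&
           [ "$(cat "$dev/device")" = "0x089a" ]; then
            echo "removing ${dev##*/}"
            echo 1 > "$dev/remove"
        fi
    done
    # Ask the kernel to rediscover the devices that were just removed.
    echo 1 > "$pci_root/rescan"
}

# Usage (must run as root on a real system):
#   reset_coral_tpus
```

Afterwards, `dmesg | grep -i apex` should show whether the interrupts initialized on the re-probe. Whether this helps on the AM4 board from the original post is untested here.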