google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

Edge TPU M.2 - NanoPC-T4 #140

Closed skyler1537 closed 3 years ago

skyler1537 commented 4 years ago

Hi - I'm trying to install gasket-dkms on a NanoPC-T4, but I'm having a hard time with the module build.

uname -r 4.4.179

Is there a workaround for installing this?

sudo apt-get install gasket-dkms
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libssl1.1:armhf
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  gasket-dkms
0 upgraded, 1 newly installed, 0 to remove and 407 not upgraded.
Need to get 0 B/46.1 kB of archives.
After this operation, 247 kB of additional disk space will be used.
Selecting previously unselected package gasket-dkms.
(Reading database ... 158292 files and directories currently installed.)
Preparing to unpack .../gasket-dkms_1.0-11_all.deb ...
Unpacking gasket-dkms (1.0-11) ...
Setting up gasket-dkms (1.0-11) ...
Loading new gasket-1.0 DKMS files...
It is likely that 4.4.179 belongs to a chroot's host
Building for 4.15.0-107-generic, 4.4.167 and 4.4.179
Building initial module for 4.15.0-107-generic
Done.

gasket:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.15.0-107-generic/updates/dkms/

apex.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.15.0-107-generic/updates/dkms/

depmod...

DKMS: install completed.
Module build for kernel 4.4.167 was skipped since the
kernel headers for this kernel does not seem to be installed.
Module build for kernel 4.4.179 was skipped since the
kernel headers for this kernel does not seem to be installed.

Namburger commented 4 years ago

kernel headers for this kernel does not seem to be installed.

Could you share outputs of this command?

$ apt search linux-headers-$(uname -r)
$ sudo apt install linux-headers-$(uname -r)

The apex/pcie modules require the linux-headers package for the running kernel. That package is maintained by the vendor that releases the kernel.
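
One way to see in advance whether DKMS will find headers for the running kernel is to look for the build directory under /lib/modules. The following is a minimal sketch, assuming the conventional /lib/modules/<release>/build layout; the function names and the modules_root parameter are illustrative and not part of any Coral tooling.

```python
import os
import platform


def headers_build_dir(release, modules_root="/lib/modules"):
    """Return the path where module builds expect the kernel headers."""
    return os.path.join(modules_root, release, "build")


def has_kernel_headers(release, modules_root="/lib/modules"):
    """True if the build path (usually a symlink into /usr/src) resolves to a directory."""
    return os.path.isdir(headers_build_dir(release, modules_root))


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r`
    status = "found" if has_kernel_headers(release) else "missing"
    print(f"kernel headers for {release}: {status}")
```

If this reports "missing", the DKMS build for that kernel is likely to be skipped, as in the log above.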

skyler1537 commented 4 years ago

Unfortunately I can't seem to find these.

pi@NanoPC-T4:~$ sudo apt search linux-headers-$(uname -r)
Sorting... Done
Full Text Search... Done
pi@NanoPC-T4:~$ sudo apt install linux-headers-$(uname -r)
Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Unable to locate package linux-headers-4.4.179
E: Couldn't find any package by glob 'linux-headers-4.4.179'
E: Couldn't find any package by regex 'linux-headers-4.4.179'

Namburger commented 4 years ago

@skyler1253 Are there any other kernels that you can install on the NanoPC-T4? The linux-headers package should be maintained by the vendor. Unfortunately this is not in our control :/

kampff commented 4 years ago

I am also having issues with the NanoPC-T4 + edgetpu (M.2, B+M).

After no success with FriendlyElec's/Rockchip's 4.4.179 kernel, I tried the Armbian kernels/distros (Armbian_20.05.4_Nanopct4_focal_current_5.4.46). After reverting the kernel to 5.4.45 (because there were no headers available for 5.4.46) and installing the Linux headers via armbian-config, I could compile the gasket-dkms module (via dkms and apt package install).

However, the rockchip pcie link training times out on boot. No "/dev/apex_0" or output from "lspci".

[ 0.674574] rockchip-pcie f8000000.pcie: no vpcie3v3 regulator found
[ 0.675170] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
[ 0.675748] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
[ 0.676330] rockchip-pcie f8000000.pcie: missing "memory-region" property
[ 0.676949] PCI host bridge /pcie@f8000000 ranges:
[ 0.677393] MEM 0xfa000000..0xfbdfffff -> 0xfa000000
[ 0.677862] IO 0xfbe00000..0xfbefffff -> 0xfbe00000
[ 1.199001] rockchip-pcie f8000000.pcie: PCIe link training gen1 timeout!
[ 1.199635] rockchip-pcie f8000000.pcie: deferred probe failed
[ 1.200406] rockchip-pcie: probe of f8000000.pcie failed with error -110

If anyone has made any progress, then I would love to learn from you. I am very enthusiastic to help debug this.
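
For anyone comparing boot logs across kernels, the failure in the log above can be picked out mechanically. This is a rough sketch only: the regular expression is modelled on the "probe of ... failed with error" line quoted in this comment, the function names are made up for illustration, and the errno table covers just the one value seen here (110 is ETIMEDOUT on Linux).

```python
import re

# Linux errno value seen in the log; 110 is ETIMEDOUT on Linux.
LINUX_ERRNO_NAMES = {110: "ETIMEDOUT"}

PROBE_FAILURE = re.compile(
    r"(?P<driver>[\w-]+): probe of (?P<device>\S+) failed with error -(?P<errno>\d+)"
)


def find_probe_failures(dmesg_text):
    """Return (driver, device, errno, errno_name) for each failed probe in the log."""
    failures = []
    for match in PROBE_FAILURE.finditer(dmesg_text):
        code = int(match.group("errno"))
        name = LINUX_ERRNO_NAMES.get(code, "unknown")
        failures.append((match.group("driver"), match.group("device"), code, name))
    return failures


def link_training_timed_out(dmesg_text):
    """True if the log contains the Rockchip PCIe link-training timeout message."""
    return "PCIe link training gen1 timeout" in dmesg_text
```

Feeding it the text of `dmesg` from a failing boot should flag the rockchip-pcie probe; a boot where the link trains would return an empty list.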

Namburger commented 4 years ago

@kampff With kernel 5.4.45, could you check if the default gasket/apex drivers are loaded instead of ours?

modinfo gasket
modinfo apex

kampff commented 4 years ago

@Namburger I have now rebuilt Armbian and included the kernel headers for the latest kernel (5.4.48).

> uname -r
5.4.48-rockchip64

> modinfo gasket
filename:       /lib/modules/5.4.48-rockchip64/updates/dkms/gasket.ko
author:         Rob Springer <rspringer@google.com>
license:        GPL v2
version:        1.1.3
description:    Google Gasket driver framework
srcversion:     069B6D0F6AE12073F4EAF5D
depends:        
name:           gasket
vermagic:       5.4.48-rockchip64 SMP preempt mod_unload aarch64
parm:           dma_bit_mask:int

> modinfo apex
filename:       /lib/modules/5.4.48-rockchip64/updates/dkms/apex.ko
author:         John Joseph <jnjoseph@google.com>
license:        GPL v2
version:        1.1
description:    Google Apex driver
srcversion:     508A8A34D57322CEA287D17
alias:          pci:v00001AC1d0000089Asv*sd*bc*sc*i*
depends:        gasket
name:           apex
vermagic:       5.4.48-rockchip64 SMP preempt mod_unload aarch64
parm:           allow_power_save:int
parm:           allow_sw_clock_gating:int
parm:           allow_hw_clock_gating:int
parm:           bypass_top_level:int
parm:           trip_point0_temp:int
parm:           trip_point1_temp:int
parm:           trip_point2_temp:int
parm:           hw_temp_warn1:int
parm:           hw_temp_warn2:int
parm:           hw_temp_warn1_en:bool
parm:           hw_temp_warn2_en:bool
parm:           temp_poll_interval:int

Nothing from "lspci" or /dev/apex_0

I have ordered both a new NanoPC-T4 and two more EdgeTPU modules to rule out any hardware problems. However, given the "PCIe link training timeout", I suspect there is something I need to do with the PCIe driver and/or device tree in either kernel/u-boot/both.
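
The alias line in the modinfo output above encodes the PCI vendor and device IDs the apex driver binds to, so it can be used to check whether the card is enumerated at all. The sketch below assumes the usual modalias layout (pci:v<8 hex digits>d<8 hex digits>...) and that `lspci -n` prints IDs as a vendor:device token; the helper names are hypothetical.

```python
import re

# modalias format as printed by `modinfo apex`: pci:v<8 hex vendor>d<8 hex device>...
PCI_ALIAS = re.compile(r"^pci:v([0-9A-Fa-f]{8})d([0-9A-Fa-f]{8})")


def pci_id_from_alias(alias):
    """Return (vendor, device) as 4-digit lowercase hex strings, or None."""
    match = PCI_ALIAS.match(alias)
    if match is None:
        return None
    vendor, device = (field[-4:].lower() for field in match.groups())
    return vendor, device


def device_listed(lspci_n_output, vendor, device):
    """True if any line of `lspci -n` output contains the vendor:device token."""
    needle = f"{vendor}:{device}"
    return any(needle in line.lower().split() for line in lspci_n_output.splitlines())
```

With empty lspci output, as reported here, device_listed returns False for any ID, which points at the PCIe link rather than at the gasket/apex modules.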

[ 3.161019] rockchip-pcie f8000000.pcie: PCIe link training gen1 timeout!
[ 3.161101] rockchip-pcie: probe of f8000000.pcie failed with error -110

kampff commented 4 years ago

@Namburger I have attached the full boot logs (u-boot debug followed by "dmesg"). Any suggestions would be very, very much appreciated. Thanks!

log_output_Armbian_5.4.48_wGasketApex.txt

kampff commented 4 years ago

I managed to get this to work (NanoPC-T4 + EdgeTPU PCIe).

I found a Tweet (https://twitter.com/dl4senses/status/1217541198417498112/photo/1) demonstrating that someone had managed to get the EdgeTPU working with the NanoPC-T4 on an earlier version of Armbian with the 5.4.0-rc1 kernel. The only images currently available on the Armbian website were for kernels 5.4.6 and later (https://dl.armbian.com/nanopct4/archive/). These images did recognize the EdgeTPU in the NanoPC-T4 upon boot (i.e. link training succeeds)! However, these older images do not have the "linux-headers" package available, which is required by the "gasket-dkms" installer.

I had previously tried building the "current" branch of Armbian (kernel 5.4.49), but this version did not link the PCIe (see above comments).

On a whim, I built the current Armbian "dev" branch with the following command line arguments:

./compile.sh INSTALL_HEADERS="yes" BOARD="nanopct4" BRANCH="dev" RELEASE="bionic"

This image did link train the PCIe with the EdgeTPU installed. However, the "gasket-dkms" installer failed because the new kernel (5.7) no longer has the "ioremap_nocache()" function, which is required by "gasket_core.c". The mainline gasket framework and apex drivers have been updated, though, so I rebuilt the Armbian "dev" branch as above, but with a custom kernel config in /userpatches/linux-rockchip64-dev.config, which simply added the following config options to the default config file:

CONFIG_STAGING_GASKET_FRAMEWORK=m
CONFIG_STAGING_APEX_DRIVER=m

This built successfully and booted with the /dev/apex_0 device available!
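
To confirm that a given kernel config actually enables the two staging options named above, a small check along these lines may help. It is a sketch under the assumption that the config uses the standard `CONFIG_X=value` and `# CONFIG_X is not set` line format; the function names are illustrative.

```python
REQUIRED_OPTIONS = ("CONFIG_STAGING_GASKET_FRAMEWORK", "CONFIG_STAGING_APEX_DRIVER")


def parse_kernel_config(config_text):
    """Map option name -> value for every `CONFIG_X=value` line; comments are skipped."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def missing_options(config_text, required=REQUIRED_OPTIONS):
    """Return the required options that are not enabled as built-in (y) or module (m)."""
    options = parse_kernel_config(config_text)
    return [name for name in required if options.get(name) not in ("y", "m")]
```

The text can come from the custom config file used for the build or, where the distribution provides it, from the config shipped alongside the running kernel.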

After installing libedgetpu1-std and some Python components, the demo ran.

> python3 classify_image.py --model models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels models/inat_bird_labels.txt --input images/parrot.jpg
W :122] Could not set performance expectation : 4 (Inappropriate ioctl for device)
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
16.9ms
6.9ms
6.9ms
6.9ms
6.9ms
-------RESULTS--------
Ara macao (Scarlet Macaw): 0.77734

A final question:

blazczak commented 3 years ago

Hey @kampff can you share the hash of the checked-out commit you used to build your working version of Armbian kernel? I have tried a few of the more recent 5.10 versions, prebuilt or custom compiled with no luck; still getting error -110 (one would think a fix to an earlier issue would have persisted in the tree). I'd like to build the same version as yours to sanity check the hardware.

kampff commented 3 years ago

I didn't save the specific commit hash that I used for the Armbian/build Repo, but I am nearly certain it was the master branch on July 1st, 2020 (commit: 3f998600096937454b18b350bea28f73d343fcef).

I then used the following command to build the "dev" branch of the Kernel:

./compile.sh INSTALL_HEADERS="yes" BOARD="nanopct4" BRANCH="dev" RELEASE="bionic"

On July 1st, 2020, I believe this built kernel version 5.7.6 (as seen in the linux-rockchip64-dev-config), but I don't know the specific Kernel commit hash that was used.

(also, I had to enable the Apex/Gasket modules in the kernel config as described before)

blazczak commented 3 years ago

Thanks @kampff. I was able to check out that commit, but their build code has galloped forward since that point in time. After being forced to downgrade to 20.04 on the build host, I kept running into other issues with outdated downloads, dependencies, etc. I gave up trying to build the old version, as my time would be better spent trying to get the TPU to work with one of the newer versions. Their 5.10.21 works well on the Rock Pi 4C; it is unclear why 5.x on this SBC is more work.

blazczak commented 3 years ago

Downloaded the official Armbian_20.08_Nanopct4_bionic_current_5.7.15 image and the -110 PCIe error was gone in that image. Armbian_20.11.10_Nanopct4_bionic_current_5.9.14 is currently the most recent, equally error free version of bionic for this SBC. Slight issue with the initial configure script using /usr/bin/bash instead of /bin/bash when setting up the initial user (this will prevent log on with default user until you fix it in /etc/passwd) but other than that it works well (Armbian_20.11.7_Nanopct4_bionic_current_5.9.14 was unusable because of that same glitch preventing the configure script from running to completion).
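
The login-shell glitch described above can be spotted before rebooting by checking which /etc/passwd entries point at a shell that does not exist. This is a sketch only, assuming the standard seven-field passwd format; the function name and the injectable exists parameter are illustrative.

```python
import os


def entries_with_missing_shell(passwd_text, exists=os.path.exists):
    """Return (user, shell) pairs whose login shell path does not exist."""
    broken = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        user, shell = fields[0], fields[6]
        if shell and not exists(shell):
            broken.append((user, shell))
    return broken
```

Called with the contents of /etc/passwd, it would list a user whose shell is /usr/bin/bash on an image where only /bin/bash is present.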

Looks like something has definitely changed between bionic and focal in Armbian to make the focal builds exhibit the PCIe issue but I won't be spending any of my time on trying to figure out what the root cause is in focal. If someone comes across a 5.10 or newer version of focal which works on this SBC with TPU let us know.

If someone here decides to use Armbian_20.11.10_Nanopct4_bionic_current_5.9.14 with M.2 TPU, you may need to provide kernel-headers on your own for the compilation step. Here's how:

cd ~
wget https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.9.14.tar.xz
cd /usr/src
sudo tar xJf ~/linux-5.9.14.tar.xz
sudo mv /usr/src/linux-5.9.14 /usr/src/linux-headers-5.9.14-rockchip64
sudo apt install -y bc bison libncurses-dev flex libssl-dev
cd /usr/src/linux-headers-5.9.14-rockchip64
sudo make menuconfig
# in the menuconfig UI, choose "Save" and then "Exit" to write the default .config
sudo make
sudo make modules
echo '5.9.14-rockchip64' | sudo tee /usr/src/linux-headers-5.9.14-rockchip64/include/config/kernel.release
echo '#define UTS_RELEASE "5.9.14-rockchip64"' | sudo tee /usr/src/linux-headers-5.9.14-rockchip64/include/generated/utsrelease.h
sudo mkdir -p /lib/modules/5.9.14-rockchip64
sudo ln -s /usr/src/linux-headers-5.9.14-rockchip64 /lib/modules/5.9.14-rockchip64/build
sudo apt install -y gasket-dkms libedgetpu1-std

$ python3 examples/classify_image.py --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels test_data/inat_bird_labels.txt --input test_data/parrot.jpg
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
12.5ms
2.4ms
2.4ms
2.4ms
2.3ms
-------RESULTS--------
Ara macao (Scarlet Macaw): 0.75781
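
Because the steps above write the release string into the source tree by hand, it may be worth confirming that the tree and the running kernel agree before installing gasket-dkms. The following is a minimal sketch, assuming the include/config/kernel.release path used in those steps; the helper names are hypothetical.

```python
import os


def headers_release(headers_dir):
    """Read the release string recorded in include/config/kernel.release, or None."""
    path = os.path.join(headers_dir, "include", "config", "kernel.release")
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def release_matches(headers_dir, running_release):
    """True if the headers tree claims the same release as the running kernel."""
    return headers_release(headers_dir) == running_release
```

For the setup in this comment the call would be release_matches("/usr/src/linux-headers-5.9.14-rockchip64", "5.9.14-rockchip64"), with the second argument taken from `uname -r`.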

blazczak commented 3 years ago

@skyler1253 @Namburger you can use the approach I suggested above to get matching kernel headers for pretty much any Linux version, sufficient to successfully compile the TPU drivers. The kernel.org sources are genuine but generic, so they won't have any vendor patches or customizations, but that should largely not affect the compilation step needed here. Anyone who is paranoid about security for production or mission-critical deployments should already have active support from their vendor, with aptitude access to a matching, current kernel-headers package. The end user may also want to try to locate and copy over the .config file for their current version before the header and module compilation step. The above approach worked well with a few different Linux versions and vendors.

bdherouville commented 3 years ago

Hi !

On my RockPro64 with Armbian, here is how I managed to get my Edge TPU M.2 working:

apt update && apt upgrade -y && apt-get install linux-headers-current-rockchip64
sudo apt install -y --reinstall gasket-dkms libedgetpu1-std

reboot

@rockpro64:~/coral/pycoral$ python3 examples/classify_image.py \
--model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite \
--labels test_data/inat_bird_labels.txt \
--input test_data/parrot.jpg

----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
23.0ms
3.9ms
3.9ms
3.9ms
3.9ms
-------RESULTS--------
Ara macao (Scarlet Macaw): 0.75781