google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

Coral M.2 causing higher CPU utilization? #774

Open rovingclimber opened 12 months ago

rovingclimber commented 12 months ago

Hi!

I bought an M.2 Coral TPU a while ago to use with Frigate running on Proxmox. I kept seeing issues around CPU utilization being way higher once the card was installed, even without Frigate running. I've just run some tests with a fresh Proxmox install, no VMs or containers, and I'm seeing 15-20% CPU utilization just from having the Coral card installed, with nothing running, whether or not the driver is loaded!
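For anyone wanting to reproduce an idle-utilization figure like the 15-20% above without a graphing tool, here is a minimal Python sketch that derives a busy percentage from two snapshots of the aggregate `cpu` line in `/proc/stat`. The function names are hypothetical, and counting iowait as idle time is an assumption of this sketch; Proxmox's own graph may compute the figure differently.

```python
def parse_cpu_line(stat_text):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal guest guest_nice
            # Assumption of this sketch: iowait is counted as idle time.
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            # guest and guest_nice are already included in user and nice, so stop at steal.
            total = sum(values[:8])
            return idle, total
    raise ValueError("no aggregate 'cpu' line found")


def busy_percent(before, after):
    """Percentage of non-idle CPU time between two /proc/stat snapshots."""
    idle_a, total_a = parse_cpu_line(before)
    idle_b, total_b = parse_cpu_line(after)
    total_delta = total_b - total_a
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle_b - idle_a) / total_delta)
```

On a live system, one would read `/proc/stat` twice a few seconds apart and pass both snapshots to `busy_percent`.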

In the screenshot below you can see the graph of CPU usage on the box. I started with the Coral device installed, then spent time installing the driver, then left the system running idle. During all that the CPU usage doesn't drop below ~15%. I then powered down, removed the PCIe card, powered up again and left it idle: CPU usage below 1%. I then blacklisted apex & gasket, powered down, re-installed the card and powered up again, and you can see the CPU back up to ~15%.
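When repeating the blacklist test, it helps to confirm that `apex` and `gasket` really are absent after the reboot. A small Python sketch, assuming the usual `/proc/modules` layout where the module name is the first whitespace-separated field (the helper names are hypothetical):

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-formatted text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def coral_drivers_loaded(proc_modules_text, names=("apex", "gasket")):
    """Map each driver name to True if it appears in the loaded-module list."""
    present = loaded_modules(proc_modules_text)
    return {name: name in present for name in names}
```

Passing the contents of `/proc/modules` should return `False` for both names if the blacklist took effect.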

This is on an Asrock J5040itx board, M.2 Accelerator with Dual Edge TPU (only 1 TPU gets detected unfortunately). I've also tried the card in a new Asrock N100DC-itx - unfortunately it's not even detected, never shown in lspci :(

[Screenshot: Coral M2 issues]

rovingclimber commented 12 months ago

OK, I just did some more testing and it definitely seems to be a fundamental hardware issue with the M.2 card and the J5040 motherboard. I get the same phenomenon in a fresh Debian build or booting any LiveCD. The Coral card just makes the whole system crawl: every task takes way longer than it should and CPU usage shows as much higher.

I've ordered a PCIe > M.2 E adapter to try with that instead.

rovingclimber commented 12 months ago

Issue confirmed after more testing on a fresh Debian install. Even without the Coral driver installed, just physically installing the TPU in the M.2 E-key slot on an Asrock J5040-ITX causes a huge drop in multi-threaded performance. The single-thread benchmark shows a small (~10%) drop and the I/O benchmark is similar, but the multi-thread (7z b -mmt) results for the exact same, otherwise idle system are:

Without coral plugged in: ~9200 MIPS
With coral plugged in: ~1000 MIPS
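To put the two multi-thread figures in perspective, the short Python sketch below works out the relative drop. It uses only the approximate MIPS values quoted above; the function name is illustrative.

```python
def relative_drop(baseline, degraded):
    """Fraction of the baseline throughput that was lost."""
    return (baseline - degraded) / baseline


without_coral_mips = 9200  # approximate figure quoted in the thread
with_coral_mips = 1000     # approximate figure quoted in the thread

drop = relative_drop(without_coral_mips, with_coral_mips)
slowdown = without_coral_mips / with_coral_mips
print(f"{drop:.0%} lower throughput, {slowdown:.1f}x slower")
```

That is a loss of roughly 89% of multi-threaded throughput, against only about 10% in the single-thread test.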

arigit commented 10 months ago

@rovingclimber did you try the M.2 e-key card on the PCIe adapter? Did you notice any improvement with it? What about performance with the adapter in the n100dc-itx?

I was looking into a similar setup (with the PCIe adapter) but on the newer n100dc-itx; from the specs, the n100dc M.2 e-key slot is not PCIE (whereas the 5040 m.2 e-key was dual-function, pcie + CNVio), it's CNVio-only so can't be used for the coral.

rovingclimber commented 10 months ago

@arigit I can update you on both those points! With a cheap M.2 E to PCIe adapter (the sort you find on Amazon) the TPU works well with no CPU hit, albeit you only get one TPU if you bought the dual TPU coral card.

I can confirm that on the N100DC-ITX the card isn't detected in the on-board slot (I seem to have one of every ASrock board now!).
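Whether a slot or adapter exposes zero, one or both TPUs can be checked from `lspci -nn` output. The Python sketch below assumes the Edge TPU enumerates with the PCI ID `1ac1:089a`, which is the ID Coral's PCIe setup guide tells users to grep for. The helper name and the sample output are illustrative.

```python
# Assumption: the Edge TPU shows up in `lspci -nn` with vendor:device ID 1ac1:089a.
EDGE_TPU_ID = "1ac1:089a"


def edge_tpu_slots(lspci_nn_output):
    """Return the PCI addresses of all lines that carry the Edge TPU ID."""
    slots = []
    for line in lspci_nn_output.splitlines():
        if EDGE_TPU_ID in line.lower():
            slots.append(line.split()[0])
    return slots
```

An empty list corresponds to the N100DC-ITX on-board slot case (card never enumerated), one entry to the cheap adapter with a dual card, and two entries to a dual card with both TPUs visible.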

I eventually went for the dual edge TPU adapter from Makerfabs, which works perfectly and gives me both TPUs on my dual coral:

https://www.makerfabs.com/dual-edge-tpu-adapter.html

arigit commented 10 months ago

Thanks! Encouraging. I already have the m2 > pcie adapter on the way, together with the n100dc-itx. My coral device is single-tpu so I bought one that had some successful coral user reports. I also went through many of the asrock mini-itx fanless / cpu-onboard mobos, starting with the J1900 :). I have been using the apollo lake one (4205) for the last 6 years so I expect the N100 to be a huge upgrade, based on the reviews I've seen. I'll be moving Frigate from RPi+coral usb to N100+coral pcie.

rovingclimber commented 10 months ago

I can confirm the N100 is a big upgrade, it's very quick. Unfortunately this comes at a cost in terms of idle power consumption and heat. I couldn't get the whole system power consumption down to a figure I was happy with for a 24/7 box, I've gone back to the J5040... and actually I might go back to the original J4125 which drops a few watts again for probably 5-10% higher CPU utilization for same load.

For info I'm running proxmox on the box itself, then Frigate in a docker container within an LXC container on proxmox. Although this nested containerization is "frowned upon" it's by far the easiest to get working well and seems to have very little overhead hit.

The other thing I'd note (you may already be familiar) is the Realtek NIC on all those boards loads the wrong module in proxmox / debian / ubuntu and you'll get intermittent errors if you don't correct that.

arigit commented 10 months ago

Thanks for the tip! what tweak did you need to do on the n100 to get stable ethernet? I've always used ubuntu LTS, wired ethernet, same plan for the n100 - I don't recall having had to tweak modules on the j4205 but it's been many years

What temp / consumption did you notice in your n100?

Years ago I measured around 20W on the 4205 when using it as HTPC with full load / 4k video (ubuntu lts), and the HTPC was in Suspend mode 50% of the time on average. I never got to run HA / frigate in it. On the N100 I plan to run the same HTPC environment in a limited-privilege environment (kernel/network cgroup) same as I've been doing in the old build, but as always-on, plus a bunch of docker containers (HA, frigate, zwave, grafana etc) that I currently run in the Rpi. I was hoping it will be in the 15W range. I want a fully fanless homeserver build.

rovingclimber commented 10 months ago

Didn't measure the temp, but just running the board on a table when testing, the heatsink gets hot to the touch even at no load. In a case it def needs a fan IMO. I was disappointed, as the TDP suggested it'd be similar to or more efficient than the older chips, but actually it seems base power consumption is not great. It was 30W+ for the same baseline load (Frigate, HA and a few other low load containers) that would sit <20W on the J4125. Different setup in that the J4125 is running from a PicoPSU whereas the N100DC has an on-board PSU with external 19V power supply, and I had an NVMe SSD in the N100DC vs a SATA SSD on the older board. I'd be interested if you can get a better result.

Regarding ethernet, Debian seems to default to the r8169 module, which is actually for a slightly different chip family. It works, but you might see errors over time. You can grab the r8168 driver direct from Realtek and build it (with a tweak for later kernels). I did a write-up here:

https://rovingclimber.com/2023/07/28/proxmox-8-debian-12-with-realtek-rtl8111h-nic/
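Before and after swapping drivers, it is worth checking which module the NIC is actually bound to. On Linux, `/sys/class/net/<interface>/device/driver` is a symlink to the bound driver, so a minimal Python sketch (hypothetical helper name, and `enp1s0` is only a placeholder interface name) could look like this:

```python
import os


def nic_driver(interface, sysfs_root="/sys/class/net"):
    """Return the name of the kernel driver bound to a network interface, or None."""
    link = os.path.join(sysfs_root, interface, "device", "driver")
    if not os.path.exists(link):
        # Unknown interface, or a virtual one with no backing device.
        return None
    # The symlink target's last path component is the driver name.
    return os.path.basename(os.path.realpath(link))
```

For example, `nic_driver("enp1s0")` would be expected to return `"r8169"` on a default install and `"r8168"` once the Realtek driver is in use.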

There's also a blog on frigate under proxmox on that site but I haven't finished it yet.

arigit commented 9 months ago

@rovingclimber just to share that I got my N100DC and I understand what you meant: out of the box, idle temp in core and motherboard chipset was 70+ degrees! I did a few tweaks and got a much more reasonable situation, happy to report:

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +38.0°C  (high = +105.0°C, crit = +105.0°C)
Core 0:        +35.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:        +35.0°C  (high = +105.0°C, crit = +105.0°C)
Core 2:        +35.0°C  (high = +105.0°C, crit = +105.0°C)
Core 3:        +35.0°C  (high = +105.0°C, crit = +105.0°C)

nvme-pci-0200
Adapter: PCI adapter
Composite:    +35.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +35.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +36.9°C  (low  = -273.1°C, high = +65261.8°C)

:~$ uptime
 14:53:15 up  2:17,  1 user,  load average: 0.00, 0.00, 0.00

This is on open air (table top testing).
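To track readings like these over time rather than eyeballing them, the `sensors` text can be parsed into numbers. This Python sketch is an illustration only: it assumes the layout shown above (a chip name line, then `label: +NN.N°C` lines, with blank lines between chips), and the names in it are hypothetical.

```python
import re

# Matches lines such as "Package id 0:  +38.0°C  (high = ...)".
TEMP_LINE = re.compile(r"^(?P<label>[^:]+):\s+\+?(?P<temp>-?\d+(?:\.\d+)?)°C")


def parse_temperatures(sensors_output):
    """Return {(chip, label): degrees_celsius} parsed from `sensors` text output."""
    readings = {}
    chip = None
    for line in sensors_output.splitlines():
        if not line.strip():
            chip = None  # a blank line ends the current chip section
            continue
        if chip is None:
            chip = line.strip()  # first line of a section is the chip name
            continue
        match = TEMP_LINE.match(line)
        if match:
            readings[(chip, match.group("label").strip())] = float(match.group("temp"))
    return readings


SAMPLE = """coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +38.0°C  (high = +105.0°C, crit = +105.0°C)
Core 0:        +35.0°C  (high = +105.0°C, crit = +105.0°C)

nvme-pci-0200
Adapter: PCI adapter
Composite:    +35.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
"""

for (chip, label), temp in parse_temperatures(SAMPLE).items():
    print(f"{chip} / {label}: {temp} C")
```

Lines without a temperature, such as `Adapter: ISA adapter` or the wrapped `(crit = ...)` continuation, are skipped.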

Basically what I did is:

Most importantly: I did not change the CPU PL1/PL2 nor any of the power-limiting configuration entries from their default values. So no performance impact at all, based on my tests so far.

Also the system seems to be stable, no freezes or issues in my tests so far.

Idle temps went from 71 degrees to 38 degrees which is actually lower than what I see in my J4205.

chpego commented 8 months ago

I've also noticed a very high IO delay problem since installing Coral (pcie version) on my Proxmox 8.0.4