google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

Building gasket-dkms broken on linux kernel 6.5.0 (Ubuntu 23.10) #808

Closed by dewet22 10 months ago

dewet22 commented 10 months ago

Description

Just upgraded my machines with PCIe TPUs from Ubuntu 23.04 to 23.10, upgrading from kernel 6.2 to 6.5.0.

apt/dpkg output:

...
Setting up python3-distupgrade (1:23.10.10) ...
Setting up gasket-dkms (1.0-18) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 78.)
debconf: falling back to frontend: Readline
Removing old gasket-1.0 DKMS files...
Deleting module gasket-1.0 completely from the DKMS tree.
Loading new gasket-1.0 DKMS files...
Deprecated feature: REMAKE_INITRD (/usr/src/gasket-1.0/dkms.conf)
Building for 6.5.0-10-generic
Building initial module for 6.5.0-10-generic
Deprecated feature: REMAKE_INITRD (/var/lib/dkms/gasket/1.0/source/dkms.conf)
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/gasket-dkms.0.crash'
Error! Bad return status for module build on kernel: 6.5.0-10-generic (x86_64)
Consult /var/lib/dkms/gasket/1.0/build/make.log for more information.
dpkg: error processing package gasket-dkms (--configure):
 installed gasket-dkms package post-installation script subprocess returned error exit status 10
Setting up ubuntu-release-upgrader-core (1:23.10.10) ...
Processing triggers for dbus (1.14.10-1ubuntu1) ...
Errors were encountered while processing:
 gasket-dkms
needrestart is being skipped since dpkg has failed
E: Sub-process /usr/bin/dpkg returned an error code (1)
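
The dpkg output only reports that the module build failed; the compiler diagnostics are in the make.log it points to. A minimal sketch for pulling out just the error lines, assuming the log path printed above (the function name is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: print the compiler errors recorded in a DKMS make.log.
# The default path is the one named in the dpkg output above.
dkms_build_errors() {
    log="${1:-/var/lib/dkms/gasket/1.0/build/make.log}"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    # GCC marks hard failures with "error:"; -n prefixes the log line number.
    grep -n 'error:' "$log"
}
```

Calling `dkms_build_errors` with no argument reads the default path; pass another file to inspect a saved copy of the log.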

The mentioned crash log /var/crash/gasket-dkms.0.crash:

ProblemType: Package
DKMSBuildLog:
 DKMS make.log for gasket-1.0 for kernel 6.5.0-10-generic (x86_64)
 Thu Nov  9 21:35:37 UTC 2023
 make: Entering directory '/usr/src/linux-headers-6.5.0-10-generic'
 warning: the compiler differs from the one used to build the kernel
   The kernel was built by: x86_64-linux-gnu-gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
   You are using:           gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
   CC [M]  /var/lib/dkms/gasket/1.0/build/gasket_core.o
   CC [M]  /var/lib/dkms/gasket/1.0/build/gasket_ioctl.o
   CC [M]  /var/lib/dkms/gasket/1.0/build/gasket_interrupt.o
   CC [M]  /var/lib/dkms/gasket/1.0/build/gasket_page_table.o
 /var/lib/dkms/gasket/1.0/build/gasket_core.c: In function ‘gasket_register_device’:
 /var/lib/dkms/gasket/1.0/build/gasket_core.c:1841:41: error: passing argument 1 of ‘class_create’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  1841 |                 class_create(driver_desc->module, driver_desc->name);
       |                              ~~~~~~~~~~~^~~~~~~~
       |                                         |
       |                                         struct module *
 In file included from ./include/linux/device.h:31,
                  from ./include/linux/cdev.h:8,
                  from /var/lib/dkms/gasket/1.0/build/gasket_core.h:11,
                  from /var/lib/dkms/gasket/1.0/build/gasket_core.c:12:
 ./include/linux/device/class.h:230:54: note: expected ‘const char *’ but argument is of type ‘struct module *’
   230 | struct class * __must_check class_create(const char *name);
       |                                          ~~~~~~~~~~~~^~~~
 /var/lib/dkms/gasket/1.0/build/gasket_core.c:1841:17: error: too many arguments to function ‘class_create’
  1841 |                 class_create(driver_desc->module, driver_desc->name);
       |                 ^~~~~~~~~~~~
 ./include/linux/device/class.h:230:29: note: declared here
   230 | struct class * __must_check class_create(const char *name);
       |                             ^~~~~~~~~~~~
 cc1: some warnings being treated as errors
 make[2]: *** [scripts/Makefile.build:251: /var/lib/dkms/gasket/1.0/build/gasket_core.o] Error 1
 make[2]: *** Waiting for unfinished jobs....
 make[1]: *** [/usr/src/linux-headers-6.5.0-10-generic/Makefile:2037: /var/lib/dkms/gasket/1.0/build] Error 2
 make: *** [Makefile:234: __sub-make] Error 2
 make: Leaving directory '/usr/src/linux-headers-6.5.0-10-generic'
DKMSKernelVersion: 6.5.0-10-generic
Date: Thu Nov  9 21:35:39 2023
DuplicateSignature: dkms:gasket-dkms:1.0-18:/var/lib/dkms/gasket/1.0/build/gasket_core.c:1841:41: error: passing argument 1 of ‘class_create’ from incompatible pointer type [-Werror=incompatible-pointer-types]
Package: gasket-dkms 1.0-18
PackageVersion: 1.0-18
SourcePackage: gasket-dkms
Title: gasket-dkms 1.0-18: gasket kernel module failed to build
Issue Type: Build/Install
Operating System: Ubuntu
Coral Device, Other Devices, Programming Language, Relevant Log Output: no response

dewet22 commented 10 months ago

Previously reported as #695 and #785.

dewet22 commented 10 months ago

This was fixed a few months ago in https://github.com/google/gasket-driver/ already; for my use case I just built the .deb from that repo directly and deployed across my Coral-bearing machines.
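
For anyone wondering whether a particular kernel is affected: the build log shows the 6.5 headers declaring `class_create` with a single `const char *name` argument, while the packaged gasket source still passes a module pointer first. A rough sketch that checks a header file for the new declaration; the function name is hypothetical, and the older macro form used in the second check is my recollection of pre-6.4 kernels, not something stated in this thread:

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel header declares the
# one-argument class_create(const char *name) seen in the build log.
# Returns 0 if found, 1 if not found, 2 if the header cannot be read.
class_create_is_single_arg() {
    header="$1"   # e.g. /usr/src/linux-headers-$(uname -r)/include/linux/device/class.h
    [ -r "$header" ] || return 2
    grep -Eq 'class_create\(const char \*name\)' "$header"
}
```

If the check succeeds for your running kernel, the unpatched 1.0-18 package from the apt repo will most likely hit the same compile error.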

arevindh commented 10 months ago

> This was fixed a few months ago in https://github.com/google/gasket-driver/ already; for my use case I just built the .deb from that repo directly and deployed across my Coral-bearing machines.

I am trying to build it for Proxmox; can you guide me on how to build it?

dewet22 commented 10 months ago

> I am trying to build it for Proxmox; can you guide me on how to build it?

I don't use or know Proxmox so I can't help directly. If you need help on building a deb from sources, there are plenty of guides on the internet for that and the starting clue is in the linked repo already: debuild is the tool you'll need to use.

arevindh commented 10 months ago

> > I am trying to build it for Proxmox; can you guide me on how to build it?
>
> I don't use or know Proxmox so I can't help directly. If you need help on building a deb from sources, there are plenty of guides on the internet for that and the starting clue is in the linked repo already: debuild is the tool you'll need to use.

Oh OK. When I try debuild, the only thing I get is the error below:

dh: error: unable to load addon dkms: Can't locate Debian/Debhelper/Sequence/dkms.pm in @INC (you may need to install the Debian::Debhelper::Sequence::dkms module)

The Debian::Debhelper::Sequence::dkms module also seems impossible to install.

Did you get any similar errors?

dewet22 commented 10 months ago

Make sure you have the dkms debhelper package installed: sudo apt install dh-dkms
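
A quick way to tell in advance whether debuild will trip over the missing addon is to look for the Perl module named in the error. This is only a sketch: the function name is invented, and the install path is an assumption based on where debhelper sequence addons normally live on Debian and Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: check whether the debhelper "dkms" addon is installed.
# Assumption: the addon lives at the usual debhelper sequence path below.
# The optional argument is a filesystem root, which keeps this testable.
dh_dkms_addon_present() {
    root="${1:-}"
    [ -r "$root/usr/share/perl5/Debian/Debhelper/Sequence/dkms.pm" ]
}
```

Running `dh_dkms_addon_present || echo "dkms debhelper addon missing"` before the build should flag the problem reported above.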

arevindh commented 10 months ago

> dh-dkms

Oops, I didn't install dh-dkms. Thank you, let me try building.

arevindh commented 10 months ago

@dewet22 Frigate is working now 👍.

alienatedsec commented 10 months ago

> Make sure you have the dkms debhelper package installed: sudo apt install dh-dkms

Some Proxmox instances require more effort ;)

sudo apt install dh-dkms devscripts git

arevindh commented 10 months ago

> > Make sure you have the dkms debhelper package installed: sudo apt install dh-dkms
>
> Some Proxmox instances require more effort ;)
>
> sudo apt install dh-dkms devscripts git

Yep, I did post that in the Double Take Discord server.

rkbest13 commented 9 months ago

I have a similar issue and the steps above don't work. If I try the debuild command, it returns debuild: fatal error at line 679: cannot find readable debian/changelog anywhere!

dh-dkms is installed.

Indianb0y016 commented 9 months ago

> I have a similar issue and the steps above don't work. If I try the debuild command, it returns debuild: fatal error at line 679: cannot find readable debian/changelog anywhere!
>
> dh-dkms is installed.

Did you make sure you were in the right directory? Building the gasket-driver should not fail if the necessary prerequisites are installed and you run debuild from inside the cloned gasket-driver directory. The missing-changelog error indicates you are not in that directory.
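
As a small pre-flight check for this, the sketch below tests for the file debuild is complaining about before you start a build. The function name is hypothetical; the only assumption is that debuild expects a readable debian/changelog in the directory it is run from, which is what the error message says.

```shell
#!/bin/sh
# Hypothetical pre-flight check: debuild must run from the top of a source
# tree that contains a readable debian/changelog.
in_debian_source_tree() {
    dir="${1:-.}"
    [ -r "$dir/debian/changelog" ]
}
```

For example, `in_debian_source_tree || echo "cd into gasket-driver first"` prints the hint when run from the wrong place.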

bbccdd commented 9 months ago

FYI, a workaround for Proxmox (Debian-based) systems is here: https://forum.proxmox.com/threads/update-error-with-coral-tpu-drivers.136888/#post-608975

kub3let commented 9 months ago

When will this be available via https://packages.cloud.google.com/apt ?

It's been fixed but still not available in stable for over 3 months.

I really do not want to install all the dev dependencies on a production node.

dgates62 commented 9 months ago

> When will this be available via https://packages.cloud.google.com/apt ?
>
> It's been fixed but still not available in stable for over 3 months.
>
> I really do not want to install all the dev dependencies on a production node.

Perhaps you could build the package in a Debian LXC or VM, transfer the compiled Debian package to the host Proxmox machine, and then install to avoid installing dev dependencies on a production node? Hope that helps!
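
A sketch of the "build elsewhere, install on the host" idea: after debuild finishes in the build container, locate the resulting package so it can be copied over. The function name is hypothetical, and the `_all.deb` file name pattern is taken from the package name that appears in this thread.

```shell
#!/bin/sh
# Hypothetical helper: find the architecture-independent gasket-dkms package
# that debuild leaves in the parent directory of the source tree.
find_gasket_deb() {
    dir="${1:-.}"
    for f in "$dir"/gasket-dkms_*_all.deb; do
        # An unmatched glob stays literal, so confirm the file really exists.
        [ -e "$f" ] && { printf '%s\n' "$f"; return 0; }
    done
    return 1
}
```

From inside the gasket-driver checkout, something like `deb=$(find_gasket_deb ..) && scp "$deb" root@pve-host:/tmp/` would copy it across (pve-host is a placeholder), after which `dpkg -i` on the host installs it. The host still needs dkms and the kernel headers, since the module itself is compiled there.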

drikster80 commented 8 months ago

This bug is now appearing in the HWE kernel in Ubuntu 22.04 as well. The "stable" repo doesn't seem very stable anymore.

What do we need to do to get it packaged and updated in the Debian stable repo?

If you give me write access to the Google Cloud repos, I'll gladly update it myself.

Mikescotland commented 8 months ago

Stopped working here too. How do I build the correct package for kernel 6.5?

dewet22 commented 8 months ago

> Stopped working here too. How do I build the correct package for kernel 6.5?

@Mikescotland As documented in the earlier notes:

❯ sudo apt install devscripts debhelper dh-dkms -y
...
❯ git clone https://github.com/google/gasket-driver.git
Cloning into 'gasket-driver'...
...
❯ cd gasket-driver; debuild -us -uc -tc -b; cd ..
...
dpkg-deb: building package 'gasket-dkms' in '../gasket-dkms_1.0-18_all.deb'.
...
❯ ls -l gasket-dkms*
-rw-r--r-- 1 dewet dewet 49000 Jan 17 13:17 gasket-dkms_1.0-18_all.deb
-rw-r--r-- 1 dewet dewet  1788 Jan 17 13:18 gasket-dkms_1.0-18_amd64.build
-rw-r--r-- 1 dewet dewet  5642 Jan 17 13:17 gasket-dkms_1.0-18_amd64.buildinfo
-rw-r--r-- 1 dewet dewet  1017 Jan 17 13:17 gasket-dkms_1.0-18_amd64.changes

You can install that .deb on any system that needs to build the kernel module, and the module will be rebuilt automatically whenever newer kernel packages are installed:

❯ sudo dpkg -i gasket-dkms_1.0-18_all.deb
...
Setting up gasket-dkms (1.0-18) ...
...
Building for 6.5.0-14-generic
Building initial module for 6.5.0-14-generic
...
depmod...
Time: 0h:00m:10s
❯ sudo modprobe apex
❯ lsmod | grep apex
apex                   28672  0
gasket                135168  1 apex
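
The final lsmod check can be scripted if you want to run it across several machines. A minimal sketch, with a made-up function name, that succeeds only when both modules from the listing above are present:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both apex and gasket appear in a
# module listing. Reads /proc/modules by default; pass a file to test offline.
coral_modules_loaded() {
    list="${1:-/proc/modules}"
    # The module name is the first whitespace-separated field in both
    # /proc/modules and saved lsmod output.
    awk '$1 == "apex" { a = 1 } $1 == "gasket" { g = 1 } END { exit !(a && g) }' "$list"
}
```

Usage could be as simple as `coral_modules_loaded && echo "Coral PCIe modules loaded"`.
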
Mikescotland commented 8 months ago

Got it working: frigate.detectors.plugins.edgetpu_tfl INFO : TPU found

Thanks for your help!

themana commented 8 months ago

@dewet22 I'm trying to follow your directions but I get an error when trying to install dh-dkms, any help would be appreciated.

E: Unable to locate package dh-dkms

This is on an Ubuntu VM with the version info below:

Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy

alienatedsec commented 8 months ago

@themana

sudo apt update
sudo apt upgrade
sudo apt install devscripts debhelper -y

Then clone the repository and continue with the rest of @dewet22's steps.

themana commented 8 months ago

@alienatedsec That worked perfectly, thank you so much

keptin commented 8 months ago

Thank you a ton @dewet22 and @alienatedsec, that did the trick! Very lucky I found this thread after a good deal of hunting. Cheers!

kymocode commented 8 months ago

Thanks @alienatedsec and @dewet22, this worked great. Not sure why they haven't pushed the fix to stable though. It sure would make things much easier.

matthewharrington commented 8 months ago

Just commenting here to say thanks team. I was tearing my hair out with my Coral failing until I found this thread.

bobmarley2021 commented 7 months ago

Can I just ask @alienatedsec @bbccdd @arevindh if your Proxmox installs are still working with your Coral?

I have:

  1. Built gasket-dkms from source as per above
  2. Tried reverting to a previous kernel (e.g. 6.2.16-20-pve)

Modules seem loaded:

apex                   28672  0
gasket                135168  1 apex

No dice...

~ # ls /dev/apex*
zsh: no matches found: /dev/apex*

I've gone so far as to order another Coral today as I cannot even get this thing recognised and flashed by webcoral now. Starting to suspect a faulty unit.

balashov-ia commented 5 months ago

Maybe you should check the ASPM feature in the BIOS. It controls PCIe power management. Changing it helped for me.
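
Related to this, the kernel exposes its own ASPM policy, which can be read without rebooting into the BIOS. This is a sketch with a hypothetical function name; note that it shows the kernel-side policy, which is not the same thing as the BIOS toggle mentioned above, and the sysfs path is an assumption that holds on kernels built with PCIe ASPM support.

```shell
#!/bin/sh
# Hypothetical helper: print the active PCIe ASPM policy as the kernel sees it.
# The sysfs file lists every policy and brackets the active one, for example
# "[default] performance powersave powersupersave".
active_aspm_policy() {
    file="${1:-/sys/module/pcie_aspm/parameters/policy}"
    [ -r "$file" ] || return 1
    # Keep only the text between the square brackets.
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file"
}
```

If the function returns nothing, the file is missing or unreadable and the BIOS setting is the only place left to look.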

alienatedsec commented 5 months ago

> Can I just ask @alienatedsec @bbccdd @arevindh if your Proxmox installs are still working with your Coral?

Sorry for the late response @bobmarley2021 - I don't know why I missed this message. Regardless, all three nodes of my Proxmox cluster are on the latest versions and work just fine. I documented my effort in a comment (linked at the end) on installing Frigate on Proxmox with Coral support.

Installing Coral drivers - go to the shell of the host and type the following commands:

apt update
apt upgrade
apt install pve-headers-$(uname -r)
apt install proxmox-default-headers
apt install dh-dkms devscripts git
git clone  https://github.com/google/gasket-driver
cd gasket-driver
debuild -us -uc -tc -b -d
cd ..
dpkg -i gasket-dkms_1.0-18_all.deb
reboot

Once rebooted, ensure the apex_0 device is in /dev/.

ls -alh /dev/apex_0 
crw-rw---- 1 root root 120, 0 Mar 20 14:51 /dev/apex_0

Any future kernel updates should automatically include gasket-dkms after the above steps.

Follow the rest here https://github.com/blakeblackshear/frigate/discussions/5448#discussioncomment-8855247
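
To confirm that DKMS really has a gasket build registered for the running kernel after these steps, the output of `dkms status` can be checked. A hedged sketch: the function name is invented, and the two line formats it matches are assumptions about what different dkms releases print, so adjust the patterns if your output looks different.

```shell
#!/bin/sh
# Hypothetical helper: check `dkms status` text on stdin for an installed
# gasket build for the given kernel version.
# Assumption: lines look like "gasket/1.0, <kernel>, x86_64: installed"
# (older dkms releases print "gasket, 1.0, <kernel>, ..." instead).
gasket_installed_for() {
    kernel="$1"
    grep -E '^gasket[,/]' | grep -F ", $kernel," | grep -q 'installed'
}
```

Typical use would be `dkms status | gasket_installed_for "$(uname -r)" || echo "gasket not built for this kernel"`.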

nglessner commented 5 months ago

I wish I could say following your directions worked.

I'm on Proxmox 8.1.10 with kernel 6.5.13-5, using the dual Edge TPU PCIe E-key card. As far as I can tell, building the gasket driver was successful, but in the end ls -alh /dev/apex_0 results in ls: cannot access '/dev/apex_0': No such file or directory.

Running lspci -nn | grep 089a does return 03:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a], so I believe the system can see the TPU but for whatever reason isn't able to run the driver?

Anyway, I appreciate the help you've given for this setup.

alienatedsec commented 5 months ago

Is there any apex device at all? It doesn't have to be apex_0 for the dual Edge TPU, and it's likely you will see only one. @nglessner

nglessner commented 5 months ago

@alienatedsec No. Sorry, I should have been more specific. ls -alh /dev/apex* also fails with No such file or directory.

I wouldn't think this would be the limiting factor, but I am attempting this on a Beelink EQ12 Pro (N305). The only available M.2 slot is the "WiFi" slot that is also CNVi. I had high hopes when I could see the device in lspci, but now I'm not so sure.

Interestingly, lspci -k shows the device as:

03:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU
        Subsystem: Global Unichip Corp. Coral Edge TPU
        Kernel driver in use: vfio-pci
        Kernel modules: apex

alienatedsec commented 5 months ago

@nglessner Did you already set up PCI passthrough and add the device to the blocklist?

Asking because if you added an apex device to a blocklist, it will not show on the host.

> The only available M.2 slot is the "WiFi" slot that is also CNVi.

Most of those slots are CNVi but have only one PCIe lane, which is why only one of the two Corals on a dual card is recognised.
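
The vfio-pci line in the lspci -k output is the giveaway for the blocklist/passthrough case. A rough sketch (hypothetical function name) that extracts which driver is bound to the Coral from `lspci -k` output, assuming the usual layout of one unindented header line per device followed by indented detail lines:

```shell
#!/bin/sh
# Hypothetical helper: print the "Kernel driver in use" for the Coral PCIe
# device from `lspci -k` output supplied on stdin.
coral_driver_in_use() {
    awk '
        # Unindented lines start a new device; remember whether it is the Coral.
        /^[0-9a-f]/ { in_coral = ($0 ~ /Coral Edge TPU/) }
        in_coral && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use:[ \t]*/, "")
            print
            exit
        }
    '
}
```

With `lspci -k | coral_driver_in_use`, a result of apex means the host driver owns the device, while vfio-pci means it is reserved for passthrough and no /dev/apex_* node will appear on the host.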

bobmarley2021 commented 5 months ago

@alienatedsec not a problem - thanks for replying. I have since solved the issue on my end. In my case it was partly a faulty product and partly user error (although the user error wouldn’t have mattered at that point because of the former).

I am using a USB Coral and didn’t realise that apex is only for the PCIe Coral. The fault with my USB device stopped it from flashing firmware on connection. When I first hit this problem I googled it and ended up barking up the wrong tree by following gasket driver instructions. Then I read in some reviews that certain batches of the USB devices go faulty and exhibit this behaviour - the only solution being to RMA. I replaced the unit and now all is well. I pinned an earlier kernel to be safe, reinstalled the USB driver from the official Google repo, and have been up and running ever since :-) 🎉
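
To avoid the same mix-up, it may help to check which kind of Coral the system actually sees before following the gasket instructions. In this sketch the function name is hypothetical; 1ac1:089a is the PCIe ID quoted earlier in this thread, while the two USB IDs are an assumption based on commonly reported lsusb output (one before the runtime loads firmware, one after) and are not confirmed here.

```shell
#!/bin/sh
# Hypothetical helper: guess the Coral variant from combined `lspci -nn` and
# `lsusb` text on stdin.
# 1ac1:089a is the PCIe ID quoted in this thread. The USB IDs 1a6e:089a and
# 18d1:9302 are assumptions taken from commonly reported lsusb output.
coral_variant() {
    input=$(cat)
    case "$input" in
        *1ac1:089a*) echo pcie ;;
        *1a6e:089a*|*18d1:9302*) echo usb ;;
        *) echo none; return 1 ;;
    esac
}
```

Something like `{ lspci -nn; lsusb; } 2>/dev/null | coral_variant` prints pcie, usb, or none. Only the pcie case needs the gasket/apex driver discussed in this issue.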

nglessner commented 5 months ago

Thank you @alienatedsec ! My issue was that I had added it to the blocklist. I eventually got it working!

johntdavis84 commented 5 months ago

> Most of those slots are CNVi but have only one PCIe lane, which is why only one of the two Corals on a dual card is recognised.

Is it normal for the CNVi slot to not even recognize the Coral?

I've got an HP Elite Mini, and I can't even see the card in lspci in Proxmox. I can't decide if it's broken, or if something is actually misconfigured in the BIOS.

More likely, it's just not supported in the BIOS.

LiloBzH commented 5 months ago

OK with Linux proxmox 6.8.4-2-pve after recompiling the Coral driver (gasket-dkms_1.0-18_all.deb).

dmandn commented 3 months ago

Reinstalled fresh with Debian 12, kernel 6.1.0-21-amd64, and it is working fine.

hapklaar commented 3 months ago

> Any future kernel updates should automatically include gasket-dkms after the above steps.

@alienatedsec Using your guide the corals work fine for me in LXC on Proxmox, but I have to repeat after every kernel update. Any ideas why this is not automatic?

edit: just discovered I had everything but the 'proxmox-default-headers' package. Is this required for it to happen automatically perhaps?

alienatedsec commented 3 months ago

> Is this required for it to happen automatically perhaps?

@hapklaar I reckon the missing package was the reason.
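
One way to test that theory after the next kernel update: DKMS can only rebuild a module for a kernel whose headers are installed, so checking for the headers of the new kernel shows whether an automatic rebuild was even possible. A sketch with a hypothetical function name; it assumes the conventional /lib/modules/<version>/build link, and my understanding (not confirmed in this thread) is that proxmox-default-headers exists to pull in matching headers on each kernel upgrade.

```shell
#!/bin/sh
# Hypothetical helper: report whether kernel headers are available for a
# given kernel version. /lib/modules/<version>/build normally points at them.
headers_present_for() {
    kernel="${1:-$(uname -r)}"
    root="${2:-}"    # optional filesystem root, for testing only
    [ -d "$root/lib/modules/$kernel/build" ]
}
```

If `headers_present_for` fails right after a kernel upgrade, installing the headers and running the DKMS build again is probably what the manual repeat of the guide was doing implicitly.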