freifunkh / ansible

Here we store all Ansible roles and configs used for Freifunk Hannover.
MIT License

dkms fails to build batman module #157

Open AiyionPrime opened 3 years ago

AiyionPrime commented 3 years ago

Apparently dkms fails to build the batman-adv module properly on regular kernel upgrades.

I cannot reproduce it on demand, since I cannot issue new kernel packages, but it happens with every single new kernel package.

Originally posted by @1977er in https://github.com/freifunkh/ansible/issues/156#issuecomment-782605376

1977er commented 3 years ago

Feb 20 13:45:51 sn08 kernel: BUG: unable to handle kernel paging request at ffffffffc05cd060
Feb 20 13:45:51 sn08 kernel: IP: [<ffffffffbbd57413>] list_del+0x13/0x30
Feb 20 13:45:51 sn08 kernel: RIP  [<ffffffffbbd57413>] list_del+0x13/0x30

lemoer commented 3 years ago

I added the milestone "Beginn der stabilen Phase" ("start of the stable phase"), as this is marked as a bug and should therefore be fixed before the stable phase.

lemoer commented 3 years ago

As just discussed in Mumble, we will remove this from the milestone to fulfill the milestone in time. So everyone has to take care of this manually for a few more weeks...

lemoer commented 3 years ago

Hopefully fixed in https://github.com/freifunkh/ansible/commit/97b950d08e859f9fd9a8d1d100dc0cf412ea70b2 .

It was a bit of a hassle to find, but someone in dkms land thought it was a good idea to assume that the variable MAKE always starts with make (as in MAKE="make ..."). Otherwise things break horribly and the module is always built for the currently running kernel.

lemoer commented 3 years ago

You can check whether you are affected using: find /lib/modules/ -name batman-adv.ko | xargs md5sum | grep dkms

Bad output:

[root@sn10]:/usr/src/batman-adv-v2021.0 # find /lib/modules/ -name batman-adv.ko | xargs md5sum | grep dkms
c2b3ee5516486a278a950029925abebb  /lib/modules/4.19.0-16-amd64/updates/dkms/batman-adv.ko
c2b3ee5516486a278a950029925abebb  /lib/modules/4.19.0-14-amd64/updates/dkms/batman-adv.ko

(Modules for different kernels shouldn't have the same checksum)

Good output:

[root@sn09]:~ # find /lib/modules/ -name batman-adv.ko | xargs md5sum | grep dkms
c2b3ee5516486a278a950029925abebb  /lib/modules/4.19.0-14-amd64/updates/dkms/batman-adv.ko
faed6995a7a4cffac0455de641af9a8e  /lib/modules/4.19.0-16-amd64/updates/dkms/batman-adv.ko

(Different kernels have different checksums)
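
The check above can be automated. This is only a minimal sketch, assuming the md5sum output format shown above; the function name find_duplicate_sums is made up for illustration and is not part of the repository. It prints every line whose checksum occurs more than once and prints nothing when all checksums differ:

```shell
# Reads "checksum  path" lines (md5sum output) on stdin and prints every
# line whose checksum appears more than once. No output means no duplicates.
find_duplicate_sums() {
    awk '
        { count[$1]++; lines[NR] = $0; sums[NR] = $1 }
        END {
            for (i = 1; i <= NR; i++)
                if (count[sums[i]] > 1) print lines[i]
        }
    '
}

# Usage on a supernode (same pipeline as the manual check):
# find /lib/modules/ -name batman-adv.ko | xargs md5sum | grep dkms | find_duplicate_sums
```

Any output means that at least two kernels share the same module binary, which is the "bad" case.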

lemoer commented 3 years ago

It seems that 97b950d didn't fix the problem completely.

Before 97b950d, the output of dkms install looked like this:

[root@sn05]:~ # dkms install --force -m batman-adv -v v2021.0 -k 4.19.0-14-amd64

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area...
'make' all..........
cleaning build area...

DKMS: build completed.

batman-adv.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.19.0-14-amd64/updates/dkms/

depmod...

DKMS: install completed.

After 97b950d, the variable KERNELRELEASE is added to the make command:

[root@sn10]:~ # dkms install --force -m batman-adv -v v2021.0 -k 4.19.0-16-amd64

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area...
make -j4 KERNELRELEASE=4.19.0-16-amd64 all........
cleaning build area...

DKMS: build completed.

batman-adv.ko:
Running module version sanity check.
 - Original module
 - Installation
   - Installing to /lib/modules/4.19.0-16-amd64/updates/dkms/

depmod...

DKMS: install completed.

But the binary for the 4.19.0-16-amd64 kernel is still wrong:

[root@sn10]:/usr/src/batman-adv-v2021.0 # find /lib/modules/ -name batman-adv.ko | xargs md5sum | grep dkms
c2b3ee5516486a278a950029925abebb  /lib/modules/4.19.0-16-amd64/updates/dkms/batman-adv.ko
c2b3ee5516486a278a950029925abebb  /lib/modules/4.19.0-14-amd64/updates/dkms/batman-adv.ko

lemoer commented 3 years ago

The logs look odd:

[root@sn10]:/usr/src/batman-adv-v2021.0 # cat /var/lib/dkms/batman-adv/v2021.0/4.19.0-16-amd64/x86_64/log/make.log
DKMS make.log for batman-adv-v2021.0 for kernel 4.19.0-16-amd64 (x86_64)
Mo 29. Mär 00:02:32 CEST 2021
/var/lib/dkms/batman-adv/v2021.0/build/gen-compat-autoconf.sh /var/lib/dkms/batman-adv/v2021.0/build/compat-autoconf.h
make -C /lib/modules/4.19.0-14-amd64/build M=/var/lib/dkms/batman-adv/v2021.0/build PWD=/var/lib/dkms/batman-adv/v2021.0/build REVISION=2021.0 CONFIG_BATMAN_ADV=m CONFIG_BATMAN_ADV_DEBUG=n CONFIG_BATMAN_ADV_BLA=y CONFIG_BATMAN_ADV_DAT=y CONFIG_BATMAN_ADV_NC=n CONFIG_BATMAN_ADV_MCAST=y CONFIG_BATMAN_ADV_TRACING=n CONFIG_BATMAN_ADV_BATMAN_V=y INSTALL_MOD_DIR=updates/modules
make[1]: Verzeichnis „/usr/src/linux-headers-4.19.0-14-amd64“ wird betreten
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/bat_algo.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/bat_iv_ogm.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/bat_v.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/bat_v_elp.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/bat_v_ogm.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/bitarray.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/bridge_loop_avoidance.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/distributed-arp-table.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/fragmentation.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/gateway_client.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/gateway_common.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/hard-interface.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/hash.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/main.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/multicast.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/netlink.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/originator.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/routing.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/send.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/soft-interface.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/tp_meter.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/translation-table.o
  CC [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/tvlv.o
  LD [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/batman-adv.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/batman-adv.mod.o
  LD [M]  /var/lib/dkms/batman-adv/v2021.0/build/net/batman-adv/batman-adv.ko
make[1]: Verzeichnis „/usr/src/linux-headers-4.19.0-14-amd64“ wird verlassen

The first line

DKMS make.log for batman-adv-v2021.0 for kernel 4.19.0-16-amd64 (x86_64)

contains the correct version 4.19.0-16-amd64.

However, the lines

/var/lib/dkms/batman-adv/v2021.0/build/gen-compat-autoconf.sh /var/lib/dkms/batman-adv/v2021.0/build/compat-autoconf.h
make -C /lib/modules/4.19.0-14-amd64/build M=/var/lib/dkms/batman-adv/v2021.0/build PWD=/var/lib/dkms/batman-adv/v2021.0/build REVISION=2021.0 CONFIG_BATMAN_ADV=m CONFIG_BATMAN_ADV_DEBUG=n CONFIG_BATMAN_ADV_BLA=y CONFIG_BATMAN_ADV_DAT=y CONFIG_BATMAN_ADV_NC=n CONFIG_BATMAN_ADV_MCAST=y CONFIG_BATMAN_ADV_TRACING=n CONFIG_BATMAN_ADV_BATMAN_V=y INSTALL_MOD_DIR=updates/modules
make[1]: Verzeichnis „/usr/src/linux-headers-4.19.0-14-amd64“ wird betreten

and

make[1]: Verzeichnis „/usr/src/linux-headers-4.19.0-14-amd64“ wird verlassen

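still contain the incorrect version 4.19.0-14-amd64 (the German messages "wird betreten" and "wird verlassen" are make's "Entering directory" and "Leaving directory").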
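
The comparison described above can be scripted. This is a rough sketch that assumes the make.log layout shown in this comment (a "DKMS make.log for ... for kernel ..." header and a "make -C /lib/modules/<version>/build" line); the function names are invented for illustration:

```shell
# Prints the kernel version DKMS claims to build for (from the log header).
log_target_kernel() {
    sed -n 's/^DKMS make\.log for .* for kernel \([^ ]*\) .*/\1/p' "$1" | head -n 1
}

# Prints the kernel version whose build tree make was actually pointed at.
log_built_kernel() {
    sed -n 's|.*make -C /lib/modules/\([^/]*\)/build.*|\1|p' "$1" | head -n 1
}

# Compares both versions; returns non-zero on a mismatch.
check_make_log() {
    target=$(log_target_kernel "$1")
    built=$(log_built_kernel "$1")
    if [ -n "$target" ] && [ "$target" = "$built" ]; then
        echo "ok: built against $built"
    else
        echo "MISMATCH: target=$target built=$built"
        return 1
    fi
}

# Usage:
# check_make_log /var/lib/dkms/batman-adv/v2021.0/4.19.0-16-amd64/x86_64/log/make.log
```

For the log above this should report a mismatch between 4.19.0-16-amd64 and 4.19.0-14-amd64.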

lemoer commented 3 years ago

At ffda I found this dkms.conf: https://git.darmstadt.ccc.de/ffda/infra/salt/-/blob/master/batman_adv/files/dkms.conf.j2

Not sure what the KERNELPATH thing does, but maybe we could try it one day.
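
For reference, a hedged guess at what such a setting could look like in our own dkms.conf. This is an untested sketch, not the repository's actual file and not a copy of the ffda config. It assumes that the batman-adv Makefile reads KERNELPATH and otherwise falls back to the build directory of the running kernel, which would match the "make -C /lib/modules/4.19.0-14-amd64/build" line in the log above. kernel_source_dir is the variable dkms provides for the build tree of the target kernel:

```shell
# dkms.conf fragment (sketch only, untested on the supernodes).
# Assumption: the batman-adv Makefile honours KERNELPATH and defaults it
# to the build directory of the currently running kernel when it is unset.
# ${kernel_source_dir} is filled in by dkms for the kernel being built for.
MAKE="make KERNELPATH=${kernel_source_dir}"
```

Keeping the command starting with make should also preserve the KERNELRELEASE handling seen after 97b950d.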

lemoer commented 3 years ago

For today I'll leave this bug. It's still there, but I need to stop working on this now.

We can continue working on this when the next kernel is delivered.

To hotfix the problem, you can reboot into the new kernel and call:

$ dkms uninstall --force -m batman-adv -v v2021.0
$ dkms install --force -m batman-adv -v v2021.0

1977er commented 3 years ago

We have one shot left with sn07.

1977er commented 6 months ago

@lemoer Closing it with "won't fix"? Sad but true.