MEN-Mikro-Elektronik / 13MD05-90

MDIS5 System Package for Linux (including drivers)

Kernel 5.11 (Ubuntu 20.04 LTS) doesn't allow the modules to load #232

Closed: gvarlet closed this issue 1 year ago

gvarlet commented 2 years ago

After checking everything thoroughly and redoing the whole procedure three times, it looks like kernel 5.11 doesn't allow the modules to load:

dua@Ubu2004:~/mdisProject$ sudo modprobe men_lx_z25 "mode=se,se"
modprobe: ERROR: could not insert 'men_lx_z25': Exec format error
dua@Ubu2004:~/mdisProject$
[ 2495.114960] men_oss: loading out-of-tree module taints kernel.
[ 2495.115083] men_oss: module verification failed: signature and/or required key missing - tainting kernel
[ 2495.115248] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 0000000068d9b7af, val ffffffffc0d8489b
[ 2530.101234] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 00000000e5b13139, val ffffffffc0d9389b
[ 2846.779209] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 00000000b1de9c08, val ffffffffc0db689b
[ 2856.900880] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 00000000cbde4111, val ffffffffc0e0089b
dua@Ubu2004:~/mdisProject$ uname -a
Linux Ubu2004 5.11.0-41-generic #45~20.04.1-Ubuntu SMP Wed Nov 10 10:20:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
dua@Ubu2004:~/mdisProject$ modinfo men_lx_z25
filename:       /lib/modules/5.11.0-41-generic/misc/men_lx_z25.ko
version:        13Z025-90_01_19-0-gfe4c5bc_2021-01-28
author:         Thomas Schnuerer <thomas.schnuerer@men.de>
description:    MEN Z25/125 UART Stub driver for serial.c
license:        GPL
srcversion:     DAB1827DC24FD2CB640CF0F
depends:        men_lx_chameleon
retpoline:      Y
name:           men_lx_z25
vermagic:       5.11.0-41-generic SMP mod_unload modversions
parm:           mode:phys. mode for each port e.g.: mode="se df_fdx df_hdxe" (charp)
parm:           baud_base:Base for baudrate generation. Overriden by baud_bases (ulong)
parm:           baud_bases:Base for baudrate generation for each port e.g.: baud_bases=1843200,1843200,1041666,1041666. Overrides baud_base (array of ulong)
parm:           fixed_type:UART port fixed_type=0 (autoscan)/fixed_type=1 (PORT_16550A) (charp)
dua@Ubu2004:~/mdisProject$
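The vermagic string reported by modinfo (5.11.0-41-generic SMP mod_unload modversions) matches the running kernel shown by uname -a, so this does not look like a plain version mismatch. Such a mismatch would normally show up in dmesg as a "version magic" complaint rather than the relocation warnings above. A minimal Python sketch of that comparison follows. It only parses modinfo text, and the function names are hypothetical:

```python
from __future__ import annotations

import platform

def parse_modinfo(text: str) -> dict[str, list[str]]:
    """Parse `modinfo` output into {field: [values]}.

    Fields such as `parm` can appear several times, so values are kept in lists.
    """
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "field: value" line
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields

def vermagic_matches(modinfo_text: str, release: str | None = None) -> bool:
    """Return True if the module's vermagic starts with the given kernel release."""
    release = release or platform.release()  # same string as `uname -r` on Linux
    vermagic = parse_modinfo(modinfo_text).get("vermagic", [""])[0]
    # vermagic looks like "<release> <flags...>", e.g. "5.11.0-41-generic SMP mod_unload modversions"
    return vermagic.split(" ", 1)[0] == release
```

Given the modinfo output above and the release 5.11.0-41-generic, vermagic_matches returns True. This is consistent with the cause lying somewhere other than the kernel version string.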
gvarlet commented 2 years ago

I have done the exact same procedure with kernel 5.8 and it works.

dpfeuffer commented 2 years ago

Please reproduce/investigate this problem. 13MD05-90_02_05 is meant to be usable with 5.11/5.12.

gvarlet commented 2 years ago

For info: on Linux dua-MEN-F026L00 5.11.0-38-generic #42, the drivers load as well.

GonzaloMartinR commented 2 years ago

Hi, I have been able to reproduce the error.

I am investigating possible solutions to this one. It seems that the kernel modules are not being loaded because they are not signed. The first thing I tried was disabling Secure Boot, both in the BIOS and with the mokutil tool: sudo mokutil --disable-validation


Another possible solution was to edit the .config file and change CONFIG_MODULE_SIG=y and CONFIG_MODULE_SIG_ALL=y to CONFIG_MODULE_SIG=n and CONFIG_MODULE_SIG_ALL=n. I did this as well, but it seems to have no effect and the modules still fail to load. The better solution seems to be signing the modules, which is the approach I am taking now.

GonzaloMartinR commented 2 years ago

We have booted the 5.11.0 mainline kernel, and it loads the modules correctly. Knowing this, we are now comparing this version with the failing versions to find the root cause of the error.

mad-jrodriguez commented 1 year ago

Dear all,

After some days investigating the cause of the error, I have found a solution. First of all, after talking with @mad-jsanjuan, we have determined that this issue is closely related to #246. It is reproducible not only with MDIS modules but with any module compiled against the kernel 5.11.0-41 headers and later ones (it is also reproducible with the latest Ubuntu 22.04 kernel).

This bug is also reproducible in VMs.

For this example, @mad-jsanjuan and I used a simple kernel module example repository that provides two basic kernel modules. You can find them here: https://github.com/dwmkerr/linux-kernel-module.

In short, the issue is caused by the MDIS build system, which executes the kernel headers' Makefile and thereby modifies .config. In these kernel versions (and newer ones), several configuration entries end up different in .config after the MDIS build system has run.

Take a look:

--- .config.old 2021-11-10 10:56:15.000000000 +0100
+++ .config 2022-11-23 11:42:51.603885663 +0100
@@ -1,10 +1,10 @@
 #
 # Automatically generated file; DO NOT EDIT.
-# Linux/x86 5.11.0-41-generic Kernel Configuration
+# Linux/x86 5.11.22 Kernel Configuration
 #
-CONFIG_CC_VERSION_TEXT="gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0"
+CONFIG_CC_VERSION_TEXT="gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0"
 CONFIG_CC_IS_GCC=y
-CONFIG_GCC_VERSION=90300
+CONFIG_GCC_VERSION=90400
 CONFIG_LD_VERSION=234000000
 CONFIG_CLANG_VERSION=0
 CONFIG_LLD_VERSION=0
@@ -4783,7 +4783,6 @@
 #
 # PCI GPIO expanders
 #
-CONFIG_GPIO_AAEON=m
 CONFIG_GPIO_AMD8111=m
 CONFIG_GPIO_ML_IOH=m
 CONFIG_GPIO_PCI_IDIO_16=m
@@ -4930,7 +4929,6 @@
 #
 # Native drivers
 #
-CONFIG_SENSORS_AAEON=m
 CONFIG_SENSORS_ABITUGURU=m
 CONFIG_SENSORS_ABITUGURU3=m
 CONFIG_SENSORS_AD7314=m
@@ -5255,7 +5253,6 @@
 CONFIG_INTEL_MEI_WDT=m
 CONFIG_NI903X_WDT=m
 CONFIG_NIC7018_WDT=m
-CONFIG_AAEON_IWMI_WDT=m
 CONFIG_MEN_A21_WDT=m
 CONFIG_XEN_WDT=m

@@ -5420,7 +5417,6 @@
 CONFIG_MFD_WM8350_I2C=y
 CONFIG_MFD_WM8994=m
 CONFIG_MFD_WCD934X=m
-CONFIG_MFD_AAEON=m
 CONFIG_RAVE_SP_CORE=m
 CONFIG_MFD_INTEL_M10_BMC=m
 # end of Multifunction device drivers
@@ -7925,7 +7921,6 @@
 # LED drivers
 #
 CONFIG_LEDS_88PM860X=m
-CONFIG_LEDS_AAEON=m
 CONFIG_LEDS_APU=m
 CONFIG_LEDS_AS3645A=m
 CONFIG_LEDS_LM3530=m
@@ -9769,7 +9764,6 @@
 #
 # Ubuntu Supplied Third-Party Device Drivers
 #
-CONFIG_UBUNTU_ODM_DRIVERS=y
 CONFIG_HIO=m
 CONFIG_UBUNTU_HOST=m
 # end of Ubuntu Supplied Third-Party Device Drivers
@@ -10742,8 +10736,6 @@
 # CONFIG_DEBUG_INFO_SPLIT is not set
 CONFIG_DEBUG_INFO_DWARF4=y
 CONFIG_DEBUG_INFO_BTF=y
-CONFIG_PAHOLE_HAS_SPLIT_BTF=y
-CONFIG_DEBUG_INFO_BTF_MODULES=y
 CONFIG_GDB_SCRIPTS=y
 CONFIG_FRAME_WARN=1024
 # CONFIG_STRIP_ASM_SYMS is not set
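A diff like the one above can also be produced per configuration symbol rather than per line, which filters out comment and ordering noise. This is a minimal sketch. It assumes the usual .config syntax: CONFIG_FOO=value for set options, and "# CONFIG_FOO is not set" for disabled ones. The function names are hypothetical:

```python
from __future__ import annotations

import re

# "CONFIG_FOO=value" and "# CONFIG_FOO is not set"
SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")

def parse_config(text: str) -> dict[str, str]:
    """Map each CONFIG_ symbol to its value; explicitly unset symbols map to 'n'."""
    options: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if m := SET_RE.match(line):
            options[m.group(1)] = m.group(2)
        elif m := UNSET_RE.match(line):
            options[m.group(1)] = "n"
    return options

def diff_configs(old: str, new: str) -> dict[str, tuple[str | None, str | None]]:
    """Return {symbol: (old_value, new_value)} for every symbol that differs.

    None means the symbol does not appear in that file at all.
    """
    a, b = parse_config(old), parse_config(new)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }
```

Applied to .config.old and .config, this would be expected to report CONFIG_DEBUG_INFO_BTF_MODULES as ('y', None), because the option disappears from the regenerated file.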

To be concrete, the line that is causing the issue is:

-CONFIG_DEBUG_INFO_BTF_MODULES=y

So the problem is that the kernel and our modules are built with different kernel configurations. For this reason, neither our MDIS modules nor any other module compiled after .config has changed can be loaded; they always fail with the same error.

In older kernel versions (e.g. kernel 5.4.0-132), the .config file does not change, so the kernel and the modules are built with the same kernel configuration.
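One way to spot this situation before building is to compare the running kernel's configuration with the .config shipped in the headers, for a few symbols of interest. This sketch assumes the Ubuntu locations /boot/config-<release> and /usr/src/linux-headers-<release>/.config. Both paths and all helper names are assumptions, not part of the MDIS build system:

```python
from __future__ import annotations

import platform
from pathlib import Path

def parse_config(text: str) -> dict[str, str]:
    """Minimal .config parser: 'CONFIG_X=v' -> v, '# CONFIG_X is not set' -> 'n'."""
    opts: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            opts[line[2:-len(" is not set")]] = "n"
    return opts

def mismatched_symbols(running: str, headers: str, symbols: list[str]) -> dict[str, tuple[str, str]]:
    """Compare selected symbols; a symbol missing from a file counts as 'n' (disabled)."""
    a, b = parse_config(running), parse_config(headers)
    return {
        s: (a.get(s, "n"), b.get(s, "n"))
        for s in symbols
        if a.get(s, "n") != b.get(s, "n")
    }

def check_installed_headers(symbols: list[str]) -> dict[str, tuple[str, str]]:
    # ASSUMPTION: Ubuntu layout for the running kernel's config and its headers.
    release = platform.release()
    running = Path(f"/boot/config-{release}").read_text()
    headers = Path(f"/usr/src/linux-headers-{release}/.config").read_text()
    return mismatched_symbols(running, headers, symbols)
```

On freshly installed headers, check_installed_headers(["CONFIG_DEBUG_INFO_BTF_MODULES"]) would be expected to return an empty dict. After the headers' .config has been rewritten as described above, it would be expected to return ('y', 'n').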

Steps to reproduce:

  1. Remove the previous (and "cursed") kernel 5.11.0-41 headers.
sudo apt-get purge linux-headers-5.11.0-41
  2. Install them again.
sudo apt-get install linux-headers-5.11.0-41

At this point, we have clean kernel headers for the same version we are running.

Then, we clone the example repository mentioned above and compile one of its modules, for example babel:

cd babel && make
  3. Load the module using insmod (it is not installed in the system):
sudo insmod babel.ko

At this point, we can check that the module is properly loaded:

[  234.966269] babel: loading out-of-tree module taints kernel.
[  234.966347] babel: module verification failed: signature and/or required key missing - tainting kernel
[  234.966851] babel: module loaded at 0x00000000685fc906
[  234.966860] babel: registered correctly with major number 236
[  234.966900] babel: device class registered correctly
[  234.967045] babel: device class created correctly

Note that at this point the kernel prints the same messages mentioned in previous comments, complaining about loading an out-of-tree, unsigned module, yet the module loads.

  4. Go to our MDIS project and compile and install the MDIS drivers:
sudo make clean && sudo make && sudo make install

As you can see below, the kernel's Makefile is executed (this is actually the part where everything goes wrong):

men@men-MEN-F026L00:~/MDIS/13MD05-90$ sudo make
Getting Compiler/Linker settings from Linux Kernel Makefile
  SYNC    include/config/auto.conf.cmd
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/confdata.o
  HOSTCC  scripts/kconfig/expr.o
  LEX     scripts/kconfig/lexer.lex.c
  YACC    scripts/kconfig/parser.tab.[ch]
  HOSTCC  scripts/kconfig/lexer.lex.o
  HOSTCC  scripts/kconfig/parser.tab.o
  HOSTCC  scripts/kconfig/preprocess.o
  HOSTCC  scripts/kconfig/symbol.o
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
Cleaning .kernelsubdirs

++++++++ Preparing non-debug version of module men_mdis_kernel +++++++++++
Directory OBJ/nodbg/men_mdis_kernel created
....
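Since the damage happens during this step, a build wrapper can at least detect it by hashing the headers' .config before and after the build. This is a minimal sketch. The make_in helper and the paths passed to it are hypothetical, and the sketch only reports the change; it does not undo it:

```python
from __future__ import annotations

import hashlib
import subprocess
from pathlib import Path
from typing import Callable

def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def run_and_check_config(config: Path, build: Callable[[], None]) -> bool:
    """Run `build` and return True if `config` is byte-for-byte unchanged afterwards."""
    before = sha256_of(config)
    build()
    return sha256_of(config) == before

def make_in(directory: str) -> Callable[[], None]:
    """HYPOTHETICAL helper: return a callable that runs `make` in `directory`."""
    def build() -> None:
        subprocess.run(["make"], cwd=directory, check=True)
    return build
```

Consider run_and_check_config(Path("/usr/src/linux-headers-5.11.0-41-generic/.config"), make_in("/home/men/MDIS/13MD05-90")), run with the same privileges as the sudo make above (the headers path is an assumption). If it returns False, the build touched the kernel configuration. In that case the headers would need to be reinstalled as in steps 1 and 2.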

Once the MDIS drivers are compiled & installed, we reboot the test setup to get a "fresh" system.

  5. Go back to the kernel module example's folder and recompile the same module.
make clean && make
  6. Try to load the module again using insmod. At this point, we get the same error reported in this issue.
men@men-MEN-F026L00:~/software/linux-kernel-module/babel$ sudo insmod babel.ko 
insmod: ERROR: could not insert module babel.ko: Invalid module format

The dmesg output:

men@men-MEN-F026L00:~/software/linux-kernel-module/babel$ dmesg
....
[ 4437.919492] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 000000008aa8b00b, val ffffffffc0a7d270

As you can see, we get the same error in dmesg.

To fix this, we first reinstalled the kernel headers (as in steps 1 and 2) to get fresh kernel headers. Then we removed the line that includes the kernel headers' Makefile from the kernelsettings.mak file (in the installed copy, located in /opt/menlinux/):

--- a/MDISforLinux/BUILD/MDIS/TPL/kernelsettings.mak
+++ b/MDISforLinux/BUILD/MDIS/TPL/kernelsettings.mak
@@ -28,8 +28,6 @@
 # Free Software Foundation;  either version 2 of the  License, or (at your
 # option) any later version.

-include Makefile
-
 KERNEL_SETTINGS_FILE ?= /dev/null

 .DEFAULT_GOAL := getsettings_for_mdis
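The same change can be applied with a small script, for instance when preparing several test setups. This sketch assumes the target is the installed kernelsettings.mak. Its exact path under /opt/menlinux/ is not given here, so the caller supplies it. The function names are hypothetical:

```python
from __future__ import annotations

from pathlib import Path

def drop_kernel_makefile_include(text: str) -> tuple[str, bool]:
    """Remove the 'include Makefile' line (and one blank line after it, as in the patch).

    Returns the new text and whether anything was removed.
    """
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == "include Makefile":
            end = i + 1
            if end < len(lines) and not lines[end].strip():
                end += 1  # also drop the blank separator line
            return "".join(lines[:i] + lines[end:]), True
    return text, False

def patch_file(path: Path) -> bool:
    # ASSUMPTION: `path` points at the installed kernelsettings.mak under /opt/menlinux/.
    new_text, changed = drop_kernel_makefile_include(path.read_text())
    if changed:
        path.write_text(new_text)
    return changed
```

Running it a second time is harmless: once the include line is gone, patch_file returns False and leaves the file untouched.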

After that, we recompile the MDIS modules, making sure that the kernel's Makefile is no longer invoked:

sudo make clean && sudo make && sudo make install

During compilation, some messages report that BTF will not be generated:

  LD [M]  /home/men/MDIS/13MD05-90/OBJ/nodbg/men_bb_chameleon_pcitbl/men_bb_chameleon_pcitbl.ko
  BTF [M] /home/men/MDIS/13MD05-90/OBJ/nodbg/men_bb_chameleon_pcitbl/men_bb_chameleon_pcitbl.ko
Skipping BTF generation for /home/men/MDIS/13MD05-90/OBJ/nodbg/men_bb_chameleon_pcitbl/men_bb_chameleon_pcitbl.ko due to unavailability of vmlinux
  CC [M]  /home/men/MDIS/13MD05-90/OBJ/nodbg/men_bb_d203/men_bb_d203.mod.o
  LD [M]  /home/men/MDIS/13MD05-90/OBJ/nodbg/men_bb_d203/men_bb_d203.ko
  BTF [M] /home/men/MDIS/13MD05-90/OBJ/nodbg/men_bb_d203/men_bb_d203.ko
Skipping BTF generation for /home/men/MDIS/13MD05-90/OBJ/nodbg/men_bb_d203/men_bb_d203.ko due to unavailability of vmlinux
  CC [M]  /home/men/MDIS/13MD05-90/OBJ/nodbg/men_bb_d203_a24/men_bb_d203_a24.mod.o
  LD [M]  /home/men/MDIS/13MD05-90/OBJ/nodbg/men_bb_d203_a24/men_bb_d203_a24.ko
  BTF [M] /home/men/MDIS/13MD05-90/OBJ/nodbg/men_bb_d203_a24/men_bb_d203_a24.ko
Skipping BTF generation for /home/men/MDIS/13MD05-90/OBJ/nodbg/men_bb_d203_a24/men_bb_d203_a24.ko due to unavailability of vmlinux
  CC [M]  /home/men/MDIS/13MD05-90/OBJ/nodbg/men_bb_smb2/men_bb_smb2.mod.o
  LD [M]  /home/men/MDIS/13MD05-90/OBJ/nodbg/men_bb_smb2/men_bb_smb2.ko
  BTF [M] /home/men/MDIS/13MD05-90/OBJ/nodbg/men_bb_smb2/men_bb_smb2.ko
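As the messages themselves state, vmlinux is not available in the headers tree, so the BTF step is skipped while the .ko files are still linked. Here is a small sketch that lists the affected modules from a saved build log. The regular expression is an assumption based on the lines above:

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

SKIP_RE = re.compile(r"^Skipping BTF generation for (\S+\.ko) due to unavailability of vmlinux$")

def btf_skipped_modules(build_log: str) -> list[str]:
    """Return module names (without .ko) whose BTF generation was skipped."""
    names: list[str] = []
    for line in build_log.splitlines():
        m = SKIP_RE.match(line.strip())
        if m:
            names.append(PurePosixPath(m.group(1)).stem)
    return names
```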

After installing the modules, we reboot the test setup and, once it has booted again, load the men_lx_z25 module. This time it works:

men@men-MEN-F026L00:~$ sudo modprobe men_lx_z25
[sudo] password for men: 
men@men-MEN-F026L00:~$ 

And the dmesg log:

[   82.119492] men_oss: loading out-of-tree module taints kernel.
[   82.119637] men_oss: module verification failed: signature and/or required key missing - tainting kernel
[   82.122075] MEN men_oss init_module
[   82.174991] MEN men_chameleon init_module
[   82.300771] MEN men_chameleon_io init_module
[   82.406422] Init MEN Chameleon PNP subsystem

We are working on a solution that may fix both issues, as they are caused by the same Makefile.

mad-jrodriguez commented 1 year ago

The functional tests on test setup 1 passed (except one test that depends on a private module that is not configured):

<testsuites disabled="0" errors="0" failures="1" tests="13" time="0.0">
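The line above is only the opening tag of the JUnit-style report, but its attributes already give the summary: 13 tests, 1 failure and 0 errors. Below is a minimal sketch for extracting those counts from a complete report with Python's standard xml.etree.ElementTree. It assumes disabled tests are included in the tests total:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

def summarize_testsuites(xml_text: str) -> dict[str, int]:
    """Read the summary attributes of a <testsuites> root element."""
    root = ET.fromstring(xml_text)
    counts = {k: int(root.get(k, "0")) for k in ("tests", "failures", "errors", "disabled")}
    # ASSUMPTION: disabled tests are included in the "tests" total.
    counts["passed"] = counts["tests"] - counts["failures"] - counts["errors"] - counts["disabled"]
    return counts
```

With the attributes shown above, this gives 12 passed tests out of 13, matching the single failing test mentioned in the comment.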