NICMx / Jool

SIIT and NAT64 for Linux
GNU General Public License v2.0

DKMS build errors on Fedora 35 5.17.4-200 #379

Closed: DasSkelett closed this 2 years ago

DasSkelett commented 2 years ago

I started seeing the following build error for the kernel module on Fedora 35 with kernel 5.17.4-200, which was released a week ago as an update to 5.16.20:

DKMS make.log for jool-4.1.8 for kernel 5.17.4-200.fc35.x86_64 (x86_64)
Thu Apr 28 01:36:33 AM CEST 2022
make: Entering directory '/usr/src/kernels/5.17.4-200.fc35.x86_64'
CC [M]  /var/lib/dkms/jool/4.1.8/build/src/mod/common/rfc7915/4to6.o
/var/lib/dkms/jool/4.1.8/build/src/mod/common/rfc7915/4to6.c: In function ‘allocate_fast’:
/var/lib/dkms/jool/4.1.8/build/src/mod/common/rfc7915/4to6.c:407:9: error: implicit declaration of function ‘nf_reset’ [-Werror=implicit-function-declaration]
  407 |         nf_reset(out);
      |         ^~~~~~~~
cc1: some warnings being treated as errors
make[1]: *** [scripts/Makefile.build:288: /var/lib/dkms/jool/4.1.8/build/src/mod/common/rfc7915/4to6.o] Error 1
make: *** [Makefile:1841: /var/lib/dkms/jool/4.1.8/build/src/mod/common] Error 2
make: Leaving directory '/usr/src/kernels/5.17.4-200.fc35.x86_64'

My guess is that with the upgrade to 5.17 on F35, the changes that renamed nf_reset to nf_reset_ct got backported, so these two conditionals need to be adjusted:
https://github.com/NICMx/Jool/blob/6822bdee4ec63467e82d723a3381b3116c1853d9/src/mod/common/rfc7915/4to6.c#L404-L408
https://github.com/NICMx/Jool/blob/6822bdee4ec63467e82d723a3381b3116c1853d9/src/mod/common/rfc7915/6to4.c#L470-L474
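The linked conditionals choose between the two names at compile time. Below is a minimal userspace sketch of that pattern, assuming (from upstream history, worth double-checking) that nf_reset() was renamed to nf_reset_ct() in Linux 5.4. The SKETCH_ macros and the hard-coded 5.17.4 version are hypothetical stand-ins for <linux/version.h>, so it compiles outside a kernel tree:

```c
/* Hypothetical stand-in for KERNEL_VERSION() from <linux/version.h>. */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Pretend we are building against the Fedora 35 kernel from the log. */
#define SKETCH_LINUX_VERSION_CODE SKETCH_KERNEL_VERSION(5, 17, 4)

/*
 * Assumption: nf_reset() became nf_reset_ct() in 5.4, so a plain
 * upstream version check selects the new name on 5.17.
 */
#if SKETCH_LINUX_VERSION_CODE >= SKETCH_KERNEL_VERSION(5, 4, 0)
#define SKETCH_USE_NF_RESET_CT 1 /* module would call nf_reset_ct(out) */
#else
#define SKETCH_USE_NF_RESET_CT 0 /* module would call nf_reset(out) */
#endif

_Static_assert(SKETCH_USE_NF_RESET_CT == 1, "5.17 should take the nf_reset_ct() branch");
```

With a plain upstream check like this, 5.17 lands on nf_reset_ct(), so a build that still reaches nf_reset() suggests the condition is being evaluated some other way on this kernel.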

After writing the above, I manually removed the preprocessor macro and ran the build again. This time it failed at:

/var/lib/dkms/jool/4.1.8/build/src/mod/common/skbuff.c: In function ‘print_shinfo_fields’:
/var/lib/dkms/jool/4.1.8/build/src/mod/common/skbuff.c:472:50: error: ‘skb_frag_t’ {aka ‘struct bio_vec’} has no member named ‘page_offset’; did you mean ‘bv_offset’?
  472 |                                 shinfo->frags[f].page_offset,
      |                                                  ^~~~~~~~~~~
./include/linux/printk.h:418:33: note: in definition of macro ‘printk_index_wrap’
  418 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
      |                                 ^~~~~~~~~~~
./include/linux/printk.h:531:9: note: in expansion of macro ‘printk’
  531 |         printk(KERN_CONT fmt, ##__VA_ARGS__)
      |         ^~~~~~
/var/lib/dkms/jool/4.1.8/build/src/mod/common/skbuff.c:38:17: note: in expansion of macro ‘pr_cont’
   38 |                 pr_cont(text "\n", ##__VA_ARGS__); \
      |                 ^~~~~~~
/var/lib/dkms/jool/4.1.8/build/src/mod/common/skbuff.c:468:17: note: in expansion of macro ‘print’
  468 |                 print(tabs, "%u page_offset:%u size:%u", f,
      |                 ^~~~~
make[1]: *** [scripts/Makefile.build:288: /var/lib/dkms/jool/4.1.8/build/src/mod/common/skbuff.o] Error 1
make: *** [Makefile:1841: /var/lib/dkms/jool/4.1.8/build/src/mod/common] Error 2

https://github.com/NICMx/Jool/blob/5dc6ae4fbc1620d3c6cefb8b98f054bb81c47a2a/src/mod/common/skbuff.c#L469-L473

and

/var/lib/dkms/jool/4.1.8/build/src/mod/common/error_pool.c:5:10: fatal error: stdarg.h: No such file or directory
    5 | #include <stdarg.h>
      |          ^~~~~~~~~~
compilation terminated.
make[1]: *** [scripts/Makefile.build:288: /var/lib/dkms/jool/4.1.8/build/src/mod/common/error_pool.o] Error 1
make: *** [Makefile:1841: /var/lib/dkms/jool/4.1.8/build/src/mod/common] Error 2

https://github.com/NICMx/Jool/blob/6822bdee4ec63467e82d723a3381b3116c1853d9/src/mod/common/error_pool.c#L4-L6

and

/var/lib/dkms/jool/4.1.8/build/src/mod/common/db/pool4/rfc6056.c: In function ‘rfc6056_f’:
/var/lib/dkms/jool/4.1.8/build/src/mod/common/db/pool4/rfc6056.c:170:13: error: ‘struct shash_desc’ has no member named ‘flags’
  170 |         desc->flags = 0;
      |             ^~
make[1]: *** [scripts/Makefile.build:288: /var/lib/dkms/jool/4.1.8/build/src/mod/common/db/pool4/rfc6056.o] Error 1
make: *** [Makefile:1841: /var/lib/dkms/jool/4.1.8/build/src/mod/common] Error 2

https://github.com/NICMx/Jool/blob/6822bdee4ec63467e82d723a3381b3116c1853d9/src/mod/common/db/pool4/rfc6056.c#L168-L171
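The three later failures look like the same kind of API drift. As far as I know (worth verifying against the kernel history), shash_desc::flags was removed in Linux 5.1, skb_frag_t became struct bio_vec with the skb_frag_off() accessor in 5.4, and the kernel switched to its own <linux/stdarg.h> in 5.15. A compile-time sketch, using hypothetical SKETCH_ names in place of <linux/version.h>, that checks a plain upstream version test would take the new branch for all three on 5.17.4:

```c
/* Hypothetical stand-ins for <linux/version.h>; the SKETCH_ names are made up. */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#define SKETCH_LINUX_VERSION_CODE SKETCH_KERNEL_VERSION(5, 17, 4)

/* Assumed upstream versions in which each API change landed. */
#define SKETCH_SHASH_FLAGS_REMOVED SKETCH_KERNEL_VERSION(5, 1, 0)  /* no desc->flags */
#define SKETCH_SKB_FRAG_OFF_ADDED  SKETCH_KERNEL_VERSION(5, 4, 0)  /* skb_frag_off(&shinfo->frags[f]) */
#define SKETCH_LINUX_STDARG_ADDED  SKETCH_KERNEL_VERSION(5, 15, 0) /* #include <linux/stdarg.h> */

/* Nonzero when the kernel being built is at least version v. */
#define SKETCH_AT_LEAST(v) (SKETCH_LINUX_VERSION_CODE >= (v))

/* A plain upstream check picks the new code path for all three on 5.17.4. */
_Static_assert(SKETCH_AT_LEAST(SKETCH_SHASH_FLAGS_REMOVED), "drop desc->flags");
_Static_assert(SKETCH_AT_LEAST(SKETCH_SKB_FRAG_OFF_ADDED), "use skb_frag_off()");
_Static_assert(SKETCH_AT_LEAST(SKETCH_LINUX_STDARG_ADDED), "include <linux/stdarg.h>");
```

All three thresholds are also below 5.16.20, which would be consistent with the builds that worked on the previous kernel.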

Quick links in case they help:

DasSkelett commented 2 years ago

I'm not too sure how to approach this. If I'm not mistaken, it's not enough to change the ra and rb arguments in the conditionals, as both the 5.16 and 5.17 trees seem to have RHEL_MAJOR = 9 and RHEL_MINOR = 99.

https://gitlab.com/cki-project/kernel-ark/-/blob/fedora-5.17/Makefile.rhelver#L1-L2
https://gitlab.com/cki-project/kernel-ark/-/blob/fedora-5.16/Makefile.rhelver#L1-L2

So I believe a second check on the actual kernel version needs to be added? I'm hoping that the kernels are indeed the same for Fedora 34/35/36.
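One way to read this suggestion: accept a kernel when either its RHEL release or its upstream version is new enough. A compile-time sketch in which every SKETCH_ name is hypothetical, the version codes are passed as arguments (rather than read from RHEL_RELEASE_CODE via #ifdef) so several kernels fit in one file, and 5.4.0 and RHEL 10.0 are made-up thresholds rather than Jool's real values:

```c
/* Hypothetical helpers with the same bit layout as KERNEL_VERSION() and
 * RHEL_RELEASE_VERSION(). */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#define SKETCH_RHEL_RELEASE_VERSION(a, b) (((a) << 8) + (b))

/*
 * kcode:    LINUX_VERSION_CODE of the kernel being built
 * has_rhel: 1 if RHEL_RELEASE_CODE is defined, else 0
 * rcode:    RHEL_RELEASE_CODE (ignored when has_rhel is 0)
 */

/* RHEL-only: when RHEL codes exist, only ra.rb is consulted. */
#define SKETCH_RHEL_ONLY(kcode, has_rhel, rcode, a, b, c, ra, rb) \
	((has_rhel) ? ((rcode) >= SKETCH_RHEL_RELEASE_VERSION(ra, rb)) \
	            : ((kcode) >= SKETCH_KERNEL_VERSION(a, b, c)))

/* Combined: also accept any kernel whose upstream version is new enough. */
#define SKETCH_COMBINED(kcode, has_rhel, rcode, a, b, c, ra, rb) \
	(SKETCH_RHEL_ONLY(kcode, has_rhel, rcode, a, b, c, ra, rb) || \
	 ((kcode) >= SKETCH_KERNEL_VERSION(a, b, c)))

/* Fedora 5.17.4 reports RHEL 9.99; 10.0 is a made-up threshold above that. */
#define SKETCH_K517 SKETCH_KERNEL_VERSION(5, 17, 4)
#define SKETCH_R999 SKETCH_RHEL_RELEASE_VERSION(9, 99)

_Static_assert(!SKETCH_RHEL_ONLY(SKETCH_K517, 1, SKETCH_R999, 5, 4, 0, 10, 0),
               "RHEL-only check rejects Fedora 5.17");
_Static_assert(SKETCH_COMBINED(SKETCH_K517, 1, SKETCH_R999, 5, 4, 0, 10, 0),
               "combined check accepts it");
```

The OR assumes a distribution kernel never lacks an API that its upstream base version already has; RHEL backports can still enable the new path earlier through ra.rb.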

ydahhrk commented 2 years ago

Sorry; I'm currently at LACNIC 37 and lack the equipment necessary to patch this.

I'll work on this on May 9.

ydahhrk commented 2 years ago

Hmmm. I don't know what's going on. I found a Fedora kernel that relies on RHEL_RELEASE_CODE, and one that doesn't. Yours probably doesn't rely on it either.

LINUX_VERSION_CODE: 332037   # aka. 0x51105, aka. "5.17.5"
RHEL_RELEASE_CODE: 2403      # aka. 0x963, aka. "9.99"
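Both values are packed integers. A minimal sketch that unpacks them, assuming the usual layouts of KERNEL_VERSION() (major << 16, minor << 8, patch) and RHEL_RELEASE_VERSION() (major << 8, minor); the SKETCH_ names are hypothetical:

```c
/* Hypothetical helpers mirroring KERNEL_VERSION() and RHEL_RELEASE_VERSION(). */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#define SKETCH_RHEL_RELEASE_VERSION(a, b) (((a) << 8) + (b))

/* Unpack a kernel version code into major.minor.patch. */
#define SKETCH_KV_MAJOR(code) (((code) >> 16) & 0xff)
#define SKETCH_KV_MINOR(code) (((code) >> 8) & 0xff)
#define SKETCH_KV_PATCH(code) ((code) & 0xff)

/* Unpack a RHEL release code into major.minor. */
#define SKETCH_RHEL_MAJOR(code) (((code) >> 8) & 0xff)
#define SKETCH_RHEL_MINOR(code) ((code) & 0xff)

/* The values quoted above, checked at compile time. */
_Static_assert(SKETCH_KERNEL_VERSION(5, 17, 5) == 332037, "5.17.5 packs to 332037");
_Static_assert(SKETCH_RHEL_RELEASE_VERSION(9, 99) == 2403, "9.99 packs to 2403");
```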

Try this as a quick workaround: delete lines 14 through 21, and also line 28, then recompile.

DasSkelett commented 2 years ago

Yup, I figured this out about an hour ago as well. Apparently the 5.16 kernels didn't have all the RHEL_ variables defined, so the logic treated them as stock kernels, which is why DKMS worked for those. If you scroll down in patch-5.17-redhat.patch (renamed from patch-5.16-redhat.patch) at https://src.fedoraproject.org/rpms/kernel/c/c8aa1cc125eba26a2b76af22bbcee9513b477a41 you can see them being added again through the patch file (don't get confused: that's a diff of a patch).

If I remove the RHEL_RELEASE_CODE conditional lines from linux_version.h it does indeed build cleanly.

So all that might be needed is really just updating the ra and rb values in the LINUX_VERSION_AT_LEAST checks to 9, 99. This would fix it for the 5.17 kernels that have RHEL_RELEASE_CODE defined, and the 5.16 kernels that don't will still work because they just compare against the normal kernel version numbers, which happen to be "reliable" here.
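A sketch of why 9, 99 would cover both kernels. It assumes the check has the shape described in this thread (RHEL codes win when RHEL_RELEASE_CODE is defined, otherwise a.b.c is compared), passes the version codes as arguments so both kernels fit in one file, and uses 5.4.0 and 10.0 as made-up thresholds; none of the SKETCH_ names are Jool's:

```c
/* Hypothetical helpers with the same bit layout as KERNEL_VERSION() and
 * RHEL_RELEASE_VERSION(). */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#define SKETCH_RHEL_RELEASE_VERSION(a, b) (((a) << 8) + (b))

/* Assumed shape of the check: RHEL codes win whenever they are defined. */
#define SKETCH_AT_LEAST(kcode, has_rhel, rcode, a, b, c, ra, rb) \
	((has_rhel) ? ((rcode) >= SKETCH_RHEL_RELEASE_VERSION(ra, rb)) \
	            : ((kcode) >= SKETCH_KERNEL_VERSION(a, b, c)))

/* Fedora 5.16.20: no RHEL_RELEASE_CODE, so only a.b.c (here 5.4.0) matters. */
#define SKETCH_F516_OK(ra, rb) \
	SKETCH_AT_LEAST(SKETCH_KERNEL_VERSION(5, 16, 20), 0, 0, 5, 4, 0, ra, rb)

/* Fedora 5.17.4: RHEL_RELEASE_CODE is 9.99, so only ra.rb matters. */
#define SKETCH_F517_OK(ra, rb) \
	SKETCH_AT_LEAST(SKETCH_KERNEL_VERSION(5, 17, 4), 1, \
	                SKETCH_RHEL_RELEASE_VERSION(9, 99), 5, 4, 0, ra, rb)

/* A threshold above 9.99 sends 5.17 down the old path; 9.99 fixes it. */
_Static_assert(!SKETCH_F517_OK(10, 0), "10.0 rejects Fedora 5.17");
_Static_assert(SKETCH_F517_OK(9, 99), "9.99 accepts Fedora 5.17");
```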