cisco / exanic-software

ExaNIC drivers, utilities and development libraries
Other

Kernel module fails to build on Linux 6.1 LTS #70

Closed: vient closed this 10 months ago.

vient commented 1 year ago

pci_set_dma_mask() and pci_set_consistent_dma_mask() were deprecated and then removed in Linux 5.18, so the module no longer compiles against 6.1:

DKMS make.log for exanic-2.7.3.2-git for kernel 6.1.12-060112-generic (x86_64)
Sat Feb 18 15:52:39 MSK 2023
make: Entering directory '/var/lib/dkms/exanic/2.7.3.2-git/build/modules'
make -C /lib/modules/6.1.12-060112-generic/build M=$PWD modules
make[1]: Entering directory '/usr/src/linux-headers-6.1.12-060112-generic'
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: x86_64-linux-gnu-gcc-9 (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
  You are using:           gcc-9 (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
  CC [M]  /var/lib/dkms/exanic/2.7.3.2-git/build/modules/exanic/exanic-main.o
/var/lib/dkms/exanic/2.7.3.2-git/build/modules/exanic/exanic-main.c: In function ‘exanic_probe’:
/var/lib/dkms/exanic/2.7.3.2-git/build/modules/exanic/exanic-main.c:1159:11: error: implicit declaration of function ‘pci_set_dma_mask’ [-Werror=implicit-function-declaration]
 1159 |     err = pci_set_dma_mask(pdev, DMA_BIT_MASK(exanic->dma_addr_bits));
      |           ^~~~~~~~~~~~~~~~
/var/lib/dkms/exanic/2.7.3.2-git/build/modules/exanic/exanic-main.c:1166:11: error: implicit declaration of function ‘pci_set_consistent_dma_mask’ [-Werror=implicit-function-declaration]
 1166 |     err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(exanic->dma_addr_bits));
      |           ^~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:250: /var/lib/dkms/exanic/2.7.3.2-git/build/modules/exanic/exanic-main.o] Error 1
make[2]: *** [scripts/Makefile.build:500: /var/lib/dkms/exanic/2.7.3.2-git/build/modules/exanic] Error 2
make[1]: *** [Makefile:2011: /var/lib/dkms/exanic/2.7.3.2-git/build/modules] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-6.1.12-060112-generic'
make: *** [Makefile:9: default] Error 2
make: Leaving directory '/var/lib/dkms/exanic/2.7.3.2-git/build/modules'
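
Because the wrappers disappeared in Linux 5.18, out-of-tree drivers that still support older kernels often gate DMA-mask code on LINUX_VERSION_CODE. Below is a minimal user-space sketch of that version arithmetic. It is not code from the ExaNIC tree. It redefines KERNEL_VERSION() with the same encoding as <linux/version.h> so that it builds without kernel headers. The helpers release_to_version_code() and has_pci_dma_wrappers() are hypothetical names. Applied to the release strings in this thread, the sketch reports that the pci_* wrappers are absent on 6.1.

```c
#include <assert.h>
#include <stdio.h>

/* Same encoding as KERNEL_VERSION() in <linux/version.h> (recent kernels
 * also clamp the sublevel at 255); redefined so this builds in user space. */
#define KERNEL_VERSION(a, b, c) \
    (((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))

/* Hypothetical helper: convert the leading "major.minor.sublevel" of a
 * release string such as "6.1.12-060112-generic" (as printed by uname -r)
 * into a KERNEL_VERSION() code. Returns -1 if the string does not start
 * with three dot-separated numbers. */
int release_to_version_code(const char *release)
{
    int major, minor, sublevel;

    assert(release != NULL);
    if (sscanf(release, "%d.%d.%d", &major, &minor, &sublevel) != 3)
        return -1;
    return KERNEL_VERSION(major, minor, sublevel);
}

/* Hypothetical helper mirroring a driver-side compat check: the
 * pci_set_dma_mask()/pci_set_consistent_dma_mask() wrappers were removed
 * in Linux 5.18. Callers should handle a -1 (unknown) version first. */
int has_pci_dma_wrappers(int version_code)
{
    return version_code < KERNEL_VERSION(5, 18, 0);
}
```

In the driver itself, no version gate is strictly needed for kernels from 3.13 onward. There, the two calls can be replaced by a single dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(exanic->dma_addr_bits)). Version numbers can also mislead on distribution kernels that backport API changes, so a compile-time feature test may be the more robust choice.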

georgmu commented 1 year ago

I added compile fixes in #71. They are only trivial fixes for kernel API cleanups.

slug302 commented 1 year ago

I am using Rocky Linux 9 (uname -a: Linux dev1 6.1.29-1.el9.elrepo.x86_64) and compiling from your modified linux-6.1-fixes branch:

su - root
cd exanic-software
git checkout linux-6.1-fixes
make -j32
make install
modprobe exanic

Whenever I run exanic-config (for example, exanic-config exanic0), the machine restarts. Has anyone encountered a similar issue and can help me?

georgmu commented 1 year ago

I tested with 6.1.15 on Debian and saw no problems. Do you see any oops (for example, when running modprobe from a TTY)? If not, can you attach a serial console to log the kernel output?

slug302 commented 1 year ago

Thank you for the information. I'll give it another try.

ech68 commented 1 year ago

I've seen this crash with exanic-config as well, but only when I enable IOMMU options on the kernel command line; in my case, "iommu.passthrough=0 intel_iommu=on".

The crash looks like this:

[  341.635741] exanic: ExaNIC network driver (ver 2.7.3) loaded.
[  341.641492] exanic 0000:d8:00.0: Probing exanic0.
[  341.647009] exanic 0000:d8:00.0: can't disable ASPM; OS doesn't have ASPM control
[  341.652849] exanic 0000:d8:00.0: Registers at phys: 0x0x00000000ef000000, virt: 0x000000008335bc08, size: 16777216 bytes.
[  341.658822] exanic 0000:d8:00.0: DMA address width: 64 bits.
[  341.664828] exanic 0000:d8:00.0: TX region at phys: 0x0x00000000f0000000, size: 4194304 bytes.
[  341.672242] exanic 0000:d8:00.0: TX feedback region at virt: 0x00000000a8806eed, dma handle: 0x0x00000000fffff000, size: 4096 bytes.
[  341.696327] exanic 0000:d8:00.0: MAC address: 64:3f:5f:01:c1:4a
[  341.703032] exanic 0000:d8:00.0: Filters at phys: 0x0x00000000ef004000, size: 16760832 bytes.
[  341.710007] exanic 0000:d8:00.0 eth4: ExaNIC ethernet interface exanic0:0, hwaddr 64:3f:5f:01:c1:4a
[  341.717180] exanic 0000:d8:00.0 eth4: Link is down
[  341.720028] exanic 0000:d8:00.0 enp216s0: renamed from eth4
[  341.756512] exanic 0000:d8:00.0 eth4: ExaNIC ethernet interface exanic0:1, hwaddr 64:3f:5f:01:c1:4b
[  341.763607] exanic 0000:d8:00.0 eth4: Link is down
[  341.771308] pps pps3: new PPS source ptp4
[  341.778999] exanic 0000:d8:00.0: PTP hardware clock registered (ptp4)
[  341.786144] exanic 0000:d8:00.0: Resetting PTP hardware clock
[  341.793355] exanic 0000:d8:00.0: Finished probing exanic0 (minor = 120):
[  341.800016] BUG: unable to handle page fault for address: ffffd43a2800eec8
[  341.800629] exanic 0000:d8:00.0:   ExaNIC interface version = 1
[  341.807662] #PF: supervisor read access in kernel mode
[  341.807664] #PF: error_code(0x0000) - not-present page
[  341.807665] PGD 0 P4D 0 
[  341.807667] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  341.807669] CPU: 13 PID: 14968 Comm: exanic-config Kdump: loaded Tainted: G           OE      6.1.38-1.el8.x86_64 #1
[  341.814889] exanic 0000:d8:00.0:   Hardware ID = ExaNIC X25
[  341.816150] exanic 0000:d8:00.0 enp216s0d1: renamed from eth4
[  341.822119] Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.18.1 02/22/2023
[  341.822120] RIP: 0010:_compound_head+0x0/0x40
[  341.822128] Code: 8b 78 08 e8 02 e8 fe ff 65 ff 0d eb d9 b0 6a 74 05 c3 cc cc cc cc 0f 1f 44 00 00 c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 <48> 8b 47 08 a8 01 75 24 66 90 48 89 f8 c3 cc cc cc cc f7 c7 ff 0f
[  341.829420] exanic 0000:d8:00.0:   Function = network interface
[  341.836566] RSP: 0018:ffffb06c6c68fc58 EFLAGS: 00010206
[  341.836568] RAX: 0000000000000000 RBX: ffff8d53c2339c78 RCX: 0000000000001000
[  341.836569] RDX: ffffd43a2800eec0 RSI: 00007ffff7ff3000 RDI: ffffd43a2800eec0
[  341.836570] RBP: 00007ffff7ff3000 R08: 0000000000000000 R09: 0000000000000000
[  341.836571] R10: ffff8d72bb819ba0 R11: ffff8d54c656d980 R12: ffffd43a2800eec0
[  341.836572] R13: ffff8d53c2339c78 R14: ffff8d5345dc0000 R15: 00000000000000fb
[  341.836573] FS:  00007ffff7fe6b80(0000) GS:ffff8d72bfd80000(0000) knlGS:0000000000000000
[  341.836574] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  341.971038] CR2: ffffd43a2800eec8 CR3: 0000000286eea006 CR4: 00000000007706e0
[  341.979020] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  341.987255] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  341.995588] PKRU: 55555554
[  342.003419] Call Trace:
[  342.011365]  <TASK>
[  342.019130]  ? __die_body+0x1a/0x60
[  342.026552]  ? page_fault_oops+0x136/0x2a0
[  342.034148]  ? fixup_exception+0x22/0x340
[  342.042118]  ? exc_page_fault+0x138/0x140
[  342.049311]  ? asm_exc_page_fault+0x22/0x30
[  342.056277]  ? trace_rss_stat+0x60/0x60
[  342.063179]  vm_insert_page+0x45/0x150
[  342.069963]  exanic_mmap+0x3a6/0x580 [exanic]
[  342.076537]  mmap_region+0x245/0xc30
[  342.083209]  ? arch_get_unmapped_area_topdown+0xfc/0x240
[  342.090310]  do_mmap+0x382/0x580
[  342.096866]  vm_mmap_pgoff+0xd9/0x180
[  342.103300]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
[  342.109658]  ksys_mmap_pgoff+0x189/0x1e0
[  342.115944]  do_syscall_64+0x58/0x80
[  342.122148]  ? exc_page_fault+0x64/0x140
[  342.128415]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  342.134873] RIP: 0033:0x7ffff7129957
[  342.141591] Code: 54 41 89 d4 55 48 89 fd 53 4c 89 cb 48 85 ff 74 52 49 89 d9 45 89 f8 45 89 f2 44 89 e2 4c 89 ee 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 79 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 0f 1f
[  342.154692] RSP: 002b:00007fffffffdf28 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[  342.161163] RAX: ffffffffffffffda RBX: 0000000000100000 RCX: 00007ffff7129957
[  342.167873] RDX: 0000000000000003 RSI: 0000000000001000 RDI: 0000000000000000
[  342.174416] RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000100000
[  342.180952] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000003
[  342.187967] R13: 0000000000001000 R14: 0000000000000001 R15: 0000000000000005
[  342.194522]  </TASK>
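
Since this crash was reported only with IOMMU options enabled, it helps to confirm which options a machine actually booted with before comparing reports. Below is a minimal C sketch. It assumes the options of interest appear as plain space-separated tokens in /proc/cmdline, and it does not handle quoted values. cmdline_has_param() and read_proc_cmdline() are hypothetical helper names, not part of exanic-software.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `param` (e.g. "intel_iommu=on") appears
 * as a whole space-separated token in `cmdline`, else 0. Quoted parameter
 * values are not handled; this is a simplification. */
int cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen;
    const char *p = cmdline;

    assert(cmdline != NULL && param != NULL);
    plen = strlen(param);
    assert(plen > 0);
    while ((p = strstr(p, param)) != NULL) {
        /* The match must start at a token boundary... */
        int starts = (p == cmdline) || (p[-1] == ' ');
        /* ...and end at one (the file ends with a newline). */
        char end = p[plen];
        int ends = (end == '\0') || (end == ' ') || (end == '\n');
        if (starts && ends)
            return 1;
        p += 1; /* keep searching after this partial match */
    }
    return 0;
}

/* Hypothetical helper: read the running kernel's command line into buf.
 * Returns 0 on success, -1 on failure. */
int read_proc_cmdline(char *buf, size_t len)
{
    FILE *f = fopen("/proc/cmdline", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}
```

For example, calling read_proc_cmdline(buf, sizeof buf) and then cmdline_has_param(buf, "intel_iommu=on") shows whether Intel IOMMU translation was requested on the running system. Matching whole tokens avoids false positives such as "iommu=off" matching inside "amd_iommu=off".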

vient commented 1 year ago

I get the same crash on an AMD system, even when I pass amd_iommu=off on the kernel command line.

vient commented 10 months ago

I guess the compilation is fixed in 2.7.4.