AltraMayor / gatekeeper

The first open-source DDoS protection system
https://github.com/AltraMayor/gatekeeper/wiki
GNU General Public License v3.0
1.31k stars 227 forks

1.2dev kni_mod 20.11 When gatekeeper is not gracefully shut down in i40e driver scenario, BUG appears: kernel NULL pointer dereference, address: 0000000000000010 #685

Closed ShawnLeung87 closed 3 months ago

ShawnLeung87 commented 5 months ago

When this exception occurs, the server must be restarted before it can be restored. Exception log:

Apr 13 16:31:39 78-GK-3 systemd[1]: systemd-timedated.service: Succeeded.
Apr 13 16:31:57 78-GK-3 systemd-networkd[1949]: kni_back: Link DOWN
Apr 13 16:31:57 78-GK-3 systemd-networkd[1949]: kni_back: Lost carrier
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048101] BUG: kernel NULL pointer dereference, address: 0000000000000010
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048152] #PF: supervisor read access in kernel mode
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048177] #PF: error_code(0x0000) - not-present page
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048201] PGD 0 P4D 0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048219] Oops: 0000 [#1] SMP NOPTI
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048243] CPU: 37 PID: 2496 Comm: rte_mp_handle Tainted: G OE 5.15.0-97-generic #107~20.04.1-Ubuntu
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048291] Hardware name: Dell Inc. PowerEdge R740xd/06WXJT, BIOS 2.11.2 004/21/2021
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048323] RIP: 0010:vmacache_find+0x24/0xf0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048355] Code: 5d c3 cc cc cc cc 0f 1f 44 00 00 65 48 8b 04 25 c0 fb 01 00 48 3b b8 08 09 00 00 74 07 31 c0 c3 cc cc cc cc f6 40 2e 20 75 f3 <48> 8b 57 10 48 3b 90 18 09 00 00 75 69 55 48 89 e5 41 57 49 89 f7
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048428] RSP: 0018:ffffa5b6a1e979c8 EFLAGS: 00010246
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048455] RAX: ffff9077dbc98000 RBX: 0000000000000000 RCX: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048486] RDX: 0000000000000001 RSI: 0000002d42377000 RDI: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048517] RBP: ffffa5b6a1e979e8 R08: ffffa5b6a1e97b80 R09: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048547] R10: 0000000000000001 R11: ffff906ac32b80c0 R12: 0000002d42377000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048578] R13: 0000002d42377000 R14: 0000000000000000 R15: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048607] FS: 0000000000000000(0000) GS:ffff90b6bf680000(0000) knlGS:0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048659] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048685] CR2: 0000000000000010 CR3: 000000524d810004 CR4: 00000000007706e0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048716] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048745] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048775] PKRU: 55555554
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048790] Call Trace:
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048805]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048820] ? show_regs.cold+0x1a/0x1f
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048848] ? die_body+0x20/0x70
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048877] ? die+0x2b/0x37
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048895] ? page_fault_oops+0x136/0x2c0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048918] ? update_load_avg+0x7c/0x650
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048942] ? newidle_balance+0x39d/0x470
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048967] ? do_user_addr_fault+0x303/0x660
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048990] ? update_idle_core+0xe5/0x120
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049015] ? exc_page_fault+0x77/0x170
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049042] ? asm_exc_page_fault+0x27/0x30
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049067] ? vmacache_find+0x24/0xf0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049091] ? find_vma+0x1b/0x80
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049114] find_extend_vma+0x1e/0x90
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049140] get_user_pages+0xa0/0x6b0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049167] get_user_pages_remote+0xdc/0x320
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049191] ? kfree+0x3bd/0x420
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049214] get_user_pages_remote+0x21/0x50
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049239] kni_fifo_trans_pa2va+0x1fa/0x310 [rte_kni]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049271] ? kobject_release+0x5f/0x150
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049299] kni_net_release_fifo_phy+0x36/0x40 [rte_kni]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049329] kni_dev_remove+0x33/0x50 [rte_kni]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.050138] kni_release+0xb0/0x180 [rte_kni]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.050908] fput+0x9c/0x280
Apr 13 16:31:57 78-GK-3 kernel: [ 369.051672] __fput+0xe/0x20
Apr 13 16:31:57 78-GK-3 kernel: [ 369.052412] task_work_run+0x6d/0xb0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.053133] do_exit+0x363/0xad0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.053850] ? mod_memcg_lruvec_state+0x63/0xe0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.054572] do_group_exit+0x43/0xb0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.055289] get_signal+0x157/0x900
Apr 13 16:31:57 78-GK-3 kernel: [ 369.055991] ? lru_cache_add_inactive_or_unevictable+0x29/0xe0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.056686] arch_do_signal_or_restart+0xf7/0x290
Apr 13 16:31:57 78-GK-3 kernel: [ 369.057369] ? fput+0x13/0x20
Apr 13 16:31:57 78-GK-3 kernel: [ 369.058039] ? __sys_recvmsg+0x98/0xb0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.058667] exit_to_user_mode_prepare+0x130/0x1c0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.059282] syscall_exit_to_user_mode+0x27/0x50
Apr 13 16:31:57 78-GK-3 kernel: [ 369.059893] ? __x64_sys_recvmsg+0x1f/0x30
Apr 13 16:31:57 78-GK-3 kernel: [ 369.060200] do_syscall_64+0x69/0xc0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.060463] ? exit_to_user_mode_prepare+0x92/0x1c0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.060726] ? do_user_addr_fault+0x1e0/0x660
Apr 13 16:31:57 78-GK-3 kernel: [ 369.060987] ? irqentry_exit_to_user_mode+0x17/0x20
Apr 13 16:31:57 78-GK-3 kernel: [ 369.061247] ? irqentry_exit+0x1d/0x30
Apr 13 16:31:57 78-GK-3 kernel: [ 369.061498] ? exc_page_fault+0x89/0x170
Apr 13 16:31:57 78-GK-3 kernel: [ 369.061747] entry_SYSCALL_64_after_hwframe+0x62/0xcc
Apr 13 16:31:57 78-GK-3 kernel: [ 369.061996] RIP: 0033:0x7f4b7f1dd0ed
Apr 13 16:31:57 78-GK-3 kernel: [ 369.062244] Code: Unable to access opcode bytes at RIP 0x7f4b7f1dd0c3.
Apr 13 16:31:57 78-GK-3 kernel: [ 369.062494] RSP: 002b:00007f4b7dec9fb0 EFLAGS: 00000293 ORIG_RAX: 000000000000002f
Apr 13 16:31:57 78-GK-3 kernel: [ 369.062753] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f4b7f1dd0ed
Apr 13 16:31:57 78-GK-3 kernel: [ 369.063017] RDX: 0000000000000000 RSI: 00007f4b7deca030 RDI: 0000000000000009
Apr 13 16:31:57 78-GK-3 kernel: [ 369.063285] RBP: 00007f4b7deca254 R08: 0000000000000000 R09: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.063554] R10: 0000000000004022 R11: 0000000000000293 R12: 00007f4b7deca038
Apr 13 16:31:57 78-GK-3 kernel: [ 369.063826] R13: 00007f4b7deca072 R14: 00007f4b7deca250 R15: 00007f4b7deca030
Apr 13 16:31:57 78-GK-3 kernel: [ 369.064101]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.064374] Modules linked in: rte_kni(OE) nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif binfmt_misc input_leds joydev kvm dell_smbios ice dcdbas rapl wmi_bmof dell_wmi_descriptor ib_uverbs ib_core intel_cstate mei_me ioatdma mei intel_pch_thermal dca acpi_ipmi ipmi_si ipmi_devintf ip6t_REJECT nf_reject_ipv6 ipmi_msghandler xt_hl ip6t_rt acpi_power_meter mac_hid ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_limit xt_addrtype sch_fq_codel xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 iavf nf_defrag_ipv4 uio_pci_generic ip6table_filter uio ip6_tables iptable_filter bpfilter msr ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear dm_mirror
Apr 13 16:31:57 78-GK-3 kernel: [ 369.064448] dm_region_hash dm_log hid_generic usbhid hid mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec crct10dif_pclmul rc_core crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd i40e(OE) cryptd i2c_i801 ahci drm megaraid_sas tg3 xhci_pci lpc_ich i2c_smbus libahci xhci_pci_renesas wmi
Apr 13 16:31:57 78-GK-3 kernel: [ 369.068303] CR2: 0000000000000010
Apr 13 16:31:57 78-GK-3 kernel: [ 369.068702] ---[ end trace 03c229dce7104af2 ]---
Apr 13 16:31:57 78-GK-3 kernel: [ 369.114550] RIP: 0010:vmacache_find+0x24/0xf0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.114852] Code: 5d c3 cc cc cc cc 0f 1f 44 00 00 65 48 8b 04 25 c0 fb 01 00 48 3b b8 08 09 00 00 74 07 31 c0 c3 cc cc cc cc f6 40 2e 20 75 f3 <48> 8b 57 10 48 3b 90 18 09 00 00 75 69 55 48 89 e5 41 57 49 89 f7
Apr 13 16:31:57 78-GK-3 kernel: [ 369.115462] RSP: 0018:ffffa5b6a1e979c8 EFLAGS: 00010246
Apr 13 16:31:57 78-GK-3 kernel: [ 369.115772] RAX: ffff9077dbc98000 RBX: 0000000000000000 RCX: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.116042] RDX: 0000000000000001 RSI: 0000002d42377000 RDI: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.116308] RBP: ffffa5b6a1e979e8 R08: ffffa5b6a1e97b80 R09: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.116575] R10: 0000000000000001 R11: ffff906ac32b80c0 R12: 0000002d42377000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.116845] R13: 0000002d42377000 R14: 0000000000000000 R15: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.117114] FS: 0000000000000000(0000) GS:ffff90b6bf680000(0000) knlGS:0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.117386] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 13 16:31:57 78-GK-3 kernel: [ 369.117647] CR2: 0000000000000010 CR3: 000000524d810004 CR4: 00000000007706e0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.117910] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.118172] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr 13 16:31:57 78-GK-3 kernel: [ 369.118433] PKRU: 55555554
Apr 13 16:31:57 78-GK-3 kernel: [ 369.118693] Fixing recursive fault but reboot is needed!
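For anyone triaging a similar trace, the frames attributed to a given kernel module can be pulled out of the syslog text mechanically. The following is only a minimal sketch: the helper name `module_frames` and the regular expression are illustrative assumptions, not part of Gatekeeper or DPDK, and the sample lines are copied from the log above. Frames prefixed with `?` are skipped because the kernel marks those as unreliable.

```python
import re

# One call-trace frame as it appears in the syslog text, for example:
#   kernel: [ 369.050138] kni_release+0xb0/0x180 [rte_kni]
#   kernel: [ 369.049091] ? find_vma+0x1b/0x80
# Group 1: optional "?" marker (unreliable frame), group 2: symbol,
# group 3: optional module name in square brackets.
FRAME_RE = re.compile(
    r"kernel: \[\s*\d+\.\d+\]\s+(\?\s+)?"
    r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)


def module_frames(log_text, module):
    """Return the reliable call-trace symbols that belong to `module`.

    Hypothetical helper for illustration only.
    """
    frames = []
    for match in FRAME_RE.finditer(log_text):
        unreliable, symbol, owner = match.groups()
        if owner == module and not unreliable:
            frames.append(symbol)
    return frames


# A few lines copied from the exception log above.
SAMPLE = """\
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049114] find_extend_vma+0x1e/0x90
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049239] kni_fifo_trans_pa2va+0x1fa/0x310 [rte_kni]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049271] ? kobject_release+0x5f/0x150
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049299] kni_net_release_fifo_phy+0x36/0x40 [rte_kni]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049329] kni_dev_remove+0x33/0x50 [rte_kni]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.050138] kni_release+0xb0/0x180 [rte_kni]
"""

if __name__ == "__main__":
    for name in module_frames(SAMPLE, "rte_kni"):
        print(name)
```

On the sample, the helper lists the four `rte_kni` frames in the order they appear in the trace, which shows the release path (`kni_release` down to `kni_fifo_trans_pa2va`) that runs when the process exits.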

ShawnLeung87 commented 5 months ago

The same code works normally in the ixgb 10G scenario, where gatekeeper can be killed with kill -9 without triggering this bug.

AltraMayor commented 5 months ago

The root cause of this problem is the KNI kernel module, and the latest v1.2.0-dev solves this problem by replacing the KNI kernel module with the Virtio kernel module.

ShawnLeung87 commented 5 months ago

On the XL710, after switching to virtio-user, PCTYPES is not supported in RSS.

AltraMayor commented 5 months ago

I can only look at that problem once I address a problem with bonded interfaces I'm working on; this should take me a couple more weeks. In the meantime, you should collect as much information about your problem as possible since I don't have access to an xl710 to test.

ShawnLeung87 commented 5 months ago

I'm also trying to set breakpoints and use gdb to collect specific information. If you need to test the XL710, we can provide a test environment.

AltraMayor commented 4 months ago

Could you give me access to a machine with an XL710 ready for testing with branch v1.2.0-dev?

AltraMayor commented 3 months ago

Pull request #691 addressed this issue.