Atoptool / atop

System and process monitor for Linux
GNU General Public License v2.0

Netatop crashes the kernel with General Protection Fault #302

Closed ValdikSS closed 2 months ago

ValdikSS commented 4 months ago

Hello,

The netatop 3.1 module crashes my server once every few days with a general protection fault. Below is the most recent crash log, obtained with netconsole. The fault is raised in kmem_cache_alloc, reached via analyze_tcpv4_packet -> sock2task -> get_taskinfo.

The most recent crash (spoiler) ``` [206201.363307] general protection fault, probably for non-canonical address 0xbe27f590f0ab0657: 0000 [#1] PREEMPT SMP PTI [206201.363318] CPU: 0 PID: 310615 Comm: eiskaltdcpp-qt Tainted: G W OE 6.6.27-1-lts #1 d5b6011e73704a95088e0244d141560bd5ec914b [206201.363323] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Killer, BIOS P1.90 03/11/2018 [206201.363326] RIP: 0010:kmem_cache_alloc+0x115/0x370 [206201.363334] Code: 38 0f 84 e7 01 00 00 48 85 ff 0f 84 de 01 00 00 41 8b 44 24 28 4d 8b 14 24 49 89 f8 49 89 d1 49 8b 9c 24 b8 00 00 00 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 02 00 00 [206201.363338] RSP: 0018:ffffb5340c80b700 EFLAGS: 00010092 [206201.363342] RAX: be27f590f0ab0657 RBX: be535888bc8ade5e RCX: ffffa04b3e8139ea [206201.363345] RDX: 00000000004bc400 RSI: 0000000000000820 RDI: be27f590f0ab061f [206201.363349] RBP: ffffb5340c80b750 R08: be27f590f0ab061f R09: 00000000004bc400 [206201.363351] R10: 000034e630a11a20 R11: 0000000000000000 R12: ffffa04ac153fa00 [206201.363354] R13: 0000000000000000 R14: 0000000000000820 R15: 0000000000000078 [206201.363357] FS: 000073831ddfb6c0(0000) GS:ffffa04dcf200000(0000) knlGS:0000000000000000 [206201.363361] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [206201.363363] CR2: 000073831ddf7e78 CR3: 000000012c6aa001 CR4: 00000000001706f0 [206201.363366] Call Trace: [206201.363369] [206201.363372] ? die_addr+0x36/0x90 [206201.363377] ? exc_general_protection+0x1c5/0x430 [206201.363382] ? asm_exc_general_protection+0x26/0x30 [206201.363387] ? kmem_cache_alloc+0x115/0x370 [206201.363392] ? 
get_taskinfo+0xa5/0x1b0 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39] [206201.363400] get_taskinfo+0xa5/0x1b0 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39] [206201.363406] sock2task+0x1fe/0x380 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39] [206201.363413] analyze_tcpv4_packet+0x1be/0x210 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39] [206201.363420] ipv4_hookout+0xa5/0xe0 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39] [206201.363427] nf_hook_slow+0x45/0xc0 [206201.363433] __ip_local_out+0xfa/0x180 [206201.363438] ? __pfx_dst_output+0x10/0x10 [206201.363441] ip_local_out+0x1b/0x70 [206201.363445] __ip_queue_xmit+0x175/0x490 [206201.363448] __tcp_transmit_skb+0xa5e/0xbf0 [206201.363454] tcp_connect+0xb37/0xeb0 [206201.363458] ? __pfx___inet_check_established+0x10/0x10 [206201.363463] tcp_v4_connect+0x419/0x500 [206201.363467] __inet_stream_connect+0x112/0x3d0 [206201.363472] inet_stream_connect+0x3a/0x60 [206201.363476] __sys_connect+0xa8/0xd0 [206201.363480] __x64_sys_connect+0x18/0x20 [206201.363484] do_syscall_64+0x5a/0x80 [206201.363489] ? __slab_free+0xf1/0x380 [206201.363493] ? __unfreeze_partials+0x1c1/0x210 [206201.363497] ? __mod_memcg_lruvec_state+0x4e/0xa0 [206201.363501] ? skb_release_data+0x142/0x1c0 [206201.363506] ? rtl8169_poll+0x442/0x4e0 [r8169 84aff28b94f8fe3441c84c217bd59057a09d2ae4] [206201.363516] ? __napi_poll+0x2b/0x1b0 [206201.363519] ? net_rx_action+0x19e/0x370 [206201.363522] ? sched_clock+0x10/0x30 [206201.363525] ? sched_clock_cpu+0xf/0x190 [206201.363530] ? irqtime_account_irq+0x40/0xc0 [206201.363533] ? __do_softirq+0x186/0x2c8 [206201.363537] ? 
__irq_exit_rcu+0x4b/0xc0 [206201.363542] entry_SYSCALL_64_after_hwframe+0x78/0xe2 [206201.363546] RIP: 0033:0x738396d2879b [206201.363567] Code: 83 ec 18 89 54 24 0c 48 89 34 24 89 7c 24 08 e8 fb cf f7 ff 8b 54 24 0c 48 8b 34 24 41 89 c0 8b 7c 24 08 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 08 e8 51 d0 f7 ff 8b 44 [206201.363571] RSP: 002b:000073831ddfab70 EFLAGS: 00000293 ORIG_RAX: 000000000000002a [206201.363575] RAX: ffffffffffffffda RBX: 0000738374058b58 RCX: 0000738396d2879b [206201.363578] RDX: 0000000000000010 RSI: 000073831ddfab90 RDI: 0000000000000016 [206201.363580] RBP: 000073837402e700 R08: 0000000000000000 R09: 0000000000000000 [206201.363583] R10: 0000738396d9efe0 R11: 0000000000000293 R12: 000073831ddfab90 [206201.363586] R13: 000073831ddfaba0 R14: 0000738374058b58 R15: 0000738397ca7b60 [206201.363589] [206201.363591] Modules linked in: tls bluetooth ecdh_generic mptcp_diag vsock_diag tcp_diag udp_diag raw_diag inet_diag unix_diag netconsole nf_conntrack_netlink xt_conntrack nft_chain_nat xt_addrtype xt_owner nft_compat dummy ip6table_raw ip6t_rpfilter iptable_raw ipt_rpfilter veth xt_CHECKSUM xt_tcpudp xt_comment xt_MASQUERADE ip6table_nat ip6table_mangle ip6table_filter ip6_tables bridge stp llc btrfs blake2b_generic xor raid6_pq nf_tables vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock dm_crypt cbc encrypted_keys trusted asn1_encoder tee tun rfkill iptable_mangle iptable_filter iptable_nat zram nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nfnetlink_queue nct6775 nct6775_core hwmon_vid vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp kvm_intel snd_hda_codec_hdmi kvm snd_hda_intel mei_pxp irqbypass spi_nor mtd crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic at24 mei_hdcp [206201.363633] iTCO_wdt spi_intel_platform spi_intel intel_pmc_bxt gf128mul snd_intel_dspcfg 
iTCO_vendor_support snd_intel_sdw_acpi snd_hda_codec ghash_clmulni_intel r8169 sha512_ssse3 sha256_ssse3 snd_hda_core sha1_ssse3 aesni_intel crypto_simd cryptd mxm_wmi snd_hwdep rapl intel_cstate realtek mdio_devres mei_me alx snd_pcm intel_uncore lpc_ich i2c_i801 libphy mei i2c_smbus snd_timer snd soundcore mdio mac_hid tcp_bbr netatop(OE) sg crypto_user loop fuse dm_mod nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 i915 i2c_algo_bit drm_buddy ttm intel_gtt xhci_pci crc32c_intel drm_display_helper xhci_pci_renesas cec video wmi [206201.363755] ---[ end trace 0000000000000000 ]--- [206201.363758] RIP: 0010:kmem_cache_alloc+0x115/0x370 [206201.363763] Code: 38 0f 84 e7 01 00 00 48 85 ff 0f 84 de 01 00 00 41 8b 44 24 28 4d 8b 14 24 49 89 f8 49 89 d1 49 8b 9c 24 b8 00 00 00 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 02 00 00 [206201.363767] RSP: 0018:ffffb5340c80b700 EFLAGS: 00010092 [206201.363770] RAX: be27f590f0ab0657 RBX: be535888bc8ade5e RCX: ffffa04b3e8139ea [206201.363773] RDX: 00000000004bc400 RSI: 0000000000000820 RDI: be27f590f0ab061f [206201.363776] RBP: ffffb5340c80b750 R08: be27f590f0ab061f R09: 00000000004bc400 [206201.363779] R10: 000034e630a11a20 R11: 0000000000000000 R12: ffffa04ac153fa00 [206201.363782] R13: 0000000000000000 R14: 0000000000000820 R15: 0000000000000078 [206201.363785] FS: 000073831ddfb6c0(0000) GS:ffffa04dcf200000(0000) knlGS:0000000000000000 [206201.363788] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [206201.363791] CR2: 000073831ddf7e78 CR3: 000000012c6aa001 CR4: 00000000001706f0 [206201.363794] note: eiskaltdcpp-qt[310615] exited with irqs disabled [206201.363862] note: eiskaltdcpp-qt[310615] exited with preempt_count 2 [206201.415487] general protection fault, probably for non-canonical address 0xbe27f590f0ab0657: 0000 [#2] PREEMPT SMP PTI [206201.415497] CPU: 0 PID: 310603 Comm: snowflake Tainted: G D W OE 6.6.27-1-lts #1 d5b6011e73704a95088e0244d141560bd5ec914b 
[206201.415503] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Killer, BIOS P1.90 03/11/2018 [206201.415506] RIP: 0010:kmem_cache_alloc+0x115/0x370 [206201.415514] Code: 38 0f 84 e7 01 00 00 48 85 ff 0f 84 de 01 00 00 41 8b 44 24 28 4d 8b 14 24 49 89 f8 49 89 d1 49 8b 9c 24 b8 00 00 00 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 02 00 00 [206201.415519] RSP: 0018:ffffb5340c85b8e0 EFLAGS: 00010092 [206201.415523] RAX: be27f590f0ab0657 RBX: be535888bc8ade5e RCX: ffffa04d97e322d0 [206201.415526] RDX: 00000000004bc400 RSI: 0000000000000820 RDI: be27f590f0ab061f [206201.415529] RBP: ffffb5340c85b930 R08: be27f590f0ab061f R09: 00000000004bc400 [206201.415532] R10: 000034e630a11a20 R11: 0000000000000000 R12: ffffa04ac153fa00 [206201.415535] R13: 0000000000000000 R14: 0000000000000820 R15: 0000000000000078 [206201.415538] FS: 000000c000201490(0000) GS:ffffa04dcf200000(0000) knlGS:0000000000000000 [206201.415542] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [206201.415602] CR2: 00007c086e930180 CR3: 0000000105504006 CR4: 00000000001706f0 [206201.415606] Call Trace: [206201.415609] [206201.415612] ? die_addr+0x36/0x90 [206201.415618] ? exc_general_protection+0x1c5/0x430 [206201.415623] ? asm_exc_general_protection+0x26/0x30 [206201.415628] ? kmem_cache_alloc+0x115/0x370 [206201.415634] ? get_taskinfo+0xa5/0x1b0 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39] [206201.415642] get_taskinfo+0xa5/0x1b0 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39] [206201.415649] sock2task+0x16b/0x380 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39] [206201.415656] analyze_tcpv4_packet+0x1be/0x210 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39] [206201.415663] ipv4_hookout+0xa5/0xe0 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39] [206201.415669] nf_hook_slow+0x45/0xc0 [206201.415676] __ip_local_out+0xfa/0x180 [206201.415680] ? 
__pfx_dst_output+0x10/0x10 [206201.415684] ip_local_out+0x1b/0x70 [206201.415688] __ip_queue_xmit+0x175/0x490 [206201.415691] __tcp_transmit_skb+0xa5e/0xbf0 [206201.415697] tcp_v4_do_rcv+0x151/0x280 [206201.415701] __release_sock+0xb8/0xd0 [206201.415705] release_sock+0x2f/0x90 [206201.415709] tcp_recvmsg+0x92/0x1f0 [206201.415714] inet_recvmsg+0x56/0x130 [206201.415718] ? __pfx_bpf_lsm_socket_recvmsg+0x10/0x10 [206201.415722] ? security_socket_recvmsg+0x44/0x70 [206201.415726] sock_recvmsg+0xa6/0xd0 [206201.415730] sock_read_iter+0x96/0x100 [206201.415733] vfs_read+0x303/0x350 [206201.415738] ksys_read+0xbb/0xf0 [206201.415741] do_syscall_64+0x5a/0x80 [206201.415746] ? syscall_exit_to_user_mode+0x22/0x40 [206201.415750] ? do_syscall_64+0x66/0x80 [206201.415754] entry_SYSCALL_64_after_hwframe+0x78/0xe2 [206201.415758] RIP: 0033:0x40720e [206201.415780] Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48 [206201.415785] RSP: 002b:000000c000373c28 EFLAGS: 00000206 ORIG_RAX: 0000000000000000 [206201.415789] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 000000000040720e [206201.415792] RDX: 0000000000008000 RSI: 000000c000164000 RDI: 0000000000000008 [206201.415795] RBP: 000000c000373c68 R08: 0000000000000000 R09: 0000000000000000 [206201.415797] R10: 0000000000000000 R11: 0000000000000206 R12: 000000c00051beb0 [206201.415800] R13: 000000c00023ef12 R14: 000000c0000076c0 R15: 0000000000000000 [206201.415804] [206201.415806] Modules linked in: tls bluetooth ecdh_generic mptcp_diag vsock_diag tcp_diag udp_diag raw_diag inet_diag unix_diag netconsole nf_conntrack_netlink xt_conntrack nft_chain_nat xt_addrtype xt_owner nft_compat dummy ip6table_raw ip6t_rpfilter iptable_raw ipt_rpfilter veth xt_CHECKSUM xt_tcpudp xt_comment xt_MASQUERADE ip6table_nat ip6table_mangle ip6table_filter ip6_tables bridge stp llc btrfs 
blake2b_generic xor raid6_pq nf_tables vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock dm_crypt cbc encrypted_keys trusted asn1_encoder tee tun rfkill iptable_mangle iptable_filter iptable_nat zram nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nfnetlink_queue nct6775 nct6775_core hwmon_vid vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp kvm_intel snd_hda_codec_hdmi kvm snd_hda_intel mei_pxp irqbypass spi_nor mtd crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic at24 mei_hdcp [206201.415853] iTCO_wdt spi_intel_platform spi_intel intel_pmc_bxt gf128mul snd_intel_dspcfg iTCO_vendor_support snd_intel_sdw_acpi snd_hda_codec ghash_clmulni_intel r8169 sha512_ssse3 sha256_ssse3 snd_hda_core sha1_ssse3 aesni_intel crypto_simd cryptd mxm_wmi snd_hwdep rapl intel_cstate realtek mdio_devres mei_me alx snd_pcm intel_uncore lpc_ich i2c_i801 libphy mei i2c_smbus snd_timer snd soundcore mdio mac_hid tcp_bbr netatop(OE) sg crypto_user loop fuse dm_mod nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 i915 i2c_algo_bit drm_buddy ttm intel_gtt xhci_pci crc32c_intel drm_display_helper xhci_pci_renesas cec video wmi [206201.415909] ---[ end trace 0000000000000000 ]--- [206201.415912] RIP: 0010:kmem_cache_alloc+0x115/0x370 [206201.415917] Code: 38 0f 84 e7 01 00 00 48 85 ff 0f 84 de 01 00 00 41 8b 44 24 28 4d 8b 14 24 49 89 f8 49 89 d1 49 8b 9c 24 b8 00 00 00 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 02 00 00 [206201.415922] RSP: 0018:ffffb5340c80b700 EFLAGS: 00010092 [206201.415926] RAX: be27f590f0ab0657 RBX: be535888bc8ade5e RCX: ffffa04b3e8139ea [206201.415929] RDX: 00000000004bc400 RSI: 0000000000000820 RDI: be27f590f0ab061f [206201.415931] RBP: ffffb5340c80b750 R08: be27f590f0ab061f R09: 00000000004bc400 [206201.415934] R10: 000034e630a11a20 R11: 0000000000000000 R12: 
ffffa04ac153fa00 [206201.415937] R13: 0000000000000000 R14: 0000000000000820 R15: 0000000000000078 [206201.415940] FS: 000000c000201490(0000) GS:ffffa04dcf200000(0000) knlGS:0000000000000000 [206201.415944] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [206201.415947] CR2: 00007c086e930180 CR3: 0000000105504006 CR4: 00000000001706f0 [206201.415950] note: snowflake[310603] exited with irqs disabled [206201.416000] note: snowflake[310603] exited with preempt_count 2 ```
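A note on the "non-canonical address" wording in the trace above: on x86-64 with 4-level paging (an assumption here, although CR4=0x1706f0 in the log has the LA57 bit clear, which is consistent with it), bits 63..47 of a virtual address must all be equal. The faulting value in RAX does not satisfy this, which suggests a corrupted pointer rather than a valid but unmapped one. A minimal sketch of the check, where `is_canonical48` is a hypothetical helper name and not a kernel API:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): returns true when addr is
 * canonical under 48-bit virtual addressing, i.e. bits 63..47 are either
 * all zero (user half) or all one (kernel half).
 */
bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* keep the upper 17 bits */
    return top == 0 || top == 0x1FFFF;  /* all clear or all set */
}
```

Applied to the registers in the log, RAX (0xbe27f590f0ab0657) fails the check, while R12 (0xffffa04ac153fa00), which looks like a normal kernel pointer, passes it.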

And here's another, older one:

Another crash (spoiler) ``` [45470.068801] general protection fault, probably for non-canonical address 0xb1700553f83edc8f: 0000 [#1] PREEMPT SMP PTI [45470.068811] CPU: 0 PID: 84945 Comm: eiskaltdcpp-qt Tainted: G OE 6.6.25-1-lts #1 d7280cdf80ca98da2597ab2da5b8ef8d06d3fe7b [45470.068816] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Killer, BIOS P1.90 03/11/2018 [45470.068820] RIP: 0010:kmem_cache_alloc+0x115/0x370 [45470.068826] Code: 38 0f 84 e7 01 00 00 48 85 ff 0f 84 de 01 00 00 41 8b 44 24 28 4d 8b 14 24 49 89 f8 49 89 d1 49 8b 9c 24 b8 00 00 00 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 02 00 00 [45470.068831] RSP: 0018:ffffadbe0cf3fa90 EFLAGS: 00010082 [45470.068834] RAX: b1700553f83edc8f RBX: 06b83e4ab7a12ba8 RCX: ffff9f4f3a31c43a [45470.068847] RDX: 00000000002c3e00 RSI: 0000000000000820 RDI: b1700553f83edc57 [45470.068850] RBP: ffffadbe0cf3fae0 R08: b1700553f83edc57 R09: 00000000002c3e00 [45470.068853] R10: 00002e6c70a0d860 R11: 0000000000000001 R12: ffff9f4e84b91c00 [45470.068855] R13: 0000000000000000 R14: 0000000000000820 R15: 0000000000000078 [45470.068858] FS: 0000764c2cbea700(0000) GS:ffff9f518f200000(0000) knlGS:0000000000000000 [45470.068861] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [45470.068864] CR2: 0000764c4d193c88 CR3: 00000002af58a002 CR4: 00000000001706f0 [45470.068867] Call Trace: [45470.068870] [45470.068873] ? die_addr+0x36/0x90 [45470.068878] ? exc_general_protection+0x1c5/0x430 [45470.068884] ? asm_exc_general_protection+0x26/0x30 [45470.068889] ? kmem_cache_alloc+0x115/0x370 [45470.068894] ? 
get_taskinfo+0xa5/0x1b0 [netatop 02529030fbe77a5f61961260edaf50cd667dc966] [45470.068901] get_taskinfo+0xa5/0x1b0 [netatop 02529030fbe77a5f61961260edaf50cd667dc966] [45470.068908] sock2task+0x1fe/0x380 [netatop 02529030fbe77a5f61961260edaf50cd667dc966] [45470.069059] Modules linked in: xt_conntrack nft_chain_nat xt_addrtype xt_owner nft_compat dummy ip6table_raw ip6t_rpfilter iptable_raw ipt_rpfilter veth xt_CHECKSUM xt_tcpudp xt_comment xt_MASQUERADE ip6table_nat ip6table_mangle ip6table_filter ip6_tables bridge stp llc btrfs blake2b_generic xor raid6_pq nf_tables vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock netconsole dm_crypt cbc encrypted_keys trusted asn1_encoder tee tun rfkill zram iptable_mangle iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nfnetlink_queue nct6775 nct6775_core hwmon_vid vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic kvm irqbypass ledtrig_audio crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul spi_nor mtd iTCO_wdt spi_intel_platform at24 intel_pmc_bxt iTCO_vendor_support snd_hda_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 [45470.068924] analyze_tcpv4_packet+0x1be/0x210 [netatop 02529030fbe77a5f61961260edaf50cd667dc966] [45470.068931] ipv4_hookout+0xa5/0xe0 [netatop 02529030fbe77a5f61961260edaf50cd667dc966] [45470.068937] nf_hook_slow+0x45/0xc0 [45470.068943] __ip_local_out+0xfa/0x180 [45470.069111] aesni_intel mei_pxp mei_hdcp spi_intel crypto_simd cryptd rapl intel_cstate snd_intel_dspcfg i2c_i801 snd_intel_sdw_acpi snd_hda_codec intel_uncore snd_hda_core r8169 i2c_smbus realtek mxm_wmi snd_hwdep snd_pcm lpc_ich snd_timer snd mei_me mdio_devres alx soundcore mei libphy mdio mac_hid tcp_bbr netatop(OE) sg crypto_user dm_mod loop fuse nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 i915 
i2c_algo_bit drm_buddy crc32c_intel ttm intel_gtt xhci_pci drm_display_helper xhci_pci_renesas cec video wmi [45470.069159] ---[ end trace 0000000000000000 ]--- [45470.068947] ? __pfx_dst_output+0x10/0x10 [45470.069162] RIP: 0010:kmem_cache_alloc+0x115/0x370 [45470.069166] Code: 38 0f 84 e7 01 00 00 48 85 ff 0f 84 de 01 00 00 41 8b 44 24 28 4d 8b 14 24 49 89 f8 49 89 d1 49 8b 9c 24 b8 00 00 00 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 02 00 00 [45470.069170] RSP: 0018:ffffadbe0cf3fa90 EFLAGS: 00010082 [45470.068950] ip_local_out+0x1b/0x70 [45470.069173] RAX: b1700553f83edc8f RBX: 06b83e4ab7a12ba8 RCX: ffff9f4f3a31c43a [45470.069176] RDX: 00000000002c3e00 RSI: 0000000000000820 RDI: b1700553f83edc57 [45470.069178] RBP: ffffadbe0cf3fae0 R08: b1700553f83edc57 R09: 00000000002c3e00 [45470.068953] __ip_queue_xmit+0x175/0x490 [45470.069181] R10: 00002e6c70a0d860 R11: 0000000000000001 R12: ffff9f4e84b91c00 [45470.069183] R13: 0000000000000000 R14: 0000000000000820 R15: 0000000000000078 [45470.068957] __tcp_transmit_skb+0xa5e/0xbf0 [45470.069186] FS: 0000764c2cbea700(0000) GS:ffff9f518f200000(0000) knlGS:0000000000000000 [45470.069189] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [45470.068963] tcp_write_xmit+0x544/0x14e0 [45470.069191] CR2: 0000764c4d193c88 CR3: 00000002af58a002 CR4: 00000000001706f0 [45470.069194] note: eiskaltdcpp-qt[84945] exited with irqs disabled [45470.068967] __tcp_push_pending_frames+0x36/0xf0 [45470.068971] inet_shutdown+0xe2/0xf0 [45470.068974] __sys_shutdown+0x60/0xb0 [45470.068979] __x64_sys_shutdown+0x14/0x20 [45470.068982] do_syscall_64+0x60/0x90 [45470.068985] ? __do_softirq+0x186/0x2c8 [45470.068989] ? 
__irq_exit_rcu+0x4b/0xc0 [45470.068993] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [45470.068997] RIP: 0033:0x764c719290c7 [45470.069214] note: eiskaltdcpp-qt[84945] exited with preempt_count 2 [45470.069026] Code: f0 ff ff 73 01 c3 48 8b 0d c6 0d 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 30 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 0d 0d 00 f7 d8 64 89 01 48 [45470.069030] RSP: 002b:0000764c2cbe9cb8 EFLAGS: 00000217 ORIG_RAX: 0000000000000030 [45470.069044] RAX: ffffffffffffffda RBX: 0000764c48495b30 RCX: 0000764c719290c7 [45470.069046] RDX: 0000764c71762590 RSI: 0000000000000002 RDI: 0000000000000035 [45470.069049] RBP: 0000764c482bb0b0 R08: 0000764c719fb9c0 R09: 0000000000000000 [45470.069051] R10: 0000000000000001 R11: 0000000000000217 R12: 0000764c2cbe9d70 [45470.069054] R13: 0000764c717b4a8d R14: 0000764c2cbe9d70 R15: 0000764c2cbe9d80 [45470.069058] [45511.480830] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kcompactd0:48] [45511.480898] Modules linked in: xt_conntrack nft_chain_nat xt_addrtype xt_owner nft_compat dummy ip6table_raw ip6t_rpfilter iptable_raw ipt_rpfilter veth xt_CHECKSUM xt_tcpudp xt_comment xt_MASQUERADE ip6table_nat ip6table_mangle ip6table_filter ip6_tables bridge stp llc btrfs blake2b_generic xor raid6_pq nf_tables vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock netconsole dm_crypt cbc encrypted_keys trusted asn1_encoder tee tun rfkill zram iptable_mangle iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nfnetlink_queue nct6775 nct6775_core hwmon_vid vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic kvm irqbypass ledtrig_audio crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul spi_nor mtd iTCO_wdt spi_intel_platform at24 intel_pmc_bxt iTCO_vendor_support snd_hda_intel ghash_clmulni_intel sha512_ssse3 
sha256_ssse3 sha1_ssse3 [45511.480940] aesni_intel mei_pxp mei_hdcp spi_intel crypto_simd cryptd rapl intel_cstate snd_intel_dspcfg i2c_i801 snd_intel_sdw_acpi snd_hda_codec intel_uncore snd_hda_core r8169 i2c_smbus realtek mxm_wmi snd_hwdep snd_pcm lpc_ich snd_timer snd mei_me mdio_devres alx soundcore mei libphy mdio mac_hid tcp_bbr netatop(OE) sg crypto_user dm_mod loop fuse nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 i915 i2c_algo_bit drm_buddy crc32c_intel ttm intel_gtt xhci_pci drm_display_helper xhci_pci_renesas cec video wmi [45511.480992] CPU: 2 PID: 48 Comm: kcompactd0 Tainted: G D OE 6.6.25-1-lts #1 d7280cdf80ca98da2597ab2da5b8ef8d06d3fe7b [45511.480998] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Killer, BIOS P1.90 03/11/2018 [45511.481001] RIP: 0010:smp_call_function_many_cond+0x12b/0x500 [45511.481009] Code: df e8 49 fc 48 00 3b 05 83 aa e5 01 73 26 48 63 d0 49 8b 34 24 48 03 34 d5 e0 4c 70 b8 8b 56 08 83 e2 01 74 0a f3 90 8b 4e 08 <83> e1 01 75 f6 83 c0 01 eb c1 48 83 c4 50 5b 5d 41 5c 41 5d 41 5e [45511.481015] RSP: 0018:ffffadbe001c79c8 EFLAGS: 00000202 [45511.481018] RAX: 0000000000000000 RBX: ffff9f518f335508 RCX: 0000000000000011 [45511.481022] RDX: 0000000000000001 RSI: ffff9f518f23b300 RDI: ffff9f518f335508 [45511.481025] RBP: ffff9f518f335500 R08: ffff9f518f335508 R09: 0000000000000000 [45511.481028] R10: ffff9f518f335530 R11: 0000000000000000 R12: ffff9f518f335500 [45511.481031] R13: 0000000000000001 R14: 0000000000000003 R15: 0000000000000002 [45511.481034] FS: 0000000000000000(0000) GS:ffff9f518f300000(0000) knlGS:0000000000000000 [45511.481038] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [45511.481041] CR2: 000072350ee3a8e0 CR3: 000000029b58e003 CR4: 00000000001706e0 [45511.481046] Call Trace: [45511.481049] [45511.481054] ? watchdog_timer_fn+0x1b8/0x220 [45511.481058] ? __pfx_watchdog_timer_fn+0x10/0x10 [45511.481171] kthread+0xe8/0x120 [45511.481175] ? 
__pfx_kthread+0x10/0x10 [45511.481062] ? __hrtimer_run_queues+0x112/0x2b0 [45511.481180] ret_from_fork+0x34/0x50 [45511.481067] ? hrtimer_interrupt+0xf8/0x230 [45511.481184] ? __pfx_kthread+0x10/0x10 [45511.481188] ret_from_fork_asm+0x1b/0x30 [45511.481194] [45511.481071] ? __sysvec_apic_timer_interrupt+0x50/0x140 [45511.481077] ? sysvec_apic_timer_interrupt+0x6d/0x90 [45511.481081] [45511.481083] [45511.481086] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 [45511.481092] ? smp_call_function_many_cond+0x12b/0x500 [45511.481095] ? smp_call_function_many_cond+0x107/0x500 [45511.481099] ? __pfx_invalidate_bh_lru+0x10/0x10 [45511.481104] on_each_cpu_cond_mask+0x24/0x40 [45511.481109] __buffer_migrate_folio+0xf8/0x2a0 [45511.481115] move_to_new_folio+0x53/0x140 [45511.481119] migrate_pages_batch+0x8e5/0xca0 [45511.481123] ? __pfx_compaction_free+0x10/0x10 [45511.481127] ? __pfx_remove_migration_pte+0x10/0x10 [45511.481131] ? __pfx_compaction_alloc+0x10/0x10 [45511.481135] migrate_pages+0xb41/0xe00 [45511.481138] ? __pfx_compaction_free+0x10/0x10 [45511.481142] ? __pfx_compaction_alloc+0x10/0x10 [45511.481145] ? __pfx_compaction_alloc+0x10/0x10 [45511.481149] compact_zone+0x831/0xf20 [45511.481154] proactive_compact_node+0x85/0xe0 [45511.481158] kcompactd+0x35b/0x430 [45511.481162] ? __pfx_autoremove_wake_function+0x10/0x10 [45511.481167] ? __pfx_kcompactd+0x10/0x10 [45532.577883] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [45532.577949] rcu: 1-...!: (0 ticks this GP) idle=c1b4/1/0x4000000000000000 softirq=9952024/9952024 fqs=237 [45532.577957] rcu: (detected by 2, t=18006 jiffies, g=15878461, q=9524 ncpus=4) [45532.577963] Sending NMI from CPU 2 to CPUs 1: [45532.577971] NMI backtrace for cpu 1 [45532.577973] CPU: 1 PID: 226 Comm: knetatop Tainted: G D OEL 6.6.25-1-lts #1 d7280cdf80ca98da2597ab2da5b8ef8d06d3fe7b [45532.577975] Hardware name: To Be Filled By O.E.M. 
To Be Filled By O.E.M./Z87 Killer, BIOS P1.90 03/11/2018 [45532.577976] RIP: 0010:native_queued_spin_lock_slowpath+0x6e/0x2e0 [45532.577979] Code: 77 7f f0 0f ba 2b 08 0f 92 c2 8b 03 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 77 5b 85 c0 74 10 0f b6 03 84 c0 74 09 f3 90 <0f> b6 03 84 c0 75 f7 b8 01 00 00 00 66 89 03 65 48 ff 05 a3 00 26 [45532.577980] RSP: 0018:ffffadbe0031fe88 EFLAGS: 00000002 [45532.577982] RAX: 0000000000000001 RBX: ffffffffc0acdbd8 RCX: 0000000000000000 [45532.577983] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffc0acdbd8 [45532.577983] RBP: ffffffffc0acdbc8 R08: 0000000000000082 R09: 0000000080240023 [45532.577984] R10: 0000000000000000 R11: 0000000000000046 R12: ffffffffc0acdb50 [45532.577985] R13: 0000000000000082 R14: 0000000000000287 R15: ffffffffc0acdbd8 [45532.577986] FS: 0000000000000000(0000) GS:ffff9f518f280000(0000) knlGS:0000000000000000 [45532.577987] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [45532.577988] CR2: 0000771e325ae1a0 CR3: 000000000c6de004 CR4: 00000000001706e0 [45532.577988] Call Trace: [45532.577990] [45532.577991] ? nmi_cpu_backtrace+0x99/0x110 [45532.577994] ? nmi_cpu_backtrace_handler+0x11/0x20 [45532.577997] ? nmi_handle+0x61/0x150 [45532.577999] ? default_do_nmi+0x40/0x100 [45532.578001] ? exc_nmi+0x125/0x1a0 [45532.578002] ? end_repeat_nmi+0x16/0x67 [45532.578006] ? native_queued_spin_lock_slowpath+0x6e/0x2e0 [45532.578008] ? native_queued_spin_lock_slowpath+0x6e/0x2e0 [45532.578010] ? native_queued_spin_lock_slowpath+0x6e/0x2e0 [45532.578011] [45532.578011] [45532.578012] _raw_spin_lock_irqsave+0x3d/0x50 [45532.578014] garbage_collector+0x66/0x3c0 [netatop 02529030fbe77a5f61961260edaf50cd667dc966] [45532.578019] ? __pfx_netatop_thread+0x10/0x10 [netatop 02529030fbe77a5f61961260edaf50cd667dc966] [45532.578023] netatop_thread+0x10/0x30 [netatop 02529030fbe77a5f61961260edaf50cd667dc966] [45532.578027] kthread+0xe8/0x120 [45532.578029] ? 
__pfx_kthread+0x10/0x10 [45532.578031] ret_from_fork+0x34/0x50 [45532.578033] ? __pfx_kthread+0x10/0x10 [45532.578035] ret_from_fork_asm+0x1b/0x30 [45532.578038] [45532.578969] rcu: rcu_preempt kthread timer wakeup didn't happen for 17293 jiffies! g15878461 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [45532.579122] rcu: Possible timer handling issue on cpu=0 timer-softirq=5779617 [45532.579127] rcu: rcu_preempt kthread starved for 17295 jiffies! g15878461 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0 [45532.579134] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. [45532.579139] rcu: RCU grace-period kthread stack dump: [45532.579144] task:rcu_preempt state:I stack:0 pid:18 ppid:2 flags:0x00004000 [45532.579151] Call Trace: [45532.579155] [45532.579160] ? __pfx_rcu_gp_kthread+0x10/0x10 [45532.579167] __schedule+0x3e7/0x1410 [45532.579174] ? __pfx_rcu_gp_kthread+0x10/0x10 [45532.579180] schedule+0x5e/0xd0 [45532.579186] schedule_timeout+0x98/0x160 [45532.579192] ? __pfx_process_timeout+0x10/0x10 [45532.579198] rcu_gp_fqs_loop+0x107/0x560 [45532.579204] rcu_gp_kthread+0xd4/0x190 [45532.579210] kthread+0xe8/0x120 [45532.579216] ? __pfx_kthread+0x10/0x10 [45532.579222] ret_from_fork+0x34/0x50 [45532.579227] ? __pfx_kthread+0x10/0x10 [45532.579233] ret_from_fork_asm+0x1b/0x30 [45532.579240] [45532.579245] rcu: Stack dump where RCU GP kthread last ran: [45532.579249] Sending NMI from CPU 2 to CPUs 0: [45532.579255] NMI backtrace for cpu 0 [45532.579257] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D OEL 6.6.25-1-lts #1 d7280cdf80ca98da2597ab2da5b8ef8d06d3fe7b [45532.579259] Hardware name: To Be Filled By O.E.M. 
To Be Filled By O.E.M./Z87 Killer, BIOS P1.90 03/11/2018 [45532.579260] RIP: 0010:native_queued_spin_lock_slowpath+0x225/0x2e0 [45532.579263] Code: 41 c1 e4 10 41 c1 e5 12 45 09 ec 44 89 e0 c1 e8 10 66 87 43 02 89 c2 c1 e2 10 81 fa ff ff 00 00 77 5e 31 d2 eb 02 f3 90 8b 03 <66> 85 c0 75 f7 44 39 e0 0f 84 8e 00 00 00 c6 03 01 48 85 d2 74 0e [45532.579264] RSP: 0018:ffffadbe00003d40 EFLAGS: 00000002 [45532.579265] RAX: 0000000000100101 RBX: ffffffffc0acdbd8 RCX: 00000000533970ee [45532.579266] RDX: 0000000000000000 RSI: 0000000000000101 RDI: ffffffffc0acdbd8 [45532.579267] RBP: ffff9f518f235040 R08: ffff9f4e9b57304e R09: ffff9f4e9b573062 [45532.579268] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000040000 [45532.579269] R13: 0000000000040000 R14: 0000000000000000 R15: 0000000000000069 [45532.579269] FS: 0000000000000000(0000) GS:ffff9f518f200000(0000) knlGS:0000000000000000 [45532.579271] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [45532.579271] CR2: 00005e232cfbb0bc CR3: 00000002af57a004 CR4: 00000000001706f0 [45532.579272] Call Trace: [45532.579273] [45532.579274] ? nmi_cpu_backtrace+0x99/0x110 [45532.579277] ? nmi_cpu_backtrace_handler+0x11/0x20 [45532.579279] ? nmi_handle+0x61/0x150 [45532.579282] ? default_do_nmi+0x40/0x100 [45532.579283] ? exc_nmi+0x125/0x1a0 [45532.579284] ? end_repeat_nmi+0x16/0x67 [45532.579287] ? native_queued_spin_lock_slowpath+0x225/0x2e0 [45532.579289] ? native_queued_spin_lock_slowpath+0x225/0x2e0 [45532.579291] ? native_queued_spin_lock_slowpath+0x225/0x2e0 [45532.579292] [45532.579293] [45532.579293] _raw_spin_lock_irqsave+0x3d/0x50 [45532.579295] analyze_tcpv4_packet+0x89/0x210 [netatop 02529030fbe77a5f61961260edaf50cd667dc966] [45532.579300] ipv4_hookin+0x96/0xd0 [netatop 02529030fbe77a5f61961260edaf50cd667dc966] [45532.579304] nf_hook_slow+0x45/0xc0 [45532.579308] ip_local_deliver+0xd0/0x120 [45532.579310] ? 
__pfx_ip_local_deliver_finish+0x10/0x10 [45532.579312] __netif_receive_skb_one_core+0x89/0xa0 [45532.579316] process_backlog+0x85/0x120 [45532.579318] __napi_poll+0x2b/0x1b0 [45532.579320] net_rx_action+0x2b5/0x370 [45532.579323] __do_softirq+0xd4/0x2c8 [45532.579325] __irq_exit_rcu+0xa3/0xc0 [45532.579328] common_interrupt+0x86/0xa0 [45532.579330] [45532.579330] [45532.579331] asm_common_interrupt+0x26/0x40 [45532.579332] RIP: 0010:cpuidle_enter_state+0xcc/0x440 [45532.579335] Code: da 75 38 ff e8 d5 f3 ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 b3 76 37 ff 45 84 ff 0f 85 56 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d [45532.579336] RSP: 0018:ffffffffb8e03e28 EFLAGS: 00000246 [45532.579336] RAX: ffff9f518f2341c0 RBX: ffff9f518f23dcc0 RCX: 000000000000001f [45532.579337] RDX: 0000000000000000 RSI: 000000002802f942 RDI: 0000000000000000 [45532.579338] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000000067 [45532.579338] R10: 0000000000000014 R11: ffff9f518f232ba4 R12: ffffffffb8f47ea0 [45532.579339] R13: 0000295bf4e52e5b R14: 0000000000000001 R15: 0000000000000000 [45532.579341] ? cpuidle_enter_state+0xbd/0x440 [45532.579343] cpuidle_enter+0x2d/0x40 [45532.579346] do_idle+0x1d8/0x230 [45532.579349] cpu_startup_entry+0x2a/0x30 [45532.579351] rest_init+0xca/0xd0 [45532.579352] arch_call_rest_init+0xe/0x30 [45532.579355] start_kernel+0x704/0xa90 [45532.579357] x86_64_start_reservations+0x18/0x30 [45532.579360] x86_64_start_kernel+0x96/0xa0 [45532.579362] secondary_startup_64_no_verify+0x18f/0x19b [45532.579366] ```

Tested on different 6.6 kernels. The crashes started the day I installed netatop (5 April), but for some reason I missed the "netatop" string in the stack trace and noticed it only today, after the latest crash. The server gets rebooted (I also have a hardware watchdog), so it's either a full crash or a CPU deadlock.

I'm running Arch Linux with the netatop-dkms (3.1-2) module (netatop-3.1.tar.gz).

The contact form on the website does not work. It returns HTTP 500 upon submitting.

glangeveld commented 3 months ago

I fixed the contact form on the website. Thanks for the warning.

glangeveld commented 3 months ago

I also created a test version of the netatop module which you can download (Update: removed now). Could you please test this version?

ValdikSS commented 3 months ago

@glangeveld, from what I could see by diffing 3.2.1 and 3.1, only one lock was added, which doesn't seem relevant. I'll try to test it, but I can only test it on my main machine, and I'm not a fan of that machine hanging.

UPD: Compiled and loaded the module on Thu May 16 19:54:00 2024.

ValdikSS commented 2 months ago

It's May 24, so far so good, @glangeveld.

Atoptool commented 2 months ago

Thanks for testing! The modification certainly involves more than the addition of a lock (though even that alone could have solved an issue). There was a race condition in the garbage collection, which has been fixed by adding a reference count. I will release a new version as soon as possible.
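For readers unfamiliar with the technique, here is a rough userspace sketch of how a reference count can keep a garbage collector from freeing a record that another code path is still using. It is not the netatop code: `struct taskinfo` and the `taskinfo_new`/`taskinfo_get`/`taskinfo_put` names are hypothetical, and it uses C11 atomics where a kernel module would use the kernel's own primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-task record; the layout is an assumption for illustration. */
struct taskinfo {
    atomic_int refs;        /* one reference for the table, plus one per user */
    int        pid;
    long       bytes_sent;
};

/* Allocate a record that starts with a single reference (the table's). */
struct taskinfo *taskinfo_new(int pid)
{
    struct taskinfo *t = calloc(1, sizeof(*t));
    if (t == NULL)
        return NULL;
    atomic_init(&t->refs, 1);
    t->pid = pid;
    return t;
}

/* Take a reference; refuses if the count has already dropped to zero. */
bool taskinfo_get(struct taskinfo *t)
{
    int old = atomic_load(&t->refs);
    while (old > 0) {
        /* On failure, old is reloaded with the current count and we retry. */
        if (atomic_compare_exchange_weak(&t->refs, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; frees the record and returns true on the last drop. */
bool taskinfo_put(struct taskinfo *t)
{
    if (atomic_fetch_sub(&t->refs, 1) == 1) {
        free(t);
        return true;
    }
    return false;
}
```

In this sketch the packet path would call `taskinfo_get` while it still holds the lookup lock and `taskinfo_put` when it is done, and the garbage collector would only drop the table's reference instead of freeing the record directly. Whichever side drops the last reference performs the free, so neither can pull the record out from under the other.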

Atoptool commented 2 months ago

Version 3.2.2 can be downloaded from here.