NICMx / Jool

SIIT and NAT64 for Linux
GNU General Public License v2.0

Kernel crash when running Jool 4.0.4 #296

Closed boleifu closed 5 years ago

boleifu commented 5 years ago

Hi, we observed a kernel crash when running Jool in Stateful NAT64 mode. The crash occurred when sending ICMPv6 packets to an IPv4 endpoint (addressed through the "64:ff9b::" prefix). The kernel version is "4.4.0-157-generic".
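For readers unfamiliar with the "64:ff9b::" prefix: it is the RFC 6052 well-known NAT64 prefix (64:ff9b::/96), and an IPv4 endpoint is reached by placing its 32 address bits in the low end of that prefix. The following is a minimal sketch of that mapping, not Jool code; the helper names `embed_ipv4` and `extract_ipv4` and the address 192.0.2.1 are illustrative assumptions, not values from this report.

```python
import ipaddress

# RFC 6052 well-known prefix (the "64:ff9b::" prefix mentioned in the report).
WKP = ipaddress.IPv6Network("64:ff9b::/96")


def embed_ipv4(v4, prefix=WKP):
    """Return the IPv6 address representing IPv4 address `v4` behind a /96 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4_bits = int(ipaddress.IPv4Address(v4))
    return ipaddress.IPv6Address(int(prefix.network_address) | v4_bits)


def extract_ipv4(v6, prefix=WKP):
    """Recover the embedded IPv4 address from an address inside the /96 prefix."""
    addr = ipaddress.IPv6Address(v6)
    if prefix.prefixlen != 96 or addr not in prefix:
        raise ValueError("address is not inside the expected /96 prefix")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


# 192.0.2.1 is a documentation address used purely as an example.
print(embed_ipv4("192.0.2.1"))            # 64:ff9b::c000:201
print(extract_ipv4("64:ff9b::c000:201"))  # 192.0.2.1
```

An ICMPv6 echo sent to an address of this form is what a Stateful NAT64 translates into an ICMPv4 echo toward the embedded IPv4 address.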

Here is the call trace:

[25847.167857] Call Trace:
[25847.170615]
[25847.172773] [] ? ttwu_do_activate.constprop.90+0x5d/0x70
[25847.180815] [] ? try_to_wake_up+0x49/0x3f0
[25847.187275] [] ? nf_ct_invert_tuple+0x77/0x90 [nf_conntrack]
[25847.195499] [] nf_nat_setup_info+0x8c/0x310 [nf_nat]
[25847.202924] [] nf_nat_alloc_null_binding+0x57/0x80 [nf_nat]
[25847.211243] [] nf_nat_alloc_null_binding+0x21/0x30 [nf_nat]
[25847.219369] [] nf_nat_ipv4_fn+0x1cc/0x220 [nf_nat_ipv4]
[25847.227084] [] ? iptable_nat_ipv4_fn+0x20/0x20 [iptable_nat]
[25847.235307] [] nf_nat_ipv4_out+0x51/0xf0 [nf_nat_ipv4]
[25847.242924] [] iptable_nat_ipv4_out+0x15/0x20 [iptable_nat]
[25847.251050] [] nf_iterate+0x63/0x80
[25847.256822] [] nf_hook_slow+0x66/0xc0
[25847.262788] [] ip_output+0xb1/0xd0
[25847.268462] [] ? ip_fragment.constprop.53+0x80/0x80
[25847.275789] [] sendpkt_send+0x179/0x270 [jool]
[25847.282633] [] core_common+0x66/0xd0 [jool]
[25847.289186] [] core_6to4+0x107/0x160 [jool]
[25847.295738] [] ? queue_pages_pte_range+0x280/0x2e0
[25847.302973] [] target_ipv6+0x44/0x80 [jool]
[25847.309513] [] ip6t_do_table+0x2cd/0x690 [ip6_tables]
[25847.317038] [] ? nf_ct_ext_add_length+0x11a/0x1b0 [nf_conntrack]
[25847.325839] [] ? init_conntrack+0x334/0x5a0 [nf_conntrack]
[25847.333846] [] ip6table_mangle_hook+0x4d/0x153 [ip6table_mangle]
[25847.342455] [] nf_iterate+0x63/0x80
[25847.348227] [] nf_hook_slow+0x66/0xc0
[25847.354184] [] ipv6_rcv+0x3f9/0x4c0
[25847.359956] [] ? ip6_make_skb+0x210/0x210
[25847.366302] [] netif_receive_skb_core+0x38a/0xa00
[25847.373631] [] ? ttwu_do_wakeup+0x19/0xf0
[25847.379984] [] netif_receive_skb+0x18/0x60
[25847.386621] [] process_backlog+0x9e/0x140
[25847.392974] [] net_rx_action+0x161/0x350
[25847.399232] [] do_softirq+0xe5/0x2b0
[25847.405295] [] do_softirq_own_stack+0x1c/0x30
[25847.412036]
[25847.414189] [] do_softirq+0x49/0x50
[25847.420190] [] local_bh_enable_ip+0x77/0xa0
[25847.426935] [] ovs_packet_cmd_execute+0x2a0/0x2c0 [openvswitch]
[25847.435451] [] genl_family_rcv_msg+0x1d1/0x390
[25847.442291] [] ? genl_family_rcv_msg+0x390/0x390
[25847.449326] [] genl_rcv_msg+0x80/0xc0
[25847.455293] [] ? netlink_lookup+0xb1/0xf0
[25847.461842] [] netlink_rcv_skb+0xa9/0xc0
[25847.468099] [] genl_rcv+0x28/0x40
[25847.473675] [] netlink_unicast+0x163/0x240
[25847.480127] [] netlink_sendmsg+0x31b/0x390
[25847.486582] [] sock_sendmsg+0x3e/0x50
[25847.492548] [] ___sys_sendmsg+0x276/0x290
[25847.498903] [] ? sock_poll+0x52/0x120
[25847.504871] [] ? ep_send_events_proc+0x7e/0x180
[25847.511830] [] ? cputime_adjust+0x98/0x130
[25847.518282] [] ? ep_ptable_queue_proc+0xa0/0xa0
[25847.525220] [] ? audit_filter_rules.isra.9+0x242/0xe60
[25847.532838] [] sys_sendmsg+0x42/0x80
[25847.538899] [] SyS_sendmsg+0x12/0x20
[25847.544769] [] entry_SYSCALL_64_fastpath+0x22/0xcb
[25847.551998] Code: 85 ef 01 00 00 41 0f b6 41 12 41 0f b6 51 26 45 85 ed 48 8b 3c c5 c0 e1 5e c0 48 8b 04 c5 40 e1 5e c0 48 8d 04 d0 48 89 7c 24 28 <4c> 8b 30 0f 85 80 05 00 00 41 f6 04 24 14 0f 84 ef 01 00 00 49
[25847.574096] RIP [] get_unique_tuple+0x82/0x630 [nf_nat]
[25847.581824] RSP
[25847.585747] CR2: 00000000000001d0
[25847.590001] ---[ end trace b95ee2088759a9c1 ]---
[25847.600434] Kernel panic - not syncing: Fatal exception in interrupt
[25847.613024] Kernel Offset: 0x25000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

ydahhrk commented 5 years ago

It seems to be the same error as #289, which was patched in Jool 4.0.5. Please upgrade and see if it fixes the problem.

ydahhrk commented 5 years ago

BTW: You're running a NAT in the same namespace as the NAT64. This is not necessarily incorrect, but I've never heard of a solid use case for it.

boleifu commented 5 years ago

Thanks for the reply. We will try 4.0.5.

boleifu commented 5 years ago

4.0.5 fixed the crash. Thanks.