NICMx / Jool

SIIT and NAT64 for Linux
GNU General Public License v2.0

Kernel oops with v3.6.0-rc4 on Ubuntu Xenial LTS kernel #284

Closed by toreanderson 5 years ago

toreanderson commented 5 years ago

After I hit #283 when attempting a kernel upgrade, I thought I'd simply try upgrading to the latest v3.6 RC. It built fine, but when starting it up, the kernel oopsed:

[81201.607134] jool_siit: loading out-of-tree module taints kernel.
[81201.607246] jool_siit: module verification failed: signature and/or required key missing - tainting kernel
[81201.617897] jool_siit: unknown parameter 'disabled' ignored
[81201.617989] SIIT Jool: SIIT Jool v3.5.7.203 module inserted.
[81201.621067] BUG: unable to handle kernel NULL pointer dereference at           (null)
[81201.626745] IP: [<ffffffffc04e9adb>] __handle_jool_message+0x6b/0x230 [jool_siit]
[81201.642269] PGD 800000043f549067 PUD 4437e6067 PMD 0 
[81201.645248] Oops: 0000 [#1] SMP 
[81201.661372] Modules linked in: jool_siit(OE) ipmi_devintf 8021q garp mrp stp llc mptctl bonding ipmi_ssif ast ttm drm_kms_helper drm gpio_ich intel_powerclamp fb_sys_fops syscopyarea sysfillrect coretemp joydev sysimgblt input_leds lpc_ich 8250_fintek kvm_intel i5500_temp kvm ioatdma i7core_edac mac_hid edac_core irqbypass shpchp ipmi_si ipmi_msghandler xt_mark ip6table_mangle lp nf_conntrack_ipv6 parport nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables btrfs xor raid6_pq hid_generic igb mptsas i2c_algo_bit mptscsih dca usbhid mptbase ahci ptp libahci hid scsi_transport_sas pps_core fjes
[81201.764705] CPU: 4 PID: 21256 Comm: jool_siit Tainted: G          IOE   4.4.0-144-generic #170~14.04.1-Ubuntu
[81201.783083] Hardware name: SUN MICROSYSTEMS SUN FIRE X4170 SERVER          /ASSY,MOTHERBOARD,X4170, BIOS 07060309 07/10/2013
[81201.802550] task: ffff88043f681980 ti: ffff88043ed1c000 task.ti: ffff88043ed1c000
[81201.821126] RIP: 0010:[<ffffffffc04e9adb>]  [<ffffffffc04e9adb>] __handle_jool_message+0x6b/0x230 [jool_siit]
[81201.840562] RSP: 0018:ffff88043ed1f8c8  EFLAGS: 00010246
[81201.843184] RAX: ffff88006d773100 RBX: ffff88043ed1fb78 RCX: ffffffffc04f0c74
[81201.861503] RDX: ffffffffc04f0c82 RSI: ffffffff81e49360 RDI: 0000000000000000
[81201.864907] RBP: ffff88043ed1fb38 R08: ffff88043ed1f8cf R09: ffff88006d773100
[81201.882610] R10: ffff880442436810 R11: 0000000000000004 R12: ffffffffc04f7d20
[81201.900796] R13: ffff880442436814 R14: ffff8802749af100 R15: ffffffff81efd700
[81201.903353] FS:  00007fa2c9018740(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
[81201.922927] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[81201.940437] CR2: 0000000000000000 CR3: 00000004422aa000 CR4: 0000000000000670
[81201.942709] Stack:
[81201.944213]  0000000000000246 0000000202235250 0000000000000001 0000000000000001
[81201.963722]  ffff88047ffef6c0 0000000002235250 0000000000000141 ffff88043f681980
[81201.981645]  ffff88043ed1f998 ffffffff811979f0 0000000000000000 ffff88043f681980
[81201.985080] Call Trace:
[81202.000739]  [<ffffffff811979f0>] ? __alloc_pages_nodemask+0x130/0x250
[81202.003555]  [<ffffffff811ea16c>] ? ___slab_alloc+0x1cc/0x470
[81202.021399]  [<ffffffff817105be>] ? __alloc_skb+0x4e/0x260
[81202.023602]  [<ffffffff811ee18b>] ? __kmalloc_node_track_caller+0x24b/0x2b0
[81202.041976]  [<ffffffff81711b4a>] ? pskb_expand_head+0x6a/0x260
[81202.044198]  [<ffffffff8170eec1>] ? __kmalloc_reserve.isra.34+0x31/0x90
[81202.061763]  [<ffffffff8170ed13>] ? skb_queue_tail+0x43/0x50
[81202.063975]  [<ffffffff81754ba2>] ? __netlink_sendskb+0x42/0x60
[81202.082657]  [<ffffffff81757369>] ? netlink_unicast+0x1c9/0x230
[81202.100446]  [<ffffffffc04e9cc6>] handle_jool_message+0x26/0x40 [jool_siit]
[81202.102717]  [<ffffffff817587e1>] genl_family_rcv_msg+0x1d1/0x390
[81202.121370]  [<ffffffff817589a0>] ? genl_family_rcv_msg+0x390/0x390
[81202.123886]  [<ffffffff81758a20>] genl_rcv_msg+0x80/0xc0
[81202.141130]  [<ffffffff81757939>] netlink_rcv_skb+0xa9/0xc0
[81202.143905]  [<ffffffff81757ff8>] genl_rcv+0x28/0x40
[81202.161358]  [<ffffffff81757303>] netlink_unicast+0x163/0x230
[81202.164143]  [<ffffffff817576eb>] netlink_sendmsg+0x31b/0x390
[81202.182271]  [<ffffffff81707c6e>] sock_sendmsg+0x3e/0x50
[81202.186127]  [<ffffffff817085c6>] ___sys_sendmsg+0x276/0x290
[81202.202494]  [<ffffffff8119d4d7>] ? lru_cache_add_active_or_unevictable+0x27/0x90
[81202.206691]  [<ffffffff813949db>] ? aa_sock_perm+0x4b/0xe0
[81202.223923]  [<ffffffff8170719d>] ? SYSC_getsockname+0xcd/0xe0
[81202.227187]  [<ffffffff81708e92>] __sys_sendmsg+0x42/0x80
[81202.243346]  [<ffffffff81708ee2>] SyS_sendmsg+0x12/0x20
[81202.246560]  [<ffffffff8182d39b>] entry_SYSCALL_64_fastpath+0x22/0xcb
[81202.265280] Code: 00 f6 05 5a f1 00 00 04 0f 85 f9 00 00 00 48 8b 43 20 4c 8d 85 97 fd ff ff 48 c7 c1 74 0c 4f c0 48 c7 c2 82 0c 4f c0 48 8b 78 10 <0f> b7 37 48 83 c7 04 83 ee 04 48 63 f6 e8 f3 49 ff ff 85 c0 74 
[81202.304233] RIP  [<ffffffffc04e9adb>] __handle_jool_message+0x6b/0x230 [jool_siit]
[81202.308917]  RSP <ffff88043ed1f8c8>
[81202.322048] CR2: 0000000000000000
[81202.328054] ---[ end trace 0ab568a8a3bd889c ]---
[81202.331184] init: jool pre-start process (21254) terminated with status 137

I did not investigate exactly at which point in the Jool initialisation routine the oops occurred, as I did not have time to do so during the maintenance window. (I reverted to v3.5.7 with an older LTS kernel instead.) This is what the init script does, in a nutshell:

modprobe jool_siit disabled=1
jool_siit --pool6 --add <pool6>
jool_siit --eamt -add <eam4-1> <eam6-1>
jool_siit --eamt -add <eam4-2> <eam6-2>
[...]
jool_siit --eamt -add <eam4-n> <eam6-n>
jool_siit --pool6791 --add <pool6791>
jool_siit --enable

ydahhrk commented 5 years ago

BTW: 3.6.0-rc4 is not really the last release candidate of anything; 3.6 was renamed into 4.0. That's why the sequence is 3.6.0-rc1, 3.6.0-rc2, 3.6.0-rc3, 3.6.0-rc4, 4.0.0-rc5 and 4.0.0.

All of these expect the new command line syntax:

modprobe jool_siit
jool_siit instance add --netfilter -6 <pool6>
jool_siit eamt add <eam4-1> <eam6-1>
jool_siit pool6791 add <pool6791>
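
For an init script with many EAM entries, the same sequence can be generated from a list. The following is only a dry-run sketch that prints the commands shown above rather than running them; the variable names, the `emit_setup` helper, and every address (all taken from documentation ranges) are placeholders, not part of the original script.

```shell
#!/bin/sh
# Dry-run sketch: print the new-syntax commands instead of executing them.
# All addresses below are documentation-range placeholders, not real config.
POOL6='2001:db8:64::/96'
POOL6791='192.0.2.1'
# One "<eam4> <eam6>" pair per line.
EAMT='198.51.100.0/24 2001:db8:a::/120
203.0.113.8 2001:db8:b::8'

emit_setup() {
    echo "modprobe jool_siit"
    echo "jool_siit instance add --netfilter -6 $POOL6"
    printf '%s\n' "$EAMT" | while read -r eam4 eam6; do
        # Skip blank lines so a stray newline does not emit a bogus command.
        [ -n "$eam4" ] || continue
        echo "jool_siit eamt add $eam4 $eam6"
    done
    echo "jool_siit pool6791 add $POOL6791"
}

emit_setup
```

Printing first makes it easy to review the result before it touches the kernel; once the output looks right, it could be piped to `sh`.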

ydahhrk commented 5 years ago

Whatever the error is, it might also be present in 4.0.0. __handle_jool_message() didn't change.

You sure the new userspace client was installed correctly? I can't reproduce it because none of the jool_siit commands are well-formed (as far as rc4 is concerned), so the requests are shot down long before they reach the kernel.

Check jool_siit --version, please.
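
One way to automate that check is to compare the version the client reports with the one the module logged when it was inserted. This is a rough sketch only: it assumes both strings contain a dotted version number with at least three components, as in the `SIIT Jool v3.5.7.203 module inserted.` line of the log above. The exact output format of `jool_siit --version` is not shown in this thread, and the function names are hypothetical.

```shell
#!/bin/sh
# Pull the first dotted version number (three or more components) out of a
# line of text. Requiring two dots skips dmesg timestamps like 81201.617989.
# Note: grep -o is a common extension, not strict POSIX.
extract_version() {
    printf '%s\n' "$1" | grep -Eo '[0-9]+(\.[0-9]+){2,}' | head -n 1
}

# Succeed only when both inputs carry the same, non-empty version number.
versions_match() {
    client=$(extract_version "$1")
    module=$(extract_version "$2")
    [ -n "$client" ] && [ "$client" = "$module" ]
}

# Example wiring (left commented out; it needs Jool installed and loaded):
#   client_line=$(jool_siit --version)
#   module_line=$(dmesg | grep 'module inserted' | tail -n 1)
#   versions_match "$client_line" "$module_line" ||
#       echo "userspace client and kernel module differ" >&2
```

An init script could run a check like this right after `modprobe` and refuse to send any configuration if the two versions disagree.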

toreanderson commented 5 years ago

> 3.6.0-rc4 is not really the last release candidate of anything; 3.6 was renamed into 4.0.

:man_facepalming:

> You sure the new userspace client was installed correctly?

No, I'm not sure. I just used my regular install script, tried to fire it up, got that error, and reverted as I didn't have time to debug further at that point. Might be the new client didn't get installed over the old version and I just didn't notice.

I'm closing this issue as it's probably just a dumb user error. Apologies for the noise. I'll let you know if I experience it after I upgrade to 4.0 (including updating all the CLI calls).

ydahhrk commented 5 years ago

Ok, but

It won't do if all it takes to crash the kernel is to issue a command from an outdated client.

I need to look into this more.

ydahhrk commented 5 years ago

Bug confirmed. Fixing.