NICMx / Jool

SIIT and NAT64 for Linux
GNU General Public License v2.0

Jool pool4 flush crashes on debug kernel due to use-after-free #368

Closed by terofinn 2 years ago

terofinn commented 2 years ago

Jool's pool4 db flush seems to always crash. It looks like a use-after-free, based on the memory poison values in the registers (0x6b = POISON_FREE). I added some printk debug statements to src/mod/common/db/pool4/db.c.

Jool version is 4.1.5 and kernel version is 4.19.181.

The following memory debugging options are enabled in the kernel:

CONFIG_DEBUG_PAGEALLOC=y
CONFIG_PAGE_POISONING=y

Backtrace from the crash:

...
[  383.079504] Clear TCP mark ffff888037bf35c8
[  383.081727] Clear UDP mark ffff888037bf35d0
[  383.084181] TABLE ffff888024784138
[  383.086095] Clear ICMP mark ffff888037bf35d8
[  383.088352] TABLE ffff888077a78008
[  383.090185] Clear TCP addr ffff888037bf35e0
[  383.092358] Clear TCP addr ffff888037bf35e8
[  383.094532] TABLE ffff88802c0c5778
[  383.096714] TABLE ffff888029996e18
[  383.098555] TABLE ffff888074902198
[  383.100485] stack segment: 0000 [#1] SMP PTI
[  383.102707] CPU: 0 PID: 4986 Comm: jool Kdump: loaded Tainted: G           O      4.19.181+smp-debug #1
[  383.106977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  383.110607] RIP: 0010:rbtree_foreach+0x63/0x90 [jool_common]
[  383.112689] Code: 89 df 41 ff d5 48 85 ed 75 20 eb 3d 48 39 d8 75 2a 48 89 ef 4c 89 e6 48 89 eb 41 ff d5 48 8b 45 00 48 83 e0 fc 48 89 c5 74 1f <48> 39 5d 10 48 8b 45 08 75 d8 48 85 c0 74 d8 eb ac 48 c7 c7 d8 6d
[  383.119453] RSP: 0018:ffffc90001f239d8 EFLAGS: 00010202
[  383.121391] RAX: 6b6b6b6b6b6b6b68 RBX: ffff8880749021a8 RCX: 0000000000140012
[  383.123993] RDX: 0000000000140013 RSI: 0000000000000000 RDI: 0000000000000246
[  383.126629] RBP: 6b6b6b6b6b6b6b68 R08: 0000000000000000 R09: 0000000000000000
[  383.129290] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  383.131884] R13: ffffffffa125fa28 R14: ffff8880248ad098 R15: ffff88802a1761d8
[  383.134612] FS:  00007ffff783c080(0000) GS:ffff88807d800000(0000) knlGS:0000000000000000
[  383.137582] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  383.139652] CR2: 000055555557d048 CR3: 00000000207b2000 CR4: 00000000000006f0
[  383.142254] Call Trace:
[  383.143209]  rbtree_clear+0xe/0x20 [jool_common]
[  383.144937]  clear_trees+0xbe/0xe0 [jool_common]
[  383.146687]  pool4db_flush+0x1e/0x30 [jool_common]
[  383.148463]  handle_pool4_flush+0x3e/0xd0 [jool_common]
[  383.150424]  ? handling_hairpinning_siit+0xf0/0xf0 [jool_common]
[  383.152637]  ? is_hairpin_nat64+0x40/0x40 [jool_common]
[  383.154620]  genl_family_rcv_msg+0x18a/0x390
[  383.156772]  genl_rcv_msg+0x47/0x90
[  383.158226]  ? genl_family_rcv_msg+0x390/0x390
[  383.159875]  netlink_rcv_skb+0x37/0xf0
[  383.161270]  genl_rcv+0x24/0x40
[  383.162453]  netlink_unicast+0x16c/0x210
[  383.163926]  netlink_sendmsg+0x1ca/0x3e0
[  383.165471]  sock_sendmsg+0x13/0x20
[  383.167492]  ___sys_sendmsg+0x23b/0x280
[  383.169647]  ? ___sys_recvmsg+0x134/0x190
[  383.171984]  ? __handle_mm_fault+0x9f7/0xf20
[  383.174510]  ? _raw_spin_unlock+0x24/0x30
[  383.176904]  ? __handle_mm_fault+0x65d/0xf20
[  383.179558]  __sys_sendmsg+0x47/0x80
[  383.181873]  do_syscall_64+0x50/0x1a0
[  383.184176]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  383.187289] RIP: 0033:0x7ffff7b93431
[  383.189255] Code: ad 9b 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b6 0f 1f 80 00 00 00 00 8b 05 1a e0 00 00 85 c0 75 16 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 41 89 d4 55 48
[  383.197250] RSP: 002b:00007fffffffe868 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  383.200006] RAX: ffffffffffffffda RBX: 000055555557ae10 RCX: 00007ffff7b93431
[  383.202873] RDX: 0000000000000000 RSI: 00007fffffffe8a0 RDI: 0000000000000003
[  383.205560] RBP: 000055555557af30 R08: 00007fffffffe970 R09: ffffffff00000000
[  383.208740] R10: 000055555557a010 R11: 0000000000000246 R12: 000055555557ad20
[  383.211346] R13: 00007fffffffe8a0 R14: 00007fffffffeb08 R15: 0000555555577620
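
A note on reading the registers in the trace above: RAX and RBP both hold 0x6b6b6b6b6b6b6b68 rather than a pure run of 0x6b bytes. That is what one would expect if a parent pointer was loaded from freed, poisoned memory through the kernel's rb_parent() macro, which clears the two low bits of `__rb_parent_color`. The sketch below only reproduces that arithmetic in user space; the helper name `rb_parent_bits` is made up for illustration and is not part of Jool or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Byte written over freed slab objects (POISON_FREE in the kernel). */
#define POISON_FREE_BYTE 0x6bu

/* Build a 64-bit word in which every byte equals b. */
static uint64_t fill_word(uint8_t b)
{
	uint64_t w = 0;
	for (int i = 0; i < 8; i++)
		w = (w << 8) | b;
	return w;
}

/*
 * Hypothetical helper mirroring what rb_parent() does to the stored
 * word: the two low bits carry the node color, so they are masked off
 * before the value is used as a pointer.
 */
static uint64_t rb_parent_bits(uint64_t parent_color)
{
	return parent_color & ~(uint64_t)3;
}
```

With these definitions, `rb_parent_bits(fill_word(POISON_FREE_BYTE))` evaluates to 0x6b6b6b6b6b6b6b68, which matches the RAX/RBP values in the oops.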

ydahhrk commented 2 years ago

The jool's pool4 db flush seem to always crash

When you say "always," you mean even when there's nothing in the table?

And if not, do you have a sample population add/remove/flush sequence?

ydahhrk commented 2 years ago

Ok, I think I found the bug: Line 60 or 62 deletes the parent, then lines 68-69 attempt to dereference it. Duh.

I suppose I could fix it, but support for kernels 3.11 and older was abandoned a long time ago, so the right solution is to drop rbtree_foreach() in favor of rbtree_postorder_for_each_entry_safe().
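
For readers unfamiliar with the pattern, here is a minimal user-space sketch of why a "safe" post-order walk avoids this class of bug. It is not Jool's code and not the kernel macro itself: `struct node`, `tree_insert`, `postorder_first`, `postorder_next` and `tree_clear` are all hypothetical names, and the tree is a plain unbalanced binary tree with parent pointers. The point is only that the successor is computed before the current node is freed, and that post-order visits both children before their parent, so a freed node is never dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
	struct node *parent, *left, *right;
	int key;
};

/* Unbalanced BST insert; only used to build a tree to tear down. */
static void tree_insert(struct node **root, int key)
{
	struct node *parent = NULL, **link = root;

	while (*link) {
		parent = *link;
		link = key < parent->key ? &parent->left : &parent->right;
	}
	*link = calloc(1, sizeof(**link));
	assert(*link != NULL);
	(*link)->parent = parent;
	(*link)->key = key;
}

/* Descend to a leaf, preferring left children over right ones. */
static struct node *left_deepest(struct node *n)
{
	for (;;) {
		if (n->left)
			n = n->left;
		else if (n->right)
			n = n->right;
		else
			return n;
	}
}

static struct node *postorder_first(struct node *root)
{
	return root ? left_deepest(root) : NULL;
}

/* Next node in post-order; touches only n and its still-live parent. */
static struct node *postorder_next(struct node *n)
{
	struct node *parent = n->parent;

	if (parent && n == parent->left && parent->right)
		return left_deepest(parent->right);
	return parent; /* right child (or only child): parent comes next */
}

/* Free every node and return how many were freed. */
static size_t tree_clear(struct node **root)
{
	struct node *pos = postorder_first(*root);
	size_t freed = 0;

	while (pos) {
		struct node *next = postorder_next(pos); /* read BEFORE free */
		free(pos);
		freed++;
		pos = next;
	}
	*root = NULL;
	return freed;
}
```

Inserting the keys 4, 2, 6, 1, 3, 5, 7 and walking with `postorder_first`/`postorder_next` visits them as 1, 3, 2, 5, 7, 6, 4: every parent comes after its children, which is what makes freeing during the walk safe.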

ydahhrk commented 2 years ago

How about now?

terofinn commented 2 years ago

Thanks, works fine now!

terofinn commented 2 years ago

Hmm, I already closed this issue, but maybe it should remain open until the fix is in master?