Chion82 / netfilter-full-cone-nat

A kernel module to turn MASQUERADE into full cone SNAT
GNU General Public License v2.0

Segfault when delete rule #5

Closed: 4t0m1k closed this issue 6 years ago

4t0m1k commented 6 years ago

Hello,

Whenever I delete a POSTROUTING -j FULLCONENAT rule, iptables segfaults. This does not happen with PREROUTING rules.

If I then try to delete the rule again, iptables deadlocks.

System

kernel: 4.9.90 x86_64
dist: debian 9.3
iptables: 1.6.2

Reproduce

insmod xt_FULLCONENAT.ko
iptables -t nat -A POSTROUTING -o eth0 -j FULLCONENAT
iptables -t nat -D POSTROUTING -o eth0 -j FULLCONENAT

Log

Mar 28 08:02:28 kernel: ------------[ cut here ]------------
Mar 28 08:02:28 kernel: invalid opcode: 0000 [#1] SMP
Mar 28 08:02:28 kernel: Modules linked in: xt_FULLCONENAT(O)
Mar 28 08:02:28 kernel: CPU: 0 PID: 31970 Comm: iptables Tainted: G           O    4.9.90-mod-std-ipv6-64 #1
Mar 28 08:02:28 kernel: Hardware name:                  /DN2800MT, BIOS MTCDT10N.86A.0165.2013.0114.1540 01/14/2013
Mar 28 08:02:28 kernel: task: ffff999469f18000 task.stack: ffffb77f00f60000
Mar 28 08:02:28 kernel: RIP: 0010:[<ffffffff8fc9982e>]  [<ffffffff8fc9982e>] nf_conntrack_unregister_notifier+0x3e/0x40
Mar 28 08:02:28 kernel: RSP: 0018:ffffb77f00f63cd8  EFLAGS: 00010297
Mar 28 08:02:28 kernel: RAX: ffff999469f18000 RBX: ffffffff906cd100 RCX: 0000000000000d18
Mar 28 08:02:28 kernel: RDX: ffffffffc0006040 RSI: ffffffffc00063c8 RDI: ffffffff906d2680
Mar 28 08:02:28 kernel: RBP: ffffb77f00f63ce8 R08: ffff999466a07a20 R09: 0000000000013000
Mar 28 08:02:28 kernel: R10: 0000000000080001 R11: 0000000000080001 R12: ffffffffc00063c8
Mar 28 08:02:28 kernel: R13: ffff999469ee3678 R14: ffffffff906cd100 R15: ffff999469ee3608
Mar 28 08:02:28 kernel: FS:  00007f9c1530f700(0000) GS:ffff99946fc00000(0000) knlGS:0000000000000000
Mar 28 08:02:28 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 28 08:02:28 kernel: CR2: 00005593492380b8 CR3: 0000000125e76000 CR4: 0000000000000670
Mar 28 08:02:28 kernel: Stack:
Mar 28 08:02:28 kernel: ffffb77f00f63d10 ffffffff906cd100 ffffb77f00f63d00 ffffffffc000407c
Mar 28 08:02:28 kernel: ffff999469ee3608 ffffb77f00f63d50 ffffffff8fd62acb ffffffff906cd100
Mar 28 08:02:28 kernel: ffffffffc0006040 ffff999469ee3698 0000000000000002 45275faebd9576ab
Mar 28 08:02:28 kernel: Call Trace:
Mar 28 08:02:28 kernel: [<ffffffffc000407c>] fullconenat_tg_destroy+0x2c/0x40 [xt_FULLCONENAT]
Mar 28 08:02:28 kernel: [<ffffffff8fd62acb>] cleanup_entry+0x7b/0xb0
Mar 28 08:02:28 kernel: [<ffffffff8fd632cb>] __do_replace+0x1ab/0x250
Mar 28 08:02:28 kernel: [<ffffffff8fd65865>] do_ipt_set_ctl+0x155/0x1c0
Mar 28 08:02:28 kernel: [<ffffffff8fc8b097>] nf_setsockopt+0x47/0x80
Mar 28 08:02:28 kernel: [<ffffffff8fd0d7c7>] ip_setsockopt+0x67/0x80
Mar 28 08:02:28 kernel: [<ffffffff8fd3128f>] raw_setsockopt+0x2f/0x40
Mar 28 08:02:28 kernel: [<ffffffff8fc02e25>] sock_common_setsockopt+0x15/0x20
Mar 28 08:02:28 kernel: [<ffffffff8fc01b73>] SyS_setsockopt+0x73/0xe0
Mar 28 08:02:28 kernel: [<ffffffff8f00250c>] do_syscall_64+0x5c/0xc0
Mar 28 08:02:28 kernel: [<ffffffff8fe9d57e>] entry_SYSCALL_64_after_swapgs+0x58/0xc6
Mar 28 08:02:28 kernel: Code: f4 e8 97 17 20 00 4c 39 a3 78 0d 00 00 75 1c 48 c7 83 78 0d 00 00 00 00 00 00 48 c7 c7 80 26 6d 90 e8 37 18 20 00 5b 41 5c 5d c3 <0f> 0b 55 48 89 e5 41 54 53 48 89 fb 48 c7 c7 80 26 6d 90 49 89 
Mar 28 08:02:28 kernel: RSP <ffffb77f00f63cd8>
Mar 28 08:02:28 kernel: ---[ end trace 90bfa6ce67cfdbc5 ]---

Thank you.

Chion82 commented 6 years ago

Hi, thanks for the feedback. Please provide more information about the crash because for now I cannot reproduce this issue.

4t0m1k commented 6 years ago

Thank you for your response :)

> Make sure you have pulled the latest commit by a simple git pull.

Yes, I have.

> Run export CFLAGS='-DDEBUG -Wall' before building the module, which enables verbose logging in dmesg. Please send me more dmesg logs :)

root@ns33*****:~# insmod netfilter-full-cone-nat/xt_FULLCONENAT.ko
root@ns33*****:~# iptables -t nat -A POSTROUTING -o eth0 -j FULLCONENAT
root@ns33*****:~# iptables -t nat -D POSTROUTING -o eth0 -j FULLCONENAT
Erreur de segmentation (French for "Segmentation fault")
[   59.459427] xt_FULLCONENAT: loading out-of-tree module taints kernel.
[   66.194559] xt_FULLCONENAT: fullconenat_tg_check(): tg_refer_count is now 1
[   66.194562] xt_FULLCONENAT: fullconenat_tg_check(): ct_event_notifier registered
[   73.069929] xt_FULLCONENAT: fullconenat_tg_destroy(): tg_refer_count is now 0
[   73.070001] ------------[ cut here ]------------
[   73.070072] kernel BUG at net/netfilter/nf_conntrack_ecache.c:290!
[   73.070135] invalid opcode: 0000 [#1] SMP
[   73.070190] Modules linked in: xt_FULLCONENAT(O)
[   73.070347] CPU: 0 PID: 720 Comm: iptables Tainted: G           O    4.9.90-mod-std-ipv6-64 #1
[   73.070412] Hardware name:                  /DN2800MT, BIOS MTCDT10N.86A.0165.2013.0114.1540 01/14/2013
[   73.070480] task: ffff991fa6916780 task.stack: ffffb1d400940000
[   73.070540] RIP: 0010:[<ffffffff9dc9982e>]  [<ffffffff9dc9982e>] nf_conntrack_unregister_notifier+0x3e/0x40
[   73.070683] RSP: 0018:ffffb1d400943cd8  EFLAGS: 00010297
[   73.070752] RAX: ffff991fa6916780 RBX: ffffffff9e6cd100 RCX: 0000000000000000
[   73.070825] RDX: 0000000000000000 RSI: ffffffffc00173c8 RDI: ffffffff9e6d2680
[   73.070897] RBP: ffffb1d400943ce8 R08: 00000000000002ac R09: ffffffff9ea48c74
[   73.070969] R10: 0000000000080001 R11: 0000000000000001 R12: ffffffffc00173c8
[   73.071041] R13: ffff991fa5cfc678 R14: ffffffff9e6cd100 R15: ffff991fa5cfc608
[   73.071116] FS:  00007f92c9d88700(0000) GS:ffff991fafc00000(0000) knlGS:0000000000000000
[   73.071190] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   73.071260] CR2: 000055fd682d80b8 CR3: 0000000129a2a000 CR4: 0000000000000670
[   73.071330] Stack:
[   73.071392]  ffffb1d400943d10 ffffffff9e6cd100 ffffb1d400943d00 ffffffffc00150a7
[   73.071651]  ffff991fa5cfc608 ffffb1d400943d50 ffffffff9dd62acb ffffffff9e6cd100
[   73.071908]  ffffffffc0017040 ffff991fa5cfc698 0000000000000002 95e2e354546bc71b
[   73.072164] Call Trace:
[   73.072240]  [<ffffffffc00150a7>] fullconenat_tg_destroy+0x57/0x70 [xt_FULLCONENAT]
[   73.072322]  [<ffffffff9dd62acb>] cleanup_entry+0x7b/0xb0
[   73.072397]  [<ffffffff9dd632cb>] __do_replace+0x1ab/0x250
[   73.072471]  [<ffffffff9dd65865>] do_ipt_set_ctl+0x155/0x1c0
[   73.072548]  [<ffffffff9dc8b097>] nf_setsockopt+0x47/0x80
[   73.072622]  [<ffffffff9dd0d7c7>] ip_setsockopt+0x67/0x80
[   73.072697]  [<ffffffff9dd3128f>] raw_setsockopt+0x2f/0x40
[   73.072770]  [<ffffffff9dc02e25>] sock_common_setsockopt+0x15/0x20
[   73.072846]  [<ffffffff9dc01b73>] SyS_setsockopt+0x73/0xe0
[   73.072922]  [<ffffffff9d00250c>] do_syscall_64+0x5c/0xc0
[   73.073000]  [<ffffffff9de9d57e>] entry_SYSCALL_64_after_swapgs+0x58/0xc6
[   73.073071] Code: f4 e8 97 17 20 00 4c 39 a3 78 0d 00 00 75 1c 48 c7 83 78 0d 00 00 00 00 00 00 48 c7 c7 80 26 6d 9e e8 37 18 20 00 5b 41 5c 5d c3 <0f> 0b 55 48 89 e5 41 54 53 48 89 fb 48 c7 c7 80 26 6d 9e 49 89 
[   73.076219] RIP  [<ffffffff9dc9982e>] nf_conntrack_unregister_notifier+0x3e/0x40
[   73.076345]  RSP <ffffb1d400943cd8>
[   73.076506] ---[ end trace 52d01a4fc6c67be3 ]---

Tell me if you need something more.

> Did you use custom network namespaces? Support for multiple network namespaces has not been tested yet and might be buggy.

I don't know what that is. I got this error on a fresh Linux install on a dedicated server hosted by OVH. ip netns list returns nothing, if that is what you are talking about.

By the way, I also tested using the interface name enp1s0, which is the real name (in ifconfig), with the same result.

You can find my kernel config here: ftp://ftp.ovh.net/made-in-ovh/bzImage/4.9.90/config-4.9.90-mod-std-ipv6-64

Chion82 commented 6 years ago

This is weird. It seems that the binary instructions of the existing kernel-space function nf_conntrack_unregister_notifier() are somehow corrupted, which results in invalid opcode: 0000.

This may be caused by a mismatch between the Linux header version and the actual kernel version, or by a kernel that is not a standard stable build but has been altered somehow.

I'm trying to reconstruct the runtime environment by installing your specific Debian version along with your kernel configuration in my virtual machines. This will take some time; meanwhile, you can try another Linux distribution or build a standard kernel.

Another question: regardless of the deletion of the iptables rules, is this module working as expected on your system?

4t0m1k commented 6 years ago

I was on a 4.9.87 kernel when I installed the server, but that kernel did not have module loading enabled.

So I've installed this .deb:

ftp://ftp.ovh.net/made-in-ovh/bzImage/4.9.90/DEB/ovhkernel-4.9-mod-std-ipv6-headers_4.9.90-1_amd64.deb
ftp://ftp.ovh.net/made-in-ovh/bzImage/4.9.90/DEB/ovhkernel-4.9-mod-std-ipv6-image_4.9.90-1_amd64.deb

But I haven't installed the corresponding libc-dev .deb; I will try again with that package installed.

Maybe there is a conflict between the kernels 4.9.87 and 4.9.90?

Anyway, your module is working perfectly on my system. Connected OpenVPN clients get a positive result with stunserver (http://www.stunprotocol.org/) on Linux (not on Windows, but looking at Wireshark the packets appear to be received correctly, so I think it is a stunserver bug on Windows...)

4t0m1k commented 6 years ago

I found a solution.

It seems that with my kernel, the member net->ct.nf_conntrack_event_cb is already set when I add an iptables rule. So the call to nf_conntrack_register_notifier(par->net, &ct_event_notifier) fails (it returns -EBUSY). Then the later call to nf_conntrack_unregister_notifier(par->net, &ct_event_notifier) crashes because &ct_event_notifier != net->ct.nf_conntrack_event_cb.

Solution: add nf_conntrack_unregister_notifier(par->net, par->net->ct.nf_conntrack_event_cb); just before nf_conntrack_register_notifier(par->net, &ct_event_notifier); in the fullconenat_tg_check function.

Chion82 commented 6 years ago

Interesting. I didn't know that. It seems there can be only one nf_ct_event_notifier registered at a time within a single network namespace.

I did a search in the kernel source and found that nf_conntrack_netlink.c registers a notifier. By running modprobe ip_conntrack_netlink before inserting iptables rules, I can reproduce this invalid opcode: 0000 problem.

In my opinion, forcing an nf_conntrack_unregister_notifier() before nf_conntrack_register_notifier() is not a good idea because it might affect the netlink modules currently running on your system. Maybe you will get another kernel BUG invalid opcode when the netlink modules are being unloaded.

In this FULLCONENAT module, the nf_ct_event_notifier feature can be disabled safely. It is a mechanism to actively detect outdated NAT mappings (as we discussed before in #2, but in Chinese). Without it, memory consumption may be higher (but there is no leak) and performance might be slightly affected.

Later I will add a condition to this module that disables the notifier when it is unavailable.

Anyway, thanks for digging into the source code. That really helps a lot.

4t0m1k commented 6 years ago

Glad to help! Good luck and thank you for your work :)