Closed fuero closed 5 years ago
Thanks.
Any chance you could provide the nftables configuration? Sorry; can't say I've needed to interface with the nftables client before.
Even if it has placeholder IP addresses or whatever. My email is in my profile, if you want to keep it private.
BTW: Judging by the first stack trace, this might be the same bug as #279. There's apparently some workaround for it.
(But I'd still like to patch it properly.)
It's pretty straight-forward if only nat is relevant:
table ip nat {
    chain prerouting {
        type nat hook prerouting priority filter; policy accept;
        tcp dport xxxx iifname "<wan-iface>" dnat to 192.168.0.4
    }
    chain postrouting {
        type nat hook postrouting priority filter; policy accept;
        oifname "<wan-iface>" masquerade
    }
}
I'm just hoping.
Ok, working...
Can't reproduce yet. It really baffles me because, given that you don't have any nftables rules matching ICMP, I'd expect the ping to be completely untouched by the masquerading. Unless nftables performs some sort of evil ICMP magic behind the scenes.
There are a few things I haven't tried, but I do have a few questions:
1. If your LAN is 192.168.1, then why are you DNATting to 192.168.0?
2. Why priority filter? (Which I think is the same as priority 0.) I think SNAT is normally 100, and DNAT is -100. (But this is probably irrelevant.)
3. "once I remove the IPv4 masquerading config it no longer dies but doesn't do anything." Is this the packet flow you're expecting?
   - 2001:db8::4 -> <48-prefix>:2::192.168.1.7. (This packet gets tunneled, but I'm not caring about this for now.)
   - 192.168.1.1 -> 192.168.1.7 (Assuming 192.168.1.1 is the router's address.)
   - 192.168.1.1 -> 192.168.0?.4 during postrouting.
   - 192.168.0?.4 -> 192.168.1.1
   - 192.168.1.7 -> 192.168.1.1
   - <48-prefix>:2::192.168.1.7 -> 2001:db8::4
Are we on the same page?
Sorry for the confusion, I messed up replacing the IP address in my nftables listing - it's supposed to be 192.168.1.7 of course. I've sent a copy of my firewall and network config to you via email.
I'll try to recreate this in a VM-Lab setting.
This crash is being caused by IPv6 connection tracking data reaching the IPv4 NAT code. (Most likely, they are inherently incompatible with each other.)
(Note: I can't find an HTTP-available copy of CentOS's kernel source, so I'm linking kernel code to Linux 4.10 on LXR. I found it to be close enough to the version of the CentOS code I'm reading.)
To create the IPv4 version of an IPv6 packet, Jool 4 uses pskb_copy.
pskb_copy -> __pskb_copy -> __pskb_copy_fclone -> copy_skb_header -> __copy_skb_header -> __nf_copy
__nf_copy copies the nfct ("Netfilter Connection") information from the source (IPv6) packet to the "destination" (resulting, IPv4) packet.
Later on, Linux's NAT code infers a nf_conn object from nfct.
nf_nat_ipv4_fn -> nf_nat_alloc_null_binding -> __nf_nat_alloc_null_binding -> nf_nat_setup_info
nf_nat_setup_info infers a "tuple" object (curr_tuple) from the connection object...
nf_nat_setup_info -> get_unique_tuple -> __nf_nat_l4proto_find
...whose layer-3 protocol number is later used to dereference ~a layer-3 protocol NAT object~ an array of layer-4 protocol NAT objects. nf_nat_l4protos[family] is probably the NULL pointer that's causing the crash; there's no such thing as an IPv6 NAT implementation.
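To make the suspected failure mode concrete, here is a small user-space model. It is not kernel code: fake_l4protos, the family constants, and fake_l4proto_find are hypothetical names invented for illustration. The sketch only assumes what is described above, namely that a per-family table is indexed with the family taken from the tuple and that the IPv6 slot is empty.

```c
#include <stddef.h>

/* Hypothetical family numbers for this model; the kernel would use
 * AF_INET and AF_INET6 here. */
enum fake_family { FAKE_INET = 0, FAKE_INET6 = 1, FAKE_NFAMILY = 2 };

/* Hypothetical number of layer-4 handlers per family. */
enum { FAKE_NL4 = 2 };

/* Hypothetical stand-in for a layer-4 NAT protocol handler. */
struct fake_l4proto {
    const char *name;
};

static const struct fake_l4proto fake_tcp = { "tcp" };
static const struct fake_l4proto fake_icmp = { "icmp" };

/* Only the IPv4 family has handlers registered in this model. */
static const struct fake_l4proto *fake_ipv4_protos[FAKE_NL4] = {
    &fake_tcp, &fake_icmp
};

/* Per-family table: the IPv6 slot is left NULL on purpose. */
static const struct fake_l4proto **fake_l4protos[FAKE_NFAMILY] = {
    [FAKE_INET] = fake_ipv4_protos,
    [FAKE_INET6] = NULL,
};

/* Checked lookup: returns NULL instead of following an empty family slot. */
static const struct fake_l4proto *fake_l4proto_find(int family, size_t index)
{
    if (family < 0 || family >= FAKE_NFAMILY)
        return NULL;
    if (fake_l4protos[family] == NULL)
        return NULL; /* an unchecked fake_l4protos[family][index] would crash here */
    if (index >= FAKE_NL4)
        return NULL;
    return fake_l4protos[family][index];
}
```

In this model, an IPv4 packet that still carries IPv6 conntrack data corresponds to calling the lookup with FAKE_INET6: without the NULL check on the family slot, the second index would read through a NULL pointer.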
This has a pretty good chance of being the bug. It matches the stack trace and explains why I've had so much trouble reproducing it: My setup is completely clean of IPv6 connection tracking noise, and some level of it is required to non-zeroize nfct.
And even if it's not the cause of the bug, I do think that Jool should clear nfct instead of copying it. Whatever iptables/nftables does to the packet before reaching Jool is protocol-specific, so it should probably not survive the translation.
The downside is that zeroizing nfct will probably prevent you from chaining NAT and NAT64 the way you want. A cleared nfct means that the NAT code will short-circuit out early.
Chaining NAT and NAT64 would probably be possible if you enclose them in different network namespaces, though. I don't know how much of a pain it would be to configure.
I'll try ~clearing~ setting nfct manually. If it crashes, I will upload a commit that will zeroize nfct after pskb_copy (along with a bunch of reference counting stuff that I can already tell needs to be done). If this prevents the crash, I think we're done.
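The "zeroize after copy, plus reference counting" idea can be sketched in user space as follows. This is only a model under stated assumptions, not the eventual Jool commit: struct fake_conn, struct fake_skb, fake_reset_nfct, and fake_copy_for_translation are all hypothetical names, and the copy is assumed to take its own reference on the shared conntrack entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a conntrack entry; not the kernel's nf_conn. */
struct fake_conn {
    int refcount; /* number of packets pointing at this entry */
};

/* Hypothetical stand-in for a packet; not the kernel's sk_buff. */
struct fake_skb {
    struct fake_conn *nfct; /* conntrack entry, or NULL if untracked */
};

/* Drop this packet's reference on its conntrack entry and forget it. */
static void fake_reset_nfct(struct fake_skb *skb)
{
    if (skb->nfct == NULL)
        return;
    assert(skb->nfct->refcount > 0);
    skb->nfct->refcount--;
    skb->nfct = NULL;
}

/* Copy a packet for translation, then clear the copied conntrack pointer
 * so protocol-specific state does not survive into the other family. */
static void fake_copy_for_translation(struct fake_skb *dst,
                                      const struct fake_skb *src)
{
    *dst = *src;               /* models the copy carrying nfct across */
    if (dst->nfct != NULL)
        dst->nfct->refcount++; /* the copy holds its own reference */
    fake_reset_nfct(dst);      /* released again before translating */
}
```

After fake_copy_for_translation, the source packet still owns its conntrack entry with an unchanged reference count, while the translated copy carries none. In the model, that is also why NAT would have nothing to work with on the translated packet.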
ALL RIGHT. Finally managed to reproduce it. Wasn't so hard; I was just being a freaking idiot. All it takes is modprobing nf_conntrack_ipv6 and chaining an SNAT to Jool.
My hypothesis was correct, though the solution was not. I fixed it in the last commit, and it's already collapsed into master. Will probably release Jool 4.0.5 in a few days.
I'm still not sure if chaining Jool to an SNAT in the same namespace will yield the behavior you want, but at least it shouldn't crash anymore.
To recapitulate:
Jool now clears the conntrack information, which NAT needs. It does this because the conntrack information is layer-3 protocol dependent. I'm not sure whether this completely nullifies Jool's compatibility with NAT: NAT appears to initialize lazily, so it's not exactly doing nothing, but I couldn't manage to actually change the source address via SNAT. That could just be my own incompetence, though (i.e., lack of familiarity with nftables).
<comment deleted because I can't reproduce the claim anymore>
Sorry if I missed something in the docs. Nonetheless, if this is something I skipped over or missed, it's still a bug IMO that the kernel just crashes.
On to the details:
I'm trying to use Jool on my router (CentOS Linux release 7.6.1810 (Core), Jool 4.0.4)
It has a WAN interface with v4 connectivity and a 6in4 tunnel, with a /48 prefix delegated. On the LAN interface, there's the v4 192.168.1.0/24 segment. The router is set up to do SNAT/Masquerading for the IPv4 hosts behind it (using nftables).
I've set up jool with:
From outside my home network, I ping one of my boxes with:
This crashes the kernel immediately. Doesn't matter if I use iptables or netfilter.
I've investigated the issue; once I remove the IPv4 masquerading config it no longer dies but doesn't do anything.
Here's one of the crash dmesg outputs (the one where I used iptables in tandem with nftables):
Here's one with just netfilter involved: