Facilitating NAT traversal

zrm commented 10 years ago

One of the key failings of many existing consumer routers is that they make NAT traversal unnecessarily difficult for decentralized applications. I don't presently have your hardware, so I can't test the exact behavior of your current implementation, but I hope to describe the correct behavior for a NAT gateway, one that allows decentralized applications to work correctly even when both peers are behind separate NATs. For example, a decentralized VoIP-over-WiFi app would be a great way to sell people on the benefits of having Open WiFi widely available, but only if it has a high probability of actually working.

There is a set of RFCs describing the behavioral requirements for network address translators for different transport protocols, e.g. RFC4787 (UDP), RFC5382 (TCP), RFC5508 (ICMP). I am not going to repeat their entire contents; suffice it to say that verifying compliance with them is worthwhile.

Of particular importance, the RFC4787-compliant behavior of a NAT translating an outgoing packet is effectively to create an automatic port mapping. For example, suppose that 192.168.1.2:2000 sends a packet through a NAT to 1.1.1.1:3000. As a result, the NAT should create a mapping such that any packet from any IP address and any source port which is sent to the NAT's public IP address on port 2000 is forwarded to 192.168.1.2:2000, and it should maintain that mapping until no traffic has used it for at least 120 seconds. (The mapping duration is only permitted to be shorter for the ports of specific protocols known not to require longer-duration mappings, notably port 53 for DNS.)

Now suppose that 192.168.1.3:2000 behind the same NAT sends a packet to 2.2.2.2:4000. The NAT should not send the outgoing packet from port 2000 on its public address, because it already has an active mapping on that port for 192.168.1.2. Instead it has to do port translation: it rewrites the packet so that it comes from its public address on some other port, say 2100. The mapping behavior should remain the same even though the port is being translated. Thus any packet from any IP address and any port which is sent to the NAT's public IP address on port 2100 should be forwarded to 192.168.1.3:2000.
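To make the two cases above concrete, here is a minimal sketch of such a mapping table in Python. It is a toy model, not anything taken from your implementation: the `NatTable` class, its method names, and the choice to start looking for a free port at 2100 are all assumptions made for illustration. The point it shows is that the destination never enters the lookup key.

```python
IDLE_TIMEOUT = 120  # seconds without traffic before a mapping may be dropped


class NatTable:
    """Toy UDP mapping table keyed only on the internal endpoint (hypothetical)."""

    def __init__(self, first_fallback_port=2100):
        self.by_internal = {}  # (internal_ip, internal_port) -> public_port
        self.by_public = {}    # public_port -> (internal_ip, internal_port)
        self.last_used = {}    # public_port -> time of last traffic
        self.first_fallback_port = first_fallback_port  # assumed search start

    def _expire(self, now):
        # Drop every mapping that has been idle for IDLE_TIMEOUT or longer.
        idle = [p for p, t in self.last_used.items() if now - t >= IDLE_TIMEOUT]
        for port in idle:
            internal = self.by_public.pop(port)
            del self.by_internal[internal]
            del self.last_used[port]

    def outbound(self, internal, now):
        """Return the public source port for a packet sent by `internal`.

        The destination is deliberately not a parameter: the mapping must
        not depend on where the packet is going.
        """
        self._expire(now)
        port = self.by_internal.get(internal)
        if port is None:
            port = internal[1]          # prefer keeping the original port
            if port in self.by_public:  # already mapped for another host
                port = next(
                    (p for p in range(self.first_fallback_port, 65536)
                     if p not in self.by_public),
                    None,
                )
                if port is None:
                    raise RuntimeError("no free public port")
            self.by_internal[internal] = port
            self.by_public[port] = internal
        self.last_used[port] = now
        return port

    def inbound(self, public_port, now):
        """Return the internal endpoint for `public_port`, or None if unmapped.

        The remote address and port are not consulted at all.
        """
        self._expire(now)
        internal = self.by_public.get(public_port)
        if internal is not None:
            self.last_used[public_port] = now
        return internal
```

With this sketch, `outbound(("192.168.1.2", 2000), now=0)` keeps port 2000, a later `outbound(("192.168.1.3", 2000), now=1)` is translated to 2100, and `inbound(2100, now=2)` resolves to 192.168.1.3:2000 no matter who sent the packet.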

There are two important reasons that this behavior needs to be correct. The first is that it conserves resources on the NAT device. Some decentralized applications, like UDP-based distributed hash tables, can exchange packets with several thousand separate peers but will typically send all packets from the same port (because that conserves sockets and ports on the endpoint device too). The correct mapping behavior allows a single mapping to be used for all of them. NAT devices that improperly attempt to maintain separate mappings for each individual session have been known to run out of resources and drop active mappings or crash outright.
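The difference in table size is easy to see with a little arithmetic. The sketch below is a simplified model, assuming one table entry per distinct key; the function name `count_mappings` and the peer addresses are made up for the example.

```python
def count_mappings(flows, endpoint_independent):
    """Count NAT table entries needed for a list of (source, destination) flows.

    With endpoint-independent mapping the key is the internal source only;
    a NAT that tracks each session separately keys on (source, destination).
    """
    keys = set()
    for src, dst in flows:
        keys.add(src if endpoint_independent else (src, dst))
    return len(keys)


# One DHT node talking to 5000 distinct (hypothetical) peers from one socket.
node = ("192.168.1.2", 2000)
flows = [(node, (f"10.0.{i // 256}.{i % 256}", 6881)) for i in range(5000)]

print(count_mappings(flows, endpoint_independent=True))   # 1
print(count_mappings(flows, endpoint_independent=False))  # 5000
```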

The second reason is the one advanced by RFC4787. Making mappings consistent across destination addresses allows decentralized protocols to perform address and port detection. For example, take 3.3.3.3 as the NAT's public address. When Alice sends a packet from 192.168.1.3:2000 to Bob at 2.2.2.2:4000, Bob receives the packet and can then tell Alice that her publicly visible address and port are 3.3.3.3:2100. Alice can then provide 3.3.3.3:2100 as her address and port to whatever address lookup mechanism is in use, and Carol, who may not yet know her own public address, can then connect to Alice at 3.3.3.3:2100 even if they are both behind separate NATs.
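Bob's side of that exchange can be very small. The following is a rough sketch using plain UDP sockets; the wire format (an ASCII `ip:port` string) and all function names are invented for illustration and do not correspond to any particular protocol.

```python
import socket


def encode_observed(addr):
    """Serialize the (ip, port) pair Bob saw as the packet's source."""
    ip, port = addr
    return f"{ip}:{port}".encode("ascii")


def decode_observed(payload):
    """Parse Bob's reply back into an (ip, port) pair."""
    ip, _, port = payload.decode("ascii").rpartition(":")
    return ip, int(port)


def reflect_once(sock):
    """Bob: receive one datagram and tell the sender where it came from."""
    _, sender = sock.recvfrom(1024)
    sock.sendto(encode_observed(sender), sender)


def loopback_demo():
    """Run Alice and Bob on 127.0.0.1; returns (observed, local) for Alice."""
    bob = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    alice = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        bob.bind(("127.0.0.1", 0))
        alice.bind(("127.0.0.1", 0))
        bob.settimeout(2)
        alice.settimeout(2)
        alice.sendto(b"where am I?", bob.getsockname())
        reflect_once(bob)
        payload, _ = alice.recvfrom(1024)
        return decode_observed(payload), alice.getsockname()
    finally:
        bob.close()
        alice.close()
```

On loopback there is no NAT in the path, so the observed and local addresses from `loopback_demo()` are the same. Behind a NAT, the observed pair would be the public address and mapped port, which is only useful to third parties like Carol if the mapping does not depend on the destination.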

When some people first understand how the required mapping behavior works, their gut reaction is that it would be a security problem, but it largely isn't. The NAT mapping rules are not, nor are they supposed to be, firewall rules. Actual firewall rules (and sensible default firewall rules) are how you specify that you e.g. don't want to allow webservers on port 80 from the guest network. This is the same as what you do when using IPv6. Moreover, some systematic method of mapping an incoming port in the NAT is inherently necessary for decentralized IPv4 applications to function, because when you have two peers each behind a NAT, one of them has to be the one to receive the first packet.

Which leads to the question of NAT-PMP and UPnP. These are the two most popular protocols for applications to explicitly request a port mapping and learn the public IP address from a NAT gateway. Recall from above that Alice can learn her public IP address by sending a packet to Bob and asking him where it came from. But this requires Alice to trust Bob. And in a completely distributed system, how does Bob learn his own address and port to provide to Alice? There is a bootstrapping problem. Having some percentage of the peers map ports and learn addresses with NAT-PMP or UPnP solves it.
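For a sense of how little is involved on the NAT-PMP side, here is a sketch of the public address query as I understand it from RFC 6886: a two-byte request sent over UDP to the gateway on port 5351, answered with a twelve-byte response. The function names are my own, and the code only builds and parses packets; check the field layout against the RFC before relying on it.

```python
import socket
import struct

NATPMP_PORT = 5351  # UDP port the gateway listens on


def external_address_request():
    """Build the request: version 0, opcode 0 (public address request)."""
    return struct.pack("!BB", 0, 0)


def parse_external_address_response(data):
    """Return the public IPv4 address as a string, or raise ValueError.

    Assumed layout: version (1 byte), opcode (1 byte, 128 for this reply),
    result code (2 bytes), seconds since the gateway's epoch (4 bytes),
    public IPv4 address (4 bytes).
    """
    if len(data) < 12:
        raise ValueError("response too short")
    version, opcode, result, _epoch, addr = struct.unpack("!BBHI4s", data[:12])
    if version != 0 or opcode != 128:
        raise ValueError("not a NAT-PMP public address response")
    if result != 0:
        raise ValueError(f"gateway returned result code {result}")
    return socket.inet_ntoa(addr)
```

A client would send `external_address_request()` to its default gateway on `NATPMP_PORT` and pass the reply to `parse_external_address_response`. Unlike asking Bob, this needs no trusted third party.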

Ask anyone whether UPnP is secure. The answer is no. This has nothing to do with the ability of arbitrary devices to map arbitrary ports; that part is no more insecure than providing devices with IPv6 addresses. UPnP is just a terrible protocol. It is unnecessarily complicated, which has led to a long and continuing history of insecure implementations. Do not enable it by default. Use your collective judgment whether to even support it at all as a manual option, but know that there is a nonzero number of legacy applications that support UPnP but not NAT-PMP.

NAT-PMP is a different matter entirely. It is a newer and much simpler protocol. I make the following recommendation for it. Provide four configurable options: allow mapping any port; allow mapping only unprivileged ports (>=1024); don't allow port mapping, but support the protocol so that applications can learn the public IP address; and disable the protocol entirely. Then, as the default, allow mapping unprivileged ports on the guest network and any port on the private network.

The reasoning is this. Allowing guests to run webservers on port 80 or DNS servers on port 53 is probably not desirable. You'll also want an equivalent default firewall rule to the same effect, but there is no sense having the NAT-PMP daemon tell the client it has the port when the firewall is going to block it anyway. Meanwhile, mapping privileged ports (especially TCP port 443) is sometimes the only way to establish communication with peers behind restrictive firewalls, so it will allow more applications to function if those ports can be mapped from at least the private network. That also has the effect of reserving those ports to the private network which seems desirable given their scarcity.
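The four options and the suggested defaults could be modeled roughly as follows. This is only a sketch of the decision logic; the enum, the network names "guest" and "private", and the function names are all hypothetical and not taken from any existing NAT-PMP daemon.

```python
from enum import Enum


class PortMapPolicy(Enum):
    ANY_PORT = "any"                    # allow mapping any port
    UNPRIVILEGED_ONLY = "unprivileged"  # allow mapping ports >= 1024 only
    ADDRESS_ONLY = "address-only"       # no mappings, but report public address
    DISABLED = "disabled"               # do not speak the protocol at all


# Suggested defaults per network (network names are assumptions).
DEFAULT_POLICY = {
    "guest": PortMapPolicy.UNPRIVILEGED_ONLY,
    "private": PortMapPolicy.ANY_PORT,
}


def may_map(policy, external_port):
    """Return True if a mapping request for `external_port` should be granted."""
    if policy is PortMapPolicy.ANY_PORT:
        return 1 <= external_port <= 65535
    if policy is PortMapPolicy.UNPRIVILEGED_ONLY:
        return 1024 <= external_port <= 65535
    return False


def answers_address_requests(policy):
    """Return True if public address queries should be answered."""
    return policy is not PortMapPolicy.DISABLED
```

Under these defaults a guest asking for port 80 is refused while a guest asking for port 2100 is granted, and the private network can still claim 443. The firewall defaults should be derived from the same table so that the daemon never grants a port the firewall will then block.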

Rangak commented 9 years ago

See additional discussion in #246. Bumping up priority.

Rangak commented 9 years ago

A suggestion for whoever takes this issue up for implementation: first start an email thread on the ow-tech@eff.org mailing list describing the solution you propose to implement. This is a sensitive area that touches on security, and reaching consensus over email first will avoid the need for rework.

EFForg / OpenWireless

Facilitating NAT traversal #222