MatrixAI / Polykey

Polykey Core Library
https://polykey.com
GNU General Public License v3.0

NAT-Traversal Testing with testnet.polykey.io #159

Closed CMCDragonkai closed 2 years ago

CMCDragonkai commented 3 years ago

Specification

Automatically testing NAT-busting is somewhat complex: you need to simulate the existence of multiple machines, and then simulate full-cone NAT, restricted-cone NAT, and symmetric NAT.

Since we don't have a relay proxy enabled yet, symmetric NAT is going to be left out for now. So we'll focus on all the NAT architectures except symmetric NAT.

Additional context

Actually I don't think we should bother with QEMU or NixOS here; it's too complicated. QEMU might be a good choice for running the test cross-platform, but we lack expertise with QEMU (I've only worked with it in the context of the netboot work), and we have more experience with network namespaces, which should let us run these tests on Linux alone. NixOS limits our environment even further and requires running in a NixOS environment.

Note that network namespaces with Linux stateful firewalls should be perfectly capable of simulating a port-restricted firewall.

Old context follows...

The best way to do this is with a VM system using QEMU.

NixOS has a multi-machine testing system that can be used to do this, however such tests can only run on NixOS: https://nixos.org/manual/nixos/unstable/index.html#sec-nixos-tests We have pre-existing code for this:

NixOS NAT Module Test

```nix
# here is the testing base file:
# https://github.com/NixOS/nixpkgs/blob/master/nixos/lib/testing-python.nix
with import ../../pkgs.nix {};
let
  pk = (callPackage ../../nix/default.nix {}).package;
in
import
{
  nodes = {
    privateNode1 = { nodes, pkgs, ... }: {
      virtualisation.vlans = [ 1 ];
      environment.variables = { PK_PATH = "$HOME/polykey"; };
      environment.systemPackages = [ pk pkgs.tcpdump ];
      networking.firewall.enable = false;
      networking.defaultGateway =
        (pkgs.lib.head nodes.router1.config.networking.interfaces.eth1.ipv4.addresses).address;
    };
    privateNode2 = { nodes, pkgs, ... }: {
      virtualisation.vlans = [ 2 ];
      environment.variables = { PK_PATH = "$HOME/polykey"; };
      environment.systemPackages = [ pk pkgs.tcpdump ];
      networking.firewall.enable = false;
      networking.defaultGateway =
        (pkgs.lib.head nodes.router2.config.networking.interfaces.eth1.ipv4.addresses).address;
    };
    router1 = { pkgs, ... }: {
      virtualisation.vlans = [ 1 3 ];
      environment.systemPackages = [ pkgs.tcpdump ];
      networking.firewall.enable = false;
      networking.nat.externalInterface = "eth2";
      networking.nat.internalIPs = [ "192.168.1.0/24" ];
      networking.nat.enable = true;
    };
    router2 = { pkgs, ... }: {
      virtualisation.vlans = [ 2 3 ];
      environment.systemPackages = [ pkgs.tcpdump ];
      networking.firewall.enable = false;
      networking.nat.externalInterface = "eth2";
      networking.nat.internalIPs = [ "192.168.2.0/24" ];
      networking.nat.enable = true;
    };
    publicNode = { config, pkgs, ... }: {
      virtualisation.vlans = [ 3 ];
      environment.variables = { PK_PATH = "$HOME/polykey"; };
      environment.systemPackages = [ pk pkgs.tcpdump ];
      networking.firewall.enable = false;
    };
  };
  testScript = ''
    start_all()

    # can start polykey-agent in both public and private nodes
    publicNode.succeed("pk agent start")
    privateNode1.succeed("pk agent start")
    privateNode2.succeed("pk agent start")

    # can create a new keynode in both public and private nodes
    create_node_command = "pk agent create -n {name} -e {name}@email.com -p passphrase"
    publicNode.succeed(create_node_command.format(name="publicNode"))
    privateNode1.succeed(create_node_command.format(name="privateNode1"))
    privateNode2.succeed(create_node_command.format(name="privateNode2"))

    # can add privateNode node info to publicNode
    publicNodeNodeInfo = publicNode.succeed("pk nodes get -c -b")
    privateNode1.succeed("pk nodes add -b '{}'".format(publicNodeNodeInfo))
    privateNode2.succeed("pk nodes add -b '{}'".format(publicNodeNodeInfo))

    # can add publicNode node info to privateNodes
    privateNode1NodeInfo = privateNode1.succeed("pk nodes get -c -b")
    privateNode2NodeInfo = privateNode2.succeed("pk nodes get -c -b")
    publicNode.succeed("pk nodes add -b '{}'".format(privateNode1NodeInfo))
    publicNode.succeed("pk nodes add -b '{}'".format(privateNode2NodeInfo))

    # copy public keys over to node machines
    publicNodePublicKey = publicNode.succeed("cat $HOME/.polykey/.keys/public_key")
    privateNode1PublicKey = privateNode1.succeed("cat $HOME/.polykey/.keys/public_key")
    privateNode2PublicKey = privateNode2.succeed("cat $HOME/.polykey/.keys/public_key")
    privateNode1.succeed("echo '{}' > $HOME/publicNode.pub".format(publicNodePublicKey))
    privateNode1.succeed("echo '{}' > $HOME/privateNode2.pub".format(privateNode2PublicKey))
    privateNode2.succeed("echo '{}' > $HOME/publicNode.pub".format(publicNodePublicKey))
    privateNode2.succeed("echo '{}' > $HOME/privateNode1.pub".format(privateNode1PublicKey))
    publicNode.succeed("echo '{}' > $HOME/privateNode1.pub".format(privateNode1PublicKey))
    publicNode.succeed("echo '{}' > $HOME/privateNode2.pub".format(privateNode2PublicKey))

    # modify node info to match node machines' host address
    publicNode.succeed("pk nodes update -p $HOME/privateNode1.pub -ch privateNode1")
    publicNode.succeed("pk nodes update -p $HOME/privateNode2.pub -ch privateNode2")
    privateNode1.succeed(
        "pk nodes update -p $HOME/publicNode.pub -ch publicNode -r $HOME/publicNode.pub"
    )
    privateNode2.succeed(
        "pk nodes update -p $HOME/publicNode.pub -ch publicNode -r $HOME/publicNode.pub"
    )

    # privateNodes can ping publicNode
    privateNode1.succeed("pk nodes ping -p $HOME/publicNode.pub")
    privateNode2.succeed("pk nodes ping -p $HOME/publicNode.pub")

    # can create a new vault in publicNode and clone it from both privateNodes
    publicNode.succeed("pk vaults new publicVault")
    publicNode.succeed("echo 'secret content' > $HOME/secret")
    publicNode.succeed("pk secrets new publicVault:Secret -f $HOME/secret")
    privateNode1.succeed("pk vaults clone -n publicVault -p $HOME/publicNode.pub")
    privateNode2.succeed("pk vaults clone -n publicVault -p $HOME/publicNode.pub")

    # can create a new vault in privateNode1
    privateNode1.succeed("pk vaults new privateVault1")

    # can create a new secret in privateNode1
    privateNode1.succeed("echo 'secret content' > $HOME/secret")
    privateNode1.succeed("pk secrets new privateVault1:Secret -f $HOME/secret")

    # setup a relay between privateNode1 and publicNode
    privateNode1.succeed("pk nodes relay -p $HOME/publicNode.pub")

    # add privateNode1 node info to privateNode2
    privateNode1NodeInfo = privateNode1.succeed("pk nodes get -c -b")
    privateNode2.succeed("pk nodes add -b '{}'".format(privateNode1NodeInfo))

    # add privateNode2 node info to privateNode1
    privateNode2NodeInfo = privateNode2.succeed("pk nodes get -c -b")
    privateNode1.succeed("pk nodes add -b '{}'".format(privateNode2NodeInfo))

    # can ping privateNode1 to privateNode2
    privateNode2.succeed("pk nodes ping -p ~/privateNode1.pub")

    # can pull a vault from privateNode1 to privateNode2
    privateNode2.succeed("pk vaults clone -p ~/privateNode1.pub -n privateVault1")
  '';
}
```

Tasks

  1. [x] - Create test harness/fixture utilities that create a multi-node situation
  2. [x] - Simulate a NAT table situation by making use of network namespaces
  3. [x] - This test can only run on Linux that supports virtual network namespaces.
  4. [ ] - The test will have to be run separately from npm test which runs jest. This test can be done inside Gitlab CI/CD if the CI/CD on Linux supports creating network namespaces. If not, it's a manual test. Using conditional testing instead https://github.com/MatrixAI/js-polykey/issues/380
  5. [x] - Review my gist https://gist.github.com/CMCDragonkai/3f3649d7f1be9c7df36f which explains how to use network namespaces. The Linux iptables firewall has to be used to simulate a NAT that allows outgoing packets but denies incoming packets, except for connections that are already live. This is called a "stateful firewall". I've done this before, but I forgot the details.
  6. [x] - You'll need to use https://stackabuse.com/executing-shell-commands-with-node-js/ to run the ip netns commands. Remember to check whether the OS is Linux before allowing these tests to run.
  7. ~[ ] - Add in testing involving testnet.polykey.io which should run only during integration testing after the integration:deployment job (because it has to deploy to the testnet in that job).~ - Reissued MatrixAI/Polykey-CLI#71
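The OS check mentioned in the tasks above could be sketched as a small guard. This is only an illustration; the helper name `nat_tests_supported` is hypothetical, not from the codebase:

```shell
# Hypothetical guard (names are illustrative): only allow the NAT tests
# to run on Linux with iproute2's `ip` available for `ip netns`.
nat_tests_supported() {
  [ "$(uname -s)" = "Linux" ] || return 1    # netns is Linux-only
  command -v ip >/dev/null 2>&1 || return 1  # need iproute2
  return 0
}

if nat_tests_supported; then
  echo "NAT tests enabled"
else
  echo "NAT tests skipped"
fi
```

The same two checks would be mirrored in the jest setup (e.g. via `os.platform()`) before shelling out.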
CMCDragonkai commented 3 years ago

This should be incorporated into our automated tests when we run jest. But that would also mean using nix-build... etc. Or it can be done outside, as a separate command that is only run in our checkPhase during the build of the application/library (possibly in our release.nix, since our default.nix doesn't have this).

CMCDragonkai commented 3 years ago

We need to implement a test for relaying the hole punching message. This is not meant to be using the notifications domain because it's part of automated connection establishment.

We need to test several situations:

  1. Test if we can do this with a designated seed node, this coincides with MatrixAI/Polykey#194 <- only this one for release
  2. Test if we can do this with any node on the Polykey network, thus generalising to decentralised relays.
CMCDragonkai commented 3 years ago

@joshuakarp I'm curious how exactly we are going to implement an optimised routing system for routing the hole-punching relay messages?

The same algorithm can later be used for picking the optimal relay for mesh proxying to defeat symmetric NAT.

I remember we mentioned some usage of kademlia or a representation of the "closest" node.

If we just assume that we always use our seed cluster/bootstrap cluster, then this is just centralised routing. But if we enable any keynode to be a relay, then we need to understand that the PK network is a loose mesh, with loose connection lifetimes as well. Is kademlia actually useful for routing here?

This feels like a routing problem, and it seems that existing routers already have algorithms that help solve this problem. Is there any cross over with things like spanning tree algorithms https://en.wikipedia.org/wiki/Minimum_spanning_tree?

Note that all keynodes may be on the public internet. Another matter is whether all live network links are equal in quality. In reality they are not: latency, throughput, and reliability all matter. But if we are only distinguishing between vertices where edges can be made and vertices where edges cannot be made, then our algorithm should converge very quickly to find the proper relaying route.

joshuakarp commented 3 years ago

@CMCDragonkai Kademlia inherently has a "closeness" mechanism. That is, the XOR value of two node IDs determine closeness (smaller = closer, larger = further away). Remember that with Kademlia, we store more node ID -> node address mappings of the nodes that are "closest" to us: this is the fundamental part of the k-buckets structure.

Isn't this inherently a routing solution?
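The XOR metric itself is tiny. As a toy sketch over small integer IDs (real node IDs are long byte strings, so this only illustrates the metric, not the implementation):

```shell
# Toy sketch of Kademlia's XOR distance over small integer node IDs.
# Smaller XOR distance means "closer" in the Kademlia sense.
xor_distance() {
  echo $(( $1 ^ $2 ))
}

# Which of 6 (0b110) and 4 (0b100) is closer to the target 5 (0b101)?
xor_distance 5 6   # -> 3
xor_distance 5 4   # -> 1, so 4 is closer to 5 than 6 is
```

Note that closeness here is purely over the ID space, not network topology.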

See this too, straight from the Kademlia paper https://www.scs.stanford.edu/~dm/home/papers/kpos.pdf:

We start with some definitions. For a k-bucket covering the distance range [2^i, 2^(i+1)), define the index of the bucket to be i. Define the depth, h, of a node to be [number of k buckets] − i, where i is the smallest index of a non-empty bucket. Define node y’s bucket height in node x to be the index of the bucket into which x would insert y minus the index of x’s least significant empty bucket. Because node IDs are randomly chosen, it follows that highly non-uniform distributions are unlikely. Thus with overwhelming probability the height of any given node will be within a constant of log n for a system with n nodes. Moreover, the bucket height of the closest node to an ID in the kth-closest node will likely be within a constant of log k.

Our next step will be to assume the invariant that every k-bucket of every node contains at least one contact if a node exists in the appropriate range. Given this assumption, we show that the node lookup procedure is correct and takes logarithmic time. Suppose the closest node to the target ID has depth h. If none of this node’s h most significant k-buckets is empty, the lookup procedure will find a node half as close (or rather whose distance is one bit shorter) in each step, and thus turn up the node in h − log k steps. If one of the node’s k-buckets is empty, it could be the case that the target node resides in the range of the empty bucket. In this case, the final steps will not decrease the distance by half. However, the search will proceed exactly as though the bit in the key corresponding to the empty bucket had been flipped. Thus, the lookup algorithm will always return the closest node in h − log k steps.

I found a pretty good animation of this too, to showcase the lookup procedure https://kelseyc18.github.io/kademlia_vis/lookup/

As a side note, I started to read quite an interesting paper about using notions of "trust" to overcome some of the issues with malicious nodes and attack vectors on these kinds of systems: https://ieeexplore.ieee.org/document/6217954

CMCDragonkai commented 3 years ago

Kademlia's closeness is used to route to the relevant node that holds the node ID to IP address mapping. I can see how that might mean you can trigger a hole-punch relay message at that node.

Does this mean you would need to send an option/flag indicating that you want to pass on a hole-punching message as part of the call to resolve a node ID? This would mean resolution and relaying a hole punch are done at the same time.

Or you would need to know which node returned the resolution and then use that.

However there's still a problem with this mechanism. The relaying node must already have an open connection with the receiving node. If the relaying node does not have an open and live connection, and the receiving node is behind a restricted NAT, then the relaying node cannot actually relay anything, just like the sending node.

There is an assumption here that the node that resolves has an open connection to all the IP addresses. But is this actually true? There are several points here:

  1. The relaying node must already maintain an open, live connection to the receiving node. Thus you want to route a relay message to a node that is open to it.
  2. The sending node must be able to open a connection to the relaying node, otherwise you have a chicken-or-egg problem here: a transitive NAT traversal problem.
  3. Kademlia doesn't have a locality optimisation based on network locality for throughput or latency. But this can be solved later.
  4. Seed/bootstrap nodes are the best candidates at the moment for relaying, but if we want to decentralise this, it should work as a mesh.
  5. Participating as part of the mesh should be optional... or if not, then relay messages should ideally not leak which PK node is contacting which PK node. Which sounds like an onion routing scheme.
CMCDragonkai commented 3 years ago

Is the kademlia contact database rebalanced/replicated across the network like a DHT?

Otherwise how does one store a contact if not by being contacted by it and contacting it in turn?

joshuakarp commented 3 years ago

Does this mean you would need to send an option/flag indicating that you want to pass on a hole-punching message as part of the call to resolve a node ID? This would mean resolution and relaying a hole punch are done at the same time.

In order for Kademlia to function, there are lots of implicit connection establishments taking place. That is, every time you receive k closest nodes from another node, the idea is that you would connect to each of these received nodes and query them for their k closest nodes. If you don't already have a connection established with them, then you need to send a hole-punching packet across the network to attempt to establish connection.

So yes, as part of the resolution process, we are already sending hole punch packets to each of these nodes we need to contact.

Or you would need to know which node returned the resolution and then use that.

This could be a worthwhile optimisation.

However there's still a problem with this mechanism. The relaying node must already have an open connection with the receiving node. If the relaying node does not have an open and live connection, and the receiving node is behind a restricted NAT, then the relaying node cannot actually relay anything, just like the sending node.

There is an assumption here that the node that resolves has an open connection to all the IP addresses. But is this actually true?

Yeah, you're right. I remember we had some brief discussion about whether we should consider having "persistent" connections to some of the "closest" nodes in the network. That is, upon coming online, we immediately connect to these nodes. But yeah, in order to even establish these persistent connections, we have the same issue.

joshuakarp commented 3 years ago

Is the kademlia contact database rebalanced/replicated across the network like a DHT?

Otherwise how does one store a contact if not by being contacted by it and contacting it in turn?

Currently no. There's no rebalancing/replication across the network. There are currently two ways that nodes are added to the database:

  1. This kademlia "discovery" process, of contacting other nodes to find the k closest nodes (any found nodes that are able to be connected to are added to our database).
  2. I added an initial "sync" to a node when it comes online. That is, it contacts the provided seed nodes (if any are provided) and asks for the k closest nodes to itself. Currently, this also attempts to establish connection before adding the node to our database.
CMCDragonkai commented 3 years ago

I think our plan is for the release, we'll stick with the centralised seed node cluster MatrixAI/Polykey#194.

We can put the problem of decentralised relaying to a post-release issue. This issue is more focused on just creating a test-harness for NAT-traversal, so we should focus this issue on this problem.

In the mean time, I'll create a new issue for decentralised relaying.

CMCDragonkai commented 3 years ago

If you have difficulties working on this, I can ask @nzhang-zh or @Zachaccino to help advise.

CMCDragonkai commented 2 years ago

Our tests here should probably change to be manual as soon as MatrixAI/Polykey#194 is done, and then we can figure out how to automate them.

joshuakarp commented 2 years ago

Start date changed from Nov 15th to Nov 19th (based on delays in MatrixAI/Polykey#231).

joshuakarp commented 2 years ago

Start date changed from Friday Nov 19th to Tuesday Nov 23rd (delays in MatrixAI/Polykey#269, MatrixAI/Polykey#231, and CLI MR on Gitlab).

joshuakarp commented 2 years ago

Start date changed from Tuesday Nov 23rd to Monday Dec 6th (delayed from refactoring work in MatrixAI/Polykey#283).

joshuakarp commented 2 years ago

Removing this from MatrixAI/Polykey#291 as it should be closed as part of the testnet deployment (#194).

CMCDragonkai commented 2 years ago

These tests must be written outside of, or separately from, src/tests. This way npm test does not run the NAT traversal testing. This is because NAT traversal testing may require a real network (when going to the seed nodes) or OS simulation of NAT. A couple of solutions here:

  1. Create a separate tests-nat directory - the disadvantage here is that you lose all your existing jest context and utilities, and have to configure them again
  2. Use https://jestjs.io/docs/cli#--testpathignorepatternsregexarray if we use something like tests/nat as a subdirectory - this is advantageous for re-using all the same jest context, but means we have to configure jest to ignore these tests by default, which may be done in package.json or jest.config.js.

It's best to continue using our jest tooling for these tests, but if we need OS simulation, then the jest tests may need to execute shell commands that encapsulate scripts running inside network namespaces.

CMCDragonkai commented 2 years ago

This issue requires a deeper specification that works out all the different cases being tested. It's going to depend on the resolution of MatrixAI/Polykey#326, as that will finish the testnet deployment. These test cases may use testnet.polykey.io.

CMCDragonkai commented 2 years ago

Some ideas for initial cases...

These cases do not have a signalling server. I.e. no seed node involved in coordination.

  1. Node1 connect to Node2 - basic sanity test
  2. Node1 behind NAT connects to Node2 - here Node1 is acting like a client and it is behind a NAT, connecting to an open Node2 that isn't behind NAT
  3. Node1 connects to Node2 behind NAT - here Node1 is acting like a client, connecting to a closed Node2 that is behind a NAT
  4. Node1 behind NAT connects to Node2 behind NAT - here Node1 is acting like a client and it is behind NAT, and it is connecting to Node2 which is also behind NAT

For the NAT, we need to simulate the 4 types:

  1. Port restricted
  2. Address restricted
  3. Full cone
  4. Symmetric

I'm not sure if our Linux netns and firewall setup can simulate all 4, but it should at the very least be able to do port-restricted.

These cases do have a signalling server:

  1. Node1 connect to node2
  2. Node1 behind NAT connects to Node2 - here Node1 is acting like a client and it is behind a NAT, connecting to an open Node2 that isn't behind NAT
  3. Node1 connects to Node2 behind NAT - here Node1 is acting like a client, connecting to a closed Node2 that is behind a NAT
  4. Node1 behind NAT connects to Node 2 behind NAT - here Node1 is acting like a client and it is behind NAT, and it is connecting to Node2 which is also behind NAT

The signalling server is enabled by having both node1 and node2 already connected to the seed node. That seed node should then relay connection request messages.

That should be enough for now. No TURN relay testing yet.

Note that some tests are expected to "fail", in that we want to test what the expected exceptional behaviour handling is. Like when the nodes cannot connect, how do we communicate this to the end user.

CMCDragonkai commented 2 years ago

In order to create these network namespaces, you have to use both ip and iptables commands to simulate the NAT architectures we're looking for. The gist guide https://gist.github.com/CMCDragonkai/3f3649d7f1be9c7df36f provides an example of the sort of things that will be called from the jest tests.

This also means that when we actually do the tests, they will be done with pkSpawn or pkExpect, pkExec. These are high-level tests: they don't import anything from the src/ codebase. It's all about using the pk command line and running it inside the network namespaces, which means they are similar to tests/bin.

emmacasolin commented 2 years ago

NAT Types

There are four types of NAT that we need to simulate in our tests in order to test our NAT traversal. These are:

  1. Full Cone
  2. Restricted Cone
  3. Port-Restricted Cone
  4. Symmetric

These four types can be categorised along two axes: the type of NAT mapping used and the type of firewall used:


All of these NAT types incorporate a stateful firewall, the difference being how the stateful firewall behaves.

An endpoint-independent firewall will allow inbound packets from any ip:port, provided we have sent any outbound packet in the past. An endpoint-dependent firewall will only allow inbound packets from an ip that we have sent an outbound packet to in the past (or an ip:port in some cases).

However, solutions for endpoint-independent NAT mapping (simultaneous transmission) will ultimately work for all types of firewalls, so we can group full-cone, restricted-cone, and port-restricted-cone NAT together. So what we're really looking at is endpoint-independent NAT mapping vs endpoint-dependent NAT mapping. In other words: is the address mapping that the outside world can use to reach us through our NAT the same for everyone trying to communicate with us, or does it change per destination?

For endpoint-independent NAT, all we need to do is query a server to find out what our address looks like to them, and then clients that want to communicate with us just need to send packets to that same address for them to reach us through our NAT. It gets more complicated for endpoint-dependent NAT.
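The firewall half of this behaviour maps naturally onto iptables' connection tracking. A hedged, untested sketch of an endpoint-dependent (stateful) firewall on a router namespace; the interface names are placeholders:

```shell
# Sketch of a stateful (endpoint-dependent) firewall for a router namespace.
# eth1 = LAN side, eth2 = WAN side; both names are placeholders.

# Allow replies to connections we initiated (this is the "stateful" part)
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Allow anything outbound from the LAN side
iptables -A FORWARD -i eth1 -j ACCEPT
# Drop all other forwarded traffic (unsolicited inbound)
iptables -P FORWARD DROP
```

An endpoint-independent firewall would loosen the first rule to accept inbound traffic to any mapped port regardless of who sends it.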

CMCDragonkai commented 2 years ago

Simulating the firewall requires setting up iptables rules. Those rules will basically create a NAT setup. Here's an example https://www.karlrupp.net/en/computer/nat_tutorial.

I had set it up previously in our gist, but different NAT types will require slightly different iptables configs. Of course, all of this must be done in a network namespace.

emmacasolin commented 2 years ago

Simulating NAT using iptables

iptables allows us to create complex rules for the modification and filtering of packets. The nat table comprises three chains; the two important ones here are PREROUTING and POSTROUTING.


In the context of the iptables call structure, these chains are specified in the command component, which is essentially what type of rule we want to create (and for which chain):

# Abstract structure of an iptables instruction:
iptables [-t table] command [match pattern] [action]

The [-t table] component will always be -t nat when working with NAT, since we need to modify the nat table; the [match pattern] component specifies the type of packets the command should deal with; and the [action] component specifies what to do with matched packets.

For example, the call

iptables --table nat --append POSTROUTING --protocol tcp --source 192.168.1.2 --jump SNAT --to-source 194.236.50.155-194.236.50.160:1024-32000

Can be broken down as follows:

Command: --table nat --append POSTROUTING - append a new rule to the POSTROUTING chain of the nat table.

Match pattern: --protocol tcp --source 192.168.1.2 - match TCP packets coming from the internal address 192.168.1.2.

Action: --jump SNAT --to-source 194.236.50.155-194.236.50.160:1024-32000 - source-NAT the matched packets, rewriting their source to one of the given external addresses, with a port in the given range.

There is a large number of possible option combinations; this iptables tutorial goes over all of them: https://www.frozentux.net/iptables-tutorial/iptables-tutorial.html

emmacasolin commented 2 years ago

This forum has some examples of using iptables to simulate the four types of NAT: https://forums.gentoo.org/viewtopic-t-826825.html. However, after cross-referencing with the iptables tutorial, I don't think all of them are completely correct. The following is what I've been able to come up with by combining suggestions from a couple of different sources, my own research, and the NAT Wikipedia page. All of these still need to be tested though.

For all of the below commands we define the following variables:

# Our internal, private address
addr_int="10.0.0.1"

# External address of our router
addr_ext="192.168.2.170"

# Port (using same for private and router, as well as for external hosts)
port="55555"

Full-cone NAT

(Address)-restricted-cone NAT

Port-restricted cone NAT

Symmetric NAT

Note that any of the above rules that are set for the NAT table (-t nat) control the type of NAT mapping, and the rest control the behaviour of the firewall.
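Since the rule listings for each NAT type were not preserved in this comment, here is a hedged, untested sketch of how the mapping half might differ between the cone types and symmetric NAT, reusing the variables defined above:

```shell
# Hedged sketch: the NAT *mapping* behaviour lives in the nat table.

# Endpoint-independent mapping (all cone NAT types): a fixed SNAT keeps
# the same external mapping regardless of destination.
iptables -t nat -A POSTROUTING -s "$addr_int" -j SNAT --to-source "$addr_ext"

# Endpoint-dependent mapping (symmetric NAT): randomise the source port
# per connection, so different destinations see different mappings.
iptables -t nat -A POSTROUTING -s "$addr_int" -j MASQUERADE --random
```

The firewall half (full cone vs address-restricted vs port-restricted) would then be layered on with conntrack rules in the filter table.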

emmacasolin commented 2 years ago

Linux network namespaces


For our tests we need to create at least two Linux network namespaces: one for Router1 and one for Router2. These namespaces would have iptables rules to simulate different types of NAT for different tests. Some tests may not actually need both Router1 and Router2, depending on whether we want to simulate both of the nodes being behind NAT. We may also need to create namespaces for the two nodes so that they can communicate with the routers, however I'm not sure about this.

For the tests where we're not using the seed node, the two routers will need to be connected via a virtual ethernet (veth) cable. Otherwise they will be communicating through the seed node, so both routers just need to be connected to the same seed node but not each other.

We can create these namespaces and veth connections using Node.js's child_process module (most likely exec() will be fine) inside our jest tests. The only thing I'm not sure about is whether it will be possible to call CLI commands that need root permissions using exec(), so this is something I'll need to prototype. Other than that, the commands for setting up the namespaces and veth connections will be something like:

# Create namespaces
ip netns add Node1
ip netns add Node2
ip netns add Router1
ip netns add Router2

# Create veths
ip link add veth0-r1 type veth peer name veth0-r2
ip link add veth1-n1 type veth peer name veth1-r1
ip link add veth2-r2 type veth peer name veth2-n2

# Link up the ends of the veths to the correct namespaces
ip link set veth0-r1 netns Router1
ip link set veth0-r2 netns Router2
ip link set veth1-n1 netns Node1
ip link set veth1-r1 netns Router1
ip link set veth2-r2 netns Router2
ip link set veth2-n2 netns Node2
tegefaulkes commented 2 years ago

Just leaving a note for reference. As you saw during our meeting, using the ping command can give us a false positive. If you want some flexibility when testing the net namespace setup, use the netcat command I told you about.

# listen on TCP port 55555
nc -l 55555

# you can listen on UDP instead
nc -l -u 55555

# you can log some information about the connection using the `-v` flag;
# this will print the ip and port of the connection
nc -l -v 55555

# you can connect to the listening instance using
nc ip port
nc -u ip port   # on UDP
nc -v ip port   # with logging

# the -k flag will keep the server side listening
nc -vlk 55555
# combine with this to quickly get the connection info;
# this will send `HelloWorld` and close the connection after 1 second
echo HelloWorld | nc -v -w 1 IP 55555
emmacasolin commented 2 years ago

After doing some prototyping today, I'm now able to set up two nodes behind two routers (four network namespaces) and have node 1 and node 2 ping each other. The next step will be adding iptables rules to the routers to simulate NAT, but for now this is how I'm setting everything up before that point:

# Create four network namespaces
sudo ip netns add node1
sudo ip netns add node2
sudo ip netns add router1
sudo ip netns add router2

# Create veth interfaces to connect the namespaces such that we have
# node1 <-veth1-> router1 <-veth3-> router2 <-veth2-> node2
sudo ip link add veth1-n1 type veth peer name veth1-r1
sudo ip link add veth2-n2 type veth peer name veth2-r2
sudo ip link add veth3-r1 type veth peer name veth3-r2

# Connect up the ends to the correct namespaces
sudo ip link set veth1-r1 netns router1
sudo ip link set veth1-n1 netns node1
sudo ip link set veth2-r2 netns router2
sudo ip link set veth2-n2 netns node2
sudo ip link set veth3-r1 netns router1
sudo ip link set veth3-r2 netns router2

# Bring up loopback and the veth interfaces for all of the namespaces
sudo ip netns exec node1 ip link set lo up
sudo ip netns exec node1 ip link set veth1-n1 up
sudo ip netns exec node2 ip link set lo up
sudo ip netns exec node2 ip link set veth2-n2 up
sudo ip netns exec router1 ip link set lo up
sudo ip netns exec router1 ip link set veth1-r1 up
sudo ip netns exec router1 ip link set veth3-r1 up
sudo ip netns exec router2 ip link set lo up
sudo ip netns exec router2 ip link set veth2-r2 up
sudo ip netns exec router2 ip link set veth3-r2 up

# Create subnets for the veth interfaces such that we have
# node1 1.1.1.1 <-> 1.1.1.2 router1
# router1 3.3.3.1 <-> 3.3.3.2 router2
# router2 2.2.2.1 <-> 2.2.2.2 node2
# Note that with the /24 masks below, the two ends of each veth link must share their first three octets for communication to work
sudo ip netns exec node1 ip addr add 1.1.1.1/24 dev veth1-n1
sudo ip netns exec router1 ip addr add 1.1.1.2/24 dev veth1-r1
sudo ip netns exec router1 ip addr add 3.3.3.1/24 dev veth3-r1
sudo ip netns exec router2 ip addr add 3.3.3.2/24 dev veth3-r2
sudo ip netns exec router2 ip addr add 2.2.2.1/24 dev veth2-r2
sudo ip netns exec node2 ip addr add 2.2.2.2/24 dev veth2-n2

# At this point everything should be able to communicate with its "neighbours" but we need to set the default routes to allow the rest of the namespaces to communicate
# Node1 should default to the interface on Router1 that it's connected to via veth1
sudo ip netns exec node1 ip route add default via 1.1.1.2
# Router1 should default to the interface on Router2 that it's connected to via veth3
sudo ip netns exec router1 ip route add default via 3.3.3.2
# Router2 should default to the interface on Router1 that it's connected to via veth3
sudo ip netns exec router2 ip route add default via 3.3.3.1
# Node2 should default to the interface on Router2 that it's connected to via veth2
sudo ip netns exec node2 ip route add default via 2.2.2.1

After running all of these commands, we should be able to ping 2.2.2.2 (Node2) from Node1 (and vice versa).
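If the end-to-end ping fails, the path can be checked hop by hop to find where packets stop. A quick sketch using the addresses from the setup above (these commands need root, like the setup itself):

```shell
# node1 -> router1 (directly connected, should always work once links are up)
sudo ip netns exec node1 ping -c 1 1.1.1.2
# node1 -> router2 (requires forwarding on router1 plus the default routes)
sudo ip netns exec node1 ping -c 1 3.3.3.2
# node1 -> node2 (full path, requires forwarding on both routers)
sudo ip netns exec node1 ping -c 1 2.2.2.2
```

Each failing hop narrows the problem down to a missing route, a downed interface, or forwarding being disabled on the router in between.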

CMCDragonkai commented 2 years ago

To be able to do NAT simulation beyond full cone, you need a stateful firewall. In iptables this is known as conntrack. Have a look at conntrack and stateful iptables.
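As a rough sketch of what this could look like on a router namespace (the interface name `veth-ext` and address `10.1.1.1` here are illustrative, not from the prototype): SNAT provides the endpoint-independent mapping, while conntrack-based filtering on the FORWARD chain drops inbound packets that don't belong to a flow the internal host initiated, which is what makes the NAT restricted rather than full cone:

```shell
# Endpoint-independent mapping: rewrite the source of outbound packets
# leaving the (illustrative) external interface veth-ext
iptables -t nat -A POSTROUTING -o veth-ext -j SNAT --to-source 10.1.1.1

# Stateful filtering: only admit inbound packets that conntrack matches
# to an existing outbound flow; drop everything unsolicited
iptables -A FORWARD -i veth-ext -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -i veth-ext -j DROP
```

Whether this behaves as address-restricted or port-restricted depends on how conntrack matches the reply tuple for the protocol in question (for UDP it tracks both address and port, giving port-restricted behaviour).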

CMCDragonkai commented 2 years ago

I believe that node 1 and router 1 can share the same namespace.

This is because the network namespace creates its own private network, and both Node 1 and Router 1 are on the same private network.

However, it may be better for you to test with 4 namespaces first and then see how you can optimise down to just 2.

CMCDragonkai commented 2 years ago

If your commands require sudo permissions, then you can run the jest test script as sudo, for example sudo npm test. However, if you need dependencies from the nix shell available under sudo, then sudo nix-shell is also possible.

Do note that any files created will be owned by root, so it's important that any temporary files created are deleted.

CMCDragonkai commented 2 years ago

Also, any command using route or ifconfig should be replaced with the equivalent ip ... command, because the former two are deprecated.
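For reference, the usual iproute2 replacements for the deprecated net-tools commands (the read-only forms can be run unprivileged):

```shell
ip addr show      # replaces: ifconfig
ip route show     # replaces: route -n
ip neigh show     # replaces: arp -a
ip link show      # replaces: ifconfig -a
# ip link set eth0 up   # replaces: ifconfig eth0 up (needs root; eth0 is illustrative)
```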

emmacasolin commented 2 years ago

> To be able to do NAT simulation beyond full cone, you need a stateful firewall. In iptables this is known as conntrack. Have a look at conntrack and stateful iptables.

From my research I think these iptables rules should replicate a stateful firewall:

```sh
# External address of our router
addr_ext="192.168.2.170"
# Port (using the same for private and router, as well as for external hosts)
port="55555"
```

I remember having a quick look at conntrack and it didn't seem like the right thing to use, but I can have another look at it.

emmacasolin commented 2 years ago

> I believe that with node 1 and router 1 they can all share the same namespace.
>
> This is because the network namespace creates its own private network, and both Node 1 and Router 1 are on the same private network.
>
> However it may be better for you to test with 4 namespaces first and then see how you can optimise just down to 2.

Hmm yeah that might work. I'll keep prototyping with four for now but that could be something to look into later.

I found this which might be useful for setting up namespaces that contain multiple hosts with a router: https://github.com/mininet/mininet

emmacasolin commented 2 years ago

I'm in the process of testing iptables rules to see if the NAT is working correctly; however, I'm finding it hard to test for this. I wanted to use Wireshark, but I can't open it from inside a namespace. I tried using nsenter to do this but it doesn't seem to be working.

tegefaulkes commented 2 years ago

You can change the net namespace of a program using:

```
ip netns attach NAME PID - create a new named network namespace

    If NAME is available in /var/run/netns this command attaches the network
    namespace of the process PID to NAME as if it were created with ip netns.
```

But I'm not sure how well it will work with Wireshark. Alternatively you can use tcpdump or netcat for simple testing.
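For example, with the four-namespace prototype above, tcpdump can be run from inside a namespace directly (interface names taken from that setup; needs root like the rest of the namespace commands):

```shell
# Watch traffic crossing the router1 <-> router2 link from inside router1
sudo ip netns exec router1 tcpdump -i veth3-r1 -n

# Watch only ICMP on node1's interface while pinging from the other side
sudo ip netns exec node1 tcpdump -i veth1-n1 -n icmp
```

This avoids needing a GUI inside the namespace at all; the capture can also be written out with `-w file.pcap` and opened in Wireshark on the host afterwards.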

emmacasolin commented 2 years ago

This test.ts currently sets up four namespaces correctly (node1 <-> router1 <-> router2 <-> node2), where node1 and node2 are able to ping each other (and get a response back) by communicating through the two routers. I've been testing adding rules to the nat table for router1 in order to simulate full-cone NAT; however, from running simple tests it doesn't look like the rules are behaving correctly at this stage, so this is something I'll need to keep prototyping.

```ts
import { execSync } from 'child_process';

function main() {
  // Namespaces
  const netnsn1 = 'node1';
  const netnsn2 = 'node2';
  const netnsr1 = 'router1';
  const netnsr2 = 'router2';
  // Veth cables (ends)
  const n1ToR1 = 'veth1-n1';
  const r1ToN1 = 'veth1-r1';
  const r2ToN2 = 'veth2-r2';
  const n2ToR2 = 'veth2-n2';
  const r1ToR2 = 'veth3-r1';
  const r2ToR1 = 'veth3-r2';
  // Addresses for each veth end
  const n1ToR1Subnet = '1.1.1.1';
  const r1ToN1Subnet = '1.1.1.2';
  const r2ToN2Subnet = '2.2.2.1';
  const n2ToR2Subnet = '2.2.2.2';
  const r1ToR2Subnet = '3.3.3.1';
  const r2ToR1Subnet = '3.3.3.2';
  // Subnet mask
  const subnetMask = '/24';
  // Run a command synchronously, logging output; the commands below depend
  // on each other, so they must run in order (async exec would race)
  const run = (cmd: string) => {
    try {
      const stdout = execSync(cmd, { encoding: 'utf8' });
      if (stdout) console.log(`stdout: ${stdout}`);
    } catch (error: any) {
      console.log(`error: ${error.message}`);
    }
  };
  // Create network namespaces for two nodes with NAT routers
  run(`ip netns add ${netnsn1}`);
  run(`ip netns add ${netnsn2}`);
  run(`ip netns add ${netnsr1}`);
  run(`ip netns add ${netnsr2}`);
  // Create veth pairs to link the namespaces
  run(`ip link add ${n1ToR1} type veth peer name ${r1ToN1}`);
  run(`ip link add ${r2ToN2} type veth peer name ${n2ToR2}`);
  run(`ip link add ${r1ToR2} type veth peer name ${r2ToR1}`);
  // Move the veth ends into the correct namespaces
  run(`ip link set ${n1ToR1} netns ${netnsn1}`);
  run(`ip link set ${n2ToR2} netns ${netnsn2}`);
  run(`ip link set ${r1ToN1} netns ${netnsr1}`);
  run(`ip link set ${r1ToR2} netns ${netnsr1}`);
  run(`ip link set ${r2ToN2} netns ${netnsr2}`);
  run(`ip link set ${r2ToR1} netns ${netnsr2}`);
  // Loopback and veths are down by default - bring them up
  run(`ip netns exec ${netnsn1} ip link set lo up`);
  run(`ip netns exec ${netnsn1} ip link set ${n1ToR1} up`);
  run(`ip netns exec ${netnsn2} ip link set lo up`);
  run(`ip netns exec ${netnsn2} ip link set ${n2ToR2} up`);
  run(`ip netns exec ${netnsr1} ip link set lo up`);
  run(`ip netns exec ${netnsr1} ip link set ${r1ToN1} up`);
  run(`ip netns exec ${netnsr1} ip link set ${r1ToR2} up`);
  run(`ip netns exec ${netnsr2} ip link set lo up`);
  run(`ip netns exec ${netnsr2} ip link set ${r2ToN2} up`);
  run(`ip netns exec ${netnsr2} ip link set ${r2ToR1} up`);
  // Assign addresses for the veth pairs to communicate over
  run(`ip netns exec ${netnsn1} ip addr add ${n1ToR1Subnet}${subnetMask} dev ${n1ToR1}`);
  run(`ip netns exec ${netnsn2} ip addr add ${n2ToR2Subnet}${subnetMask} dev ${n2ToR2}`);
  run(`ip netns exec ${netnsr1} ip addr add ${r1ToN1Subnet}${subnetMask} dev ${r1ToN1}`);
  run(`ip netns exec ${netnsr1} ip addr add ${r1ToR2Subnet}${subnetMask} dev ${r1ToR2}`);
  run(`ip netns exec ${netnsr2} ip addr add ${r2ToN2Subnet}${subnetMask} dev ${r2ToN2}`);
  run(`ip netns exec ${netnsr2} ip addr add ${r2ToR1Subnet}${subnetMask} dev ${r2ToR1}`);
  // Set up the default routes for each namespace
  run(`ip netns exec ${netnsn1} ip route add default via ${r1ToN1Subnet}`);
  run(`ip netns exec ${netnsn2} ip route add default via ${r2ToN2Subnet}`);
  run(`ip netns exec ${netnsr1} ip route add default via ${r2ToR1Subnet}`);
  run(`ip netns exec ${netnsr2} ip route add default via ${r1ToR2Subnet}`);
  // Check that everything was set up correctly
  // Interfaces are up at the correct addresses
  run(`ip netns exec ${netnsn1} ip addr`);
  run(`ip netns exec ${netnsn2} ip addr`);
  run(`ip netns exec ${netnsr1} ip addr`);
  run(`ip netns exec ${netnsr2} ip addr`);
  // Routing tables are correct
  run(`ip netns exec ${netnsn1} ip route`);
  run(`ip netns exec ${netnsn2} ip route`);
  run(`ip netns exec ${netnsr1} ip route`);
  run(`ip netns exec ${netnsr2} ip route`);
  // Can ping from one node to the other
  run(`ip netns exec ${netnsn1} ping -c 3 ${n2ToR2Subnet}`);
  run(`ip netns exec ${netnsn2} ping -c 3 ${n1ToR1Subnet}`);
  // Delete the namespaces
  run(`ip netns del ${netnsn1}`);
  run(`ip netns del ${netnsn2}`);
  run(`ip netns del ${netnsr1}`);
  run(`ip netns del ${netnsr2}`);
}

main();
```
emmacasolin commented 2 years ago

I've got the NAT rules working!! For testing this I created the following setup of namespaces linked with my real system (since I can only open Wireshark from my real system):

*(diagram: client namespace <-> router namespace <-> root system)*

I wanted client to act like a client behind a router (router) and for my root system to act like a server. The only default routing that was required was on client, so that packets to any address (e.g. root) would be routed through router.

```sh
sudo ip netns exec client ip route add default via 10.2.2.2
```

I then added the following iptables rules to the router:

```sh
# Any packets leaving on veth1 coming from the client (10.2.2.1/24) should be
# made to look like they're coming from the router (10.1.1.1)
iptables -t nat -A POSTROUTING -s 10.2.2.1/24 -o veth1 -j SNAT --to-source 10.1.1.1
# Any packets arriving on veth1 addressed to the router (10.1.1.1/24) should be
# redirected to the client (10.2.2.1)
iptables -t nat -A PREROUTING -d 10.1.1.1/24 -i veth1 -j DNAT --to-destination 10.2.2.1
```

This simulates the endpoint-independent NAT mapping used by full-cone, restricted-cone, and port-restricted-cone NAT. Note that specifying the interface that the packet arrives on is required for the PREROUTING rule (specifying the outgoing interface for the POSTROUTING rule isn't necessary, but I added it for symmetry). You can see why by looking at the PREROUTING chain after adding these rules:

```
$ iptables -t nat -nvL PREROUTING
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DNAT       all  --  veth1  *       0.0.0.0/0            10.1.1.0/24          to:10.2.2.1
```

Even though our rule specified the target address to be matched as 10.1.1.1/24, it gets stored as 10.1.1.0/24, so packets addressed to 10.1.1.2 (root) also match this pattern; without the interface match, packets sent from the client to 10.1.1.2 would simply be redirected back to the client. By specifying the incoming interface as veth1, the rule only matches packets arriving on the router's outward-facing interface, and not those arriving from the side facing the client.

With all of this setup done, we can now send packets between the client and root. From the client's perspective it is communicating directly with root: it sends packets addressed to 10.1.1.2 and receives packets addressed from 10.1.1.2. From root's perspective, it never learns the client's address, or even that it's communicating with a client behind a router, since it receives packets addressed from 10.1.1.1 and sends replies back to 10.1.1.1.
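One way to observe this without Wireshark is a quick netcat round trip, assuming the same client/router/root addresses as above (the port number is arbitrary; some netcat variants need `-l -p 55555` instead of `-l 55555`):

```shell
# On the root system: listen for UDP datagrams
nc -u -l 55555

# In the client namespace: send a datagram to root through the router
sudo ip netns exec client sh -c 'echo hello | nc -u 10.1.1.2 55555'
```

A tcpdump on root's side of the link should show the datagram arriving with source 10.1.1.1 (the router's SNAT address) rather than 10.2.2.1.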

*(screenshots: packet captures showing the exchange from the client's and root's perspectives)*

CMCDragonkai commented 2 years ago

Some relevant discussions in the PR https://github.com/MatrixAI/js-polykey/pull/357#issuecomment-1072090497 about MASQUERADE vs SNAT and difference between TCP and UDP as well as how symmetric NAT degrades to port-restricted when you only have 1 external IP.

Also for our test cases, I suggest this matrix can help:

*(table: hole-punching compatibility matrix across NAT type pairings)*

It comes from https://dh2i.com/kbs/kbs-2961448-understanding-different-nat-types-and-hole-punching/

The non-routable ones should be routable with a TURN relay.

CMCDragonkai commented 2 years ago

@emmacasolin Can you change to address ranges instead, as that should make it easier when we have more than one agent behind a NAT.
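Extending the earlier SNAT rule from a single host to a range could look like the following sketch (addresses reuse the client/router example above and are illustrative): matching the whole internal subnet means every agent behind the router shares the one external address:

```shell
# SNAT the entire internal subnet, not just one host
iptables -t nat -A POSTROUTING -s 10.2.2.0/24 -o veth1 -j SNAT --to-source 10.1.1.1
```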

emmacasolin commented 2 years ago

With regards to this comment https://github.com/MatrixAI/js-polykey/pull/357#issuecomment-1073427986 we only need to simulate port-restricted cone and symmetric NAT for our tests. If our NAT busting works for port-restricted cone and symmetric NAT, it will also work for full cone and address-restricted cone NAT, since those use the same or less restrictive architectures.

CMCDragonkai commented 2 years ago

@emmacasolin

I've changed the issue name here to remove "non-Symmetric NAT" because we are in fact testing with symmetric NAT now.

This issue is blocked on testnet deployment MatrixAI/Polykey#378.

CMCDragonkai commented 2 years ago

@emmacasolin can you tick off the tasks here if they are done.

CMCDragonkai commented 2 years ago

Earlier tasks are all ticked by the merging of MatrixAI/Polykey#381 to staging.

I've added task 7 to address the testing for testnet.polykey.io. It can only occur after integration:deployment.

Such a test would need to be conditional as well, but this time representing tests that run during integration.

@tegefaulkes is currently working on getting our tests/bin to work during integration:* jobs. That work is relevant here because these would be the tests that should only run after integration:builds and integration:deployment finish.

CMCDragonkai commented 2 years ago

@emmacasolin you'll start on this now, and since the testnet deployment will occur on each deployment to staging, that means you'll need to trigger testnet deployment locally whenever you're fixing up anything related to the testnet.

Please go through your AWS account, and test that you can interact with ECS and ECR. You'll need to use the ./scripts/deploy-image.sh and ./scripts/deploy-service.sh scripts that are going to be merged in MatrixAI/Polykey#396.

Some initial related bugs include reviewing MatrixAI/Polykey#402. Also rename that PR to be more specific about what is being solved there.

CMCDragonkai commented 2 years ago

Get yourself familiarized with:

Those will be important to observe as you are redeploying the seed nodes.

CMCDragonkai commented 2 years ago

The last task is now a separate issue MatrixAI/Polykey-CLI#71, so this issue can be closed.