Closed CMCDragonkai closed 1 year ago
We can get the status of the seed node. I think the status should show the current version of Polykey. We can take that from the package.json right?
Incidentally, the client commands don't support a client hostname?
Can you create a task to support hostnames for client commands? Specifically, when we "connect to" we should support hostnames, but when we are "listening", they should not be allowed.
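A minimal sketch of the rule above, assuming a single validation helper shared by the client commands. The names validateHost and HostMode are hypothetical and not part of the Polykey codebase, and the IPv6 check is deliberately loose:

```typescript
// Hypothetical helper: hostnames are accepted only when connecting,
// while listening requires an IP literal.
type HostMode = 'connect' | 'listen';

function isIPv4(host: string): boolean {
  const parts = host.split('.');
  return (
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)
  );
}

// Loose check only: hostnames never contain ':', so a string made solely of
// hex digits, colons and dots that contains a colon is treated as IPv6.
function looksLikeIPv6(host: string): boolean {
  return host.includes(':') && /^[0-9a-fA-F:.]+$/.test(host);
}

// Dot-separated labels of letters, digits and inner hyphens, at most 253 chars.
function isHostname(host: string): boolean {
  if (host.length === 0 || host.length > 253) return false;
  const label = /^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$/;
  return host.split('.').every((l) => label.test(l));
}

function validateHost(host: string, mode: HostMode): string {
  if (isIPv4(host) || looksLikeIPv6(host)) return host;
  if (mode === 'connect' && isHostname(host)) return host;
  throw new Error(
    mode === 'listen'
      ? `Listening requires an IP address, got "${host}"`
      : `"${host}" is not a valid IP address or hostname`,
  );
}
```

With this shape, a hostname given to a "connect to" option passes through for later DNS resolution, while the same value given to a listening option is rejected up front.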
The status can show the current PK version. This is already available as config.sourceVersion. Add as new task.
Starting the agent with the seed node has a problem. It's failing to fully connect to the seed node. We can see the connection in the seed node's logs.
Also note, failing to connect to the seed node here is causing a crash. The seed node still works.
Seed node connection appears to be working now. So we are proceeding to consolidate our manual tests into the automated tests in PR MatrixAI/Polykey#441.
I'm getting the same problem as before where the seed node is failing to connect back to me. I've narrowed it down to the VM I'm using, since running on the laptop works.
Is your VM using a NATed network or a bridge?
The default switch is using a NAT. I also configured it to share the host's physical adapter and that had the same problem. It might just be a quirk with VMs or the Windows hypervisor.
I can test it with the VM I set up on my NAS. AFAIK it has its own IP address, so it's a separate machine so far as my home router cares.
Don't bother right now. If it works on the laptop from your home, let's proceed to office to home and double node to testnet.
From the office with 2 nodes, we are seeing the signalling operation complete. However upon attempting to hole punch each other (with the same IP address but different ports), while we see these packets get sent out, we are not receiving any of these packets.
This is true even after opening the firewall on the local systems. One would have to assume that the CGNAT in the office is blocking these packets.
At home however, it's not behind a CGNAT, so now we are switching to testing with 2 nodes from home. Also we used tailscale to connect to the home system, which shows up with a direct connection in tailscale status. This confirms that hole punching can work between office and home, which should imply that the home NAT is a normal NAT, and not a CGNAT.
Some additional clarification on CGNAT is required:
Basically hairpinning is technically a solution to the problem of internal communication within the same CGNAT. But the reason why CGNAT will not allow hole punching is the NAT translation.
The IP and port observed by the seed node are not the same IP and port that are used within the CGNAT. If the CGNAT could rewrite the IP and port to be as if the packet came from the outside IP/port, then it would work, but this is not what CGNATs do, with or without hairpinning.
This means we expect this test not to work in our CGNAT situation, and we can only do this with relaying.
@tegefaulkes I think at this point you just have to merge in the fixes to the signalling mechanism.
And we will need to test on the home network.
I noticed on tailscale, if we try to connect to one of the IPs, it recognises that the system is on the same local network, and instead of going through the larger network, it just uses the direct IP:
100.117.43.4 matrix-win-1 tagged-devices windows active; direct 192.168.1.100:41641, tx 140520 rx 776152
Where 192.168.1.100 is the local IP.
This means it's not even doing hole punching or anything, it's a direct connection.
This is pretty smart. I wonder if it's possible for the seed node to also acquire this information and send this back so that way it's possible for us to know that a local IP should be used instead.
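As a rough illustration of what "knowing that a local IP should be used" could involve, the sketch below classifies an IPv4 address by its well-known range: RFC 1918 private space (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), RFC 6598 shared address space used by carrier-grade NATs (100.64.0.0/10), loopback (127.0.0.0/8), or public. The function and type names are hypothetical. Note that a seed node only observes the translated external address, so the node itself would have to report its local address for this to be usable.

```typescript
// Hypothetical classification of IPv4 addresses by well-known ranges.
type AddressScope = 'loopback' | 'private' | 'shared' | 'public';

function classifyIPv4(ip: string): AddressScope {
  const parts = ip.split('.');
  if (
    parts.length !== 4 ||
    !parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)
  ) {
    throw new Error(`Not an IPv4 address: ${ip}`);
  }
  const [a, b] = parts.map(Number);
  if (a === 127) return 'loopback'; // 127.0.0.0/8
  if (a === 10) return 'private'; // 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return 'private'; // 172.16.0.0/12
  if (a === 192 && b === 168) return 'private'; // 192.168.0.0/16
  if (a === 100 && b >= 64 && b <= 127) return 'shared'; // 100.64.0.0/10
  return 'public';
}
```

Applied to the status line above, 192.168.1.100 classifies as private, while 100.117.43.4 falls in the 100.64.0.0/10 shared address space. A private address is only reachable directly when both nodes sit on the same LAN, so this check alone does not prove reachability.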
Ok home to office connection succeeded. That's NAT to CGNAT.
It's possible even CGNAT to CGNAT could work, but we don't know this until we have 2 CGNATs.
It turns out that it doesn't work if we are in the same network. This is because lots of routers lack hairpinning support including both the office router and the home router.
Without hairpinning, one cannot do hole punching between hosts on the same subnet. This is why our double node to seed node test didn't work. It didn't work in the office, and it didn't work at home either.
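The hairpinning limitation can be sketched with a toy NAT model. This is an illustration under simplifying assumptions (endpoint-independent mappings, no filtering, no timeouts), not a model of any real router, and the names ToyNat and canPunchLocally are made up for the example. Two hosts behind the same NAT each learn the other's external mapping, as they would from a seed node, and then send to it:

```typescript
type Endpoint = { ip: string; port: number };

class ToyNat {
  // external port -> internal endpoint (endpoint-independent mapping assumed)
  private mappings = new Map<number, Endpoint>();
  private nextPort = 40000;

  constructor(
    public readonly publicIp: string,
    public readonly hairpin: boolean,
  ) {}

  // An outbound packet creates (or reuses) an external mapping; this is the
  // address that an outside observer such as a seed node would see.
  mapOutbound(internal: Endpoint): Endpoint {
    for (const [port, ep] of this.mappings) {
      if (ep.ip === internal.ip && ep.port === internal.port) {
        return { ip: this.publicIp, port };
      }
    }
    const port = this.nextPort++;
    this.mappings.set(port, internal);
    return { ip: this.publicIp, port };
  }

  // Returns the endpoint that receives the packet, or undefined if dropped.
  sendFromInside(source: Endpoint, destination: Endpoint): Endpoint | undefined {
    this.mapOutbound(source);
    // Destination is somewhere else on the internet: forwarded as usual.
    if (destination.ip !== this.publicIp) return destination;
    // Destination is the NAT's own public address: only a hairpinning NAT
    // loops the packet back to the mapped internal host.
    if (!this.hairpin) return undefined;
    return this.mappings.get(destination.port);
  }
}

// Two nodes on the same LAN try to reach each other via external mappings.
function canPunchLocally(hairpin: boolean): boolean {
  const nat = new ToyNat('203.0.113.5', hairpin);
  const a: Endpoint = { ip: '192.168.1.10', port: 1314 };
  const b: Endpoint = { ip: '192.168.1.11', port: 1314 };
  const aExternal = nat.mapOutbound(a);
  const bExternal = nat.mapOutbound(b);
  const aToB = nat.sendFromInside(a, bExternal);
  const bToA = nat.sendFromInside(b, aExternal);
  return aToB?.ip === b.ip && bToA?.ip === a.ip;
}
```

In this model canPunchLocally(false) is false and canPunchLocally(true) is true, which mirrors the observation that the double node test fails behind routers without hairpinning.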
This comment https://news.ycombinator.com/item?id=8229792 illustrates some reasons why. Furthermore when comparing to tailscale, they switch to using the local IPs instead of the remote IPs without actually doing any hole punching. Both entail requiring additional logic on the node graph/node connection manager that can take local address data and prioritise it over remote address data. Basically MatrixAI/js-mdns#1 is needed.
In the comment there appear to be a couple of solutions:
At 3., we have this problem where network information can be leaked. If we are expecting decentralised NAT MatrixAI/Polykey#365, then this basically provides internal network information to random third parties which is not a good situation.
For now, we will prefer solution 2. So if at any point we end up trying to contact a particular node, and it has a local address, it should avoid sending a signalling message at all... I'm not sure if that's possible to know whether a particular node address is local or not. We may tag these addresses as "local" if they were acquired through a LAN discovery process.
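One possible shape for the tagging idea, as a sketch only: each known address records how it was learned, LAN-discovered addresses are tried first, and signalling is skipped for them. The types NodeAddress and ConnectionPlan and the function planConnections are hypothetical and do not reflect the actual node graph or node connection manager API.

```typescript
// Hypothetical address record: 'lan' means acquired through LAN discovery,
// 'network' means learned from seed nodes or other remote peers.
type NodeAddress = { host: string; port: number; source: 'lan' | 'network' };

type ConnectionPlan = { address: NodeAddress; signal: boolean };

// Orders candidate addresses so LAN ones are attempted first, and marks
// whether a signalling (hole punch) message should be sent for each.
function planConnections(addresses: NodeAddress[]): ConnectionPlan[] {
  return [...addresses]
    // Array.prototype.sort is stable, so the relative order within each
    // group is preserved.
    .sort((x, y) => Number(y.source === 'lan') - Number(x.source === 'lan'))
    .map((address) => ({ address, signal: address.source !== 'lan' }));
}
```

Because the tag comes from the discovery process rather than from inspecting the address, no internal network information has to be sent to third parties to decide whether signalling is needed.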
Still need the post on the logs and full connection after you're ready @tegefaulkes.
It's important to see that both seed nodes are capable of discovering each other, as well as new nodes when they enter the cluster.
This can be closed now. We know a couple of things:
A final test is required involving NAT to CGNAT and the 2 seed nodes together. In total 4 nodes should be tested. However with the amount of failures, we're going to be blocked on this until we really simplify our networking and RPC code.
More sophisticated NAT simulation testing will go to the Polykey-Simulation repo.
Signalling Triad:
Tasks
Setup
testnet.polykey.io as the default testnet seed node configuration in source code. Put it in ./src/config.ts.
Client Service
agent status command. Provide it to the --client-port of 1315 and --client-host of the static IP address.
Office Node to Seed Node
The office is a carrier-grade NATted network. Therefore it has to contact the seed node and maintain a connection.
npm run polykey -- agent start. Observe a successful command execution.
tests/integration in MatrixAI/Polykey#441 to suit.
Home Node to Seed Node
Home is a regular NATed network. Therefore it has to contact the seed node and maintain a connection.
Note that the automated tests which test connection startup to seed nodes are only one way. There's no way to simulate 2 nodes in the tests atm. So we will only do the required tests as above for tests/testnet/testnetConnection.test.ts.
Office Node to Home Node
CGNAT to CGNAT - Office (CGNAT) to Home (CGNAT)
This would resolve MatrixAI/Polykey#383. This would be necessary for mobile networks.
This is low priority atm, so we will put this to MatrixAI/Polykey#383.
Multiple Office/Home Nodes to Seed Node
We should be able to start 2 nodes on different ports on the same machine to connect to the same testnet. This should be integrated into MatrixAI/Polykey#441.
Office Node to Seed Node for 2 nodes on different ports.
It turns out this is actually more complex. It's not possible to do this without the router performing hairpinning. And most routers don't have hairpinning. See: https://github.com/tailscale/tailscale/issues/188 and https://github.com/MatrixAI/Polykey/issues/487#issuecomment-1294742114