dswd / vpncloud

Peer-to-peer VPN
https://vpncloud.ddswd.de

Support for peers with multiple addresses #201

Open · biolim opened this issue 3 years ago

biolim commented 3 years ago

I would like to manage the priority of connecting to a peer that has multiple addresses.

--config example.net (peer entries here are placeholders):

```
listen: 3210
peers:
  - node1.example.net:3210
  - node2.example.net:3210
```

or specify a priority source ip, for example:

```
listen: 192.0.2.1:3210
```

jasmas commented 3 years ago

Can't this be accomplished by setting host routes with varying priority? You're talking about underlay addresses, correct?

dswd commented 3 years ago

Right now, VpnCloud just tries all addresses and does not prioritize or compare them. I have to check the code, but it might be that when multiple addresses work, the slowest one wins (as it is the last one to connect). Setting priorities manually would be a last resort if nothing else works. I would rather add logic that makes the first address that works win the race. Would that work in your scenario?
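Roughly, the first-wins race could look like this sketch (not the actual VpnCloud code; `connect_first_wins` is invented here and TCP stands in for the UDP handshake):

```rust
use std::net::{SocketAddr, TcpStream};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Try all known addresses of a peer in parallel and return the first
/// one that completes a handshake; later successes are simply dropped.
fn connect_first_wins(addrs: &[SocketAddr], timeout: Duration) -> Option<TcpStream> {
    let (tx, rx) = mpsc::channel();
    for addr in addrs.to_vec() {
        let tx = tx.clone();
        thread::spawn(move || {
            if let Ok(stream) = TcpStream::connect_timeout(&addr, timeout) {
                // Ignoring send errors: the race may already be over.
                let _ = tx.send(stream);
            }
        });
    }
    drop(tx); // once every attempt has failed and exited, recv() unblocks with an error
    rx.recv().ok() // the fastest successful handshake wins
}
```

This way the fastest-responding address keeps the connection instead of the slowest one.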

dumblob commented 3 years ago

@dswd how about periodic "reconfiguration" based on randomizing the list, along with some latency/throughput measure over the last few seconds/minutes? That way it wouldn't require user interaction/prioritization and should actually converge to an above-average solution.
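Something like this sketch, maybe (`probe` is a made-up stand-in for a keepalive round trip; the RNG seed must be non-zero):

```rust
use std::net::SocketAddr;
use std::time::Duration;

/// Tiny xorshift RNG to keep the sketch dependency-free (seed must be non-zero).
struct XorShift(u64);
impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Stand-in for a real measurement, e.g. the elapsed time of a keepalive echo.
fn probe(_addr: &SocketAddr) -> Option<Duration> {
    None
}

/// One periodic "reconfiguration" pass: visit the candidates in random
/// order and keep whichever answered fastest this round.
fn pick_best(addrs: &mut [SocketAddr], rng: &mut XorShift) -> Option<SocketAddr> {
    // Fisher-Yates shuffle so no address is systematically favored.
    for i in (1..addrs.len()).rev() {
        let j = (rng.next() as usize) % (i + 1);
        addrs.swap(i, j);
    }
    let mut best: Option<(SocketAddr, Duration)> = None;
    for addr in addrs.iter() {
        if let Some(rtt) = probe(addr) {
            if best.map_or(true, |(_, b)| rtt < b) {
                best = Some((*addr, rtt));
            }
        }
    }
    best.map(|(addr, _)| addr)
}
```

Run every few seconds/minutes, this should converge on an above-average link without any user interaction.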

jasmas commented 3 years ago

I didn't realize you weren't connecting multiple tunnels to the same host. If you allowed multiple tunnels and routed the same VPN IP across them, it would equal-cost multi-path (ECMP) round-robin between them by default. A feature to periodically auto-adjust the metric based on latency would be nice to have, but even without that, all priorities should be adjustable per host in the local routing table.

biolim commented 3 years ago

It is empirically established that, out of several available IP addresses of one host, the address with the lowest value in the highest octet is selected. Automation should only handle switching in case of interruptions (already implemented). What is needed is manual control over which address to prefer when connecting.

biolim commented 3 years ago

> I would rather add logic that makes the first address that works win the race. Would that work in your scenario?

In this case, that is not a solution. It leaves out the most important factor: the cost of the channel. Conceptually, this is similar to EIGRP.

Possible solution: an analog of OSPF's interface cost.

dumblob commented 3 years ago

> > I would rather add logic that makes the first address that works win the race. Would that work in your scenario?
>
> In this case, that is not a solution. It leaves out the most important factor: the cost of the channel. Conceptually, this is similar to EIGRP.
>
> Possible solution: an analog of OSPF's interface cost.

Yep. I'd emphasize that EIGRP (and OSPF) only work because they frequently recompute the cost (thanks to CDP etc.).

biolim commented 3 years ago

This is just an analogy. VpnCloud sends keep-alive packets as often as you wish, and if the channel fails, it successfully switches to another available neighbor address. For example: if there were three available neighbor addresses and the active one died, then out of the two remaining addresses, the one with the smallest octet is selected (determined empirically; perhaps this is an indirect effect).

dumblob commented 3 years ago

@biolim the current mechanism doesn't compute any cost - it's a binary either/or. But I'd prefer it to measure and estimate latency and bandwidth, and act once some threshold is reached. So if some link became slower, another link would be tried. This would require monitoring several links in parallel, but IMHO it's worth it.
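The threshold part could be as small as this sketch (names invented; it assumes per-link RTT samples are already available from somewhere):

```rust
use std::time::Duration;

/// Exponentially weighted moving average of a link's latency.
#[derive(Clone, Copy)]
struct LinkStats {
    ewma_rtt: Duration,
}

impl LinkStats {
    /// Fold a new RTT sample into the estimate (alpha = 1/8, like TCP's SRTT).
    fn update(&mut self, sample: Duration) {
        self.ewma_rtt = (self.ewma_rtt * 7 + sample) / 8;
    }
}

/// Switch only if the candidate beats the active link by more than
/// `threshold`, so minor jitter does not cause link flapping.
fn should_switch(active: LinkStats, candidate: LinkStats, threshold: Duration) -> bool {
    candidate.ewma_rtt + threshold < active.ewma_rtt
}
```

The `threshold` acts as hysteresis: another link has to be clearly faster before traffic moves over.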

dswd commented 3 years ago

VpnCloud does not measure latency or bandwidth. Measuring latency would be relatively easy since there are already keepalive messages. Measuring bandwidth without interfering with existing traffic is pretty hard and I would rather not try to implement that.
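For illustration, latency measurement on top of keepalives could be roughly this (a sketch with invented names; it assumes a keepalive can carry a small sequence number that the peer echoes back):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Tracks in-flight keepalives so their echoes yield RTT samples.
struct KeepaliveTracker {
    sent: HashMap<u64, Instant>, // seq -> time the keepalive left
    next_seq: u64,
}

impl KeepaliveTracker {
    fn new() -> Self {
        KeepaliveTracker { sent: HashMap::new(), next_seq: 0 }
    }

    /// Call when sending a keepalive; returns the seq to embed in it.
    fn on_send(&mut self) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.sent.insert(seq, Instant::now());
        seq
    }

    /// Call when the peer echoes a keepalive; yields one RTT sample.
    fn on_echo(&mut self, seq: u64) -> Option<Duration> {
        self.sent.remove(&seq).map(|sent_at| sent_at.elapsed())
    }
}
```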

I checked the code again:

1. I didn't find anything in the code that favors addresses with smaller octets. This is a very interesting effect.
2. When connecting to a peer, the node uses all known addresses in order but carries out the handshakes in parallel.
3. Apparently the slowest handshake wins in the end.
4. When a connection to a peer times out, the node tries to reconnect to only this one address, not all addresses. (If that fails, it will try to connect to all addresses once it receives them from a third peer.)

So how can we improve this without introducing too much complexity?

1. I will change the code for point 4 above to always reconnect to all known addresses.
2. I can change the behavior when a handshake succeeds for a peer that already has a connection (i.e. when the slower of two handshakes finishes): do not replace the existing connection if it is younger than 10 seconds. This way the fastest handshake will win instead of the slowest; see the sketch below.
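The guard in 2) could be as small as this sketch (field and constant names are made up):

```rust
use std::time::{Duration, Instant};

/// Minimum age before an established connection may be replaced by a
/// later-finishing handshake to the same peer.
const REPLACE_GRACE: Duration = Duration::from_secs(10);

struct Connection {
    established_at: Instant,
    // ... addresses, crypto state, etc.
}

/// Decide whether a freshly completed handshake may replace `existing`.
/// A connection younger than the grace period just won a race and is kept.
fn may_replace(existing: &Connection) -> bool {
    existing.established_at.elapsed() >= REPLACE_GRACE
}
```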

However, I am hesitant to add prioritization of addresses, since connections are not unidirectional: both nodes use the same connection, so they would need to negotiate its parameters, and it is not clear what should happen if the nodes disagree about the priorities.

My proposed solution would prefer the fastest path (based on latency) for a connection without having to actively measure and compare latency.

jasmas commented 3 years ago

If multiple tunnels connect and the hosts each have different preferences as to which tunnel they send traffic down, you get asymmetric connections - traffic being sent down a different path than it is received on. This is a problem for a firewall, but it is easily resolved by adjusting route weights. By default, assuming routes get installed for each path (each tunnel in this case), the behavior should actually be for both hosts to load-balance across the tunnels.

Easiest to implement would be to connect multiple tunnels but keep path selection out of scope: leave it to the routing table, which should ECMP-route between redundant paths by default and can always be tuned automatically with a script based on latency, or manually based on preference.