Project-Faster / qpep

A working version of the qpep standalone Go client/server, designed to accelerate high-latency connections, such as those provided by geostationary satellites.
https://docs.projectfaster.org

all traffic blocked when qpep client is unable to establish a session with server #9

Closed bizzbyster closed 1 year ago

bizzbyster commented 2 years ago

Can we change the logic so that redirection is only enabled if there exists a healthy session with the server?

parvit commented 2 years ago

Sure, we could echo the server before starting the redirection.

However, we'll need to prioritize the features first before I begin working on this.

parvit commented 1 year ago

Hi @bizzbyster, I just pushed a change to the branch https://github.com/Project-Faster/qpep/tree/issue-9.

I've elected in the end to implement an auto-termination of the client instead of the ping I mentioned before.

When the redirected connection to the server is lost, the client terminates after 30 failed retries; if the connection is re-established before that, the retry counter is reset.

That is because the client's status-check code path revolves around the redirection: without it you would either have to open the gateway API port to the outside as well, or do something awkward with the QUIC connections to emulate the redirection, and I don't think that would be justified in this case.

I'm not opening a PR at this time; we will open a testing branch next week.

bizzbyster commented 1 year ago

30 failed retries? I wonder how long that will mean the user is without any Internet in the worst case?

parvit commented 1 year ago

It would be 30 seconds in total (the status check runs every 1 second); we can lower it if needed.

bizzbyster commented 1 year ago

I wonder if we should use a regular TCP connection to the same port to indicate session health. @mfoxworthy what do you think?

parvit commented 1 year ago

I can think of three reasons why that would not be a good solution:

  1. It would be counterintuitive to have two servers, TCP and UDP, on the same port.
  2. It would be very easy to forget the firewall setup for this special configuration (or it might not be possible at all), leading to issues where everything seems broken even though it should work.
  3. It would need additional code just to make the special TCP connection without the redirection, and to decide when to use it.

mfoxworthy commented 1 year ago

I'm trying to figure out what would be gained by using TCP. What we are really trying to do is ensure that when the server is unavailable, the client will not send traffic over the tunnel. TCP is just a transport, right? What, in a TCP session, will indicate that the server is not available? Are you thinking the TCP Session Timeout? That too is configurable. So maybe we should think about reducing the retry count to something much less. We should also be trying in the background too. For example, sidestep the tunnel but at the same time continue trying and when successful move new sessions back to the tunnel.

bizzbyster commented 1 year ago

TCP is implemented in the kernel on both sides so even when an application crashes a FIN or RST packet goes out and informs the other side that it's down. Won't that behavior be hard to replicate in QUIC/UDP only?

parvit commented 1 year ago

@mfoxworthy Initially I thought about enabling and disabling the tunnel dynamically, but after seeing the special cases I would have had to add to the code, I realized that a simple retry count achieves the same result without much additional burden on the configuration or the code.

Maybe lowering the count to 10 or 15 would be enough?

mfoxworthy commented 1 year ago

@bizzbyster I agree in the case that the application dies. But let's take for example the network goes down, a load balancer fails, VPN concentrator fails, or even the server crashes before TCP sessions can be gracefully torn down. Even VPN clients use keepalives. Routers use BFD. So, I think a keepalive is still the best way to do this. But we should be able to first, configure it and second, to secure it. Keepalives can be a source of DDOS attacks.

parvit commented 1 year ago

The client already polls the status API every second while the connection is alive, so I don't think a separate keepalive is necessary.

However, if I understand correctly, do you think it's better to either:

a. set the (3rd instance) API router listening directly as a TCP server on port 443, or
b. set said 3rd instance on a different TCP port and require a firewall rule to allow it through?

Which option would be preferable?

mfoxworthy commented 1 year ago

@parvit I was actually referring to your status check as a keepalive. I think it's good to use your method but we should have a configuration for it. That configuration should have a way to prevent flooding.

bizzbyster commented 1 year ago

That's what I understood -- I'm fine with this.

parvit commented 1 year ago

Ok, then I misunderstood your comments; I'll add the necessary configuration parameters.

bizzbyster commented 1 year ago

Sounds good. We should definitely not over-design this now. Thanks guys!

parvit commented 1 year ago

Done, now we wait for the testing phase to begin (I think as soon as I have made progress on #17 we can begin).

parvit commented 1 year ago

Testing on 8/10 confirmed this part is working.