Open Crotalus opened 12 months ago
There is a bug on Windows / some AV causing sockets to be randomly closed when you open to many, which is more likely to occur with multiple network interfaces.
To reduce the number of sockets created, the ice adapter will disable all host candidates after the first try and all reflective candidates after the second try. The only guaranteed method that should work for everybody is the relay connection, which is why that is kept enabled.
Do not merge this unless you want 5% of your player base to be unable to connect.
Do not merge this unless you want 5% of your player base to be unable to connect.
Well there's currently a big amount of playerbase who cannot (re)connect right now - even on relay connections - and this was identified as one of the causes and we are looking desperate for causes.
So I get the problem and wonder if there is alternative approaches not to spam to many sockets. From my understanding the amount of sockets per time comes from roughly
Amount of network adapters
(1 candidate set per ip address for HOST, SFRFLX, RELAY)
x (1 HOST
+ Amount of coturns
)
x Reconnect attempts per minute
So alternative ideas:
There is a bug on Windows / some AV causing sockets to be randomly closed when you open to many, which is more likely to occur with multiple network interfaces.
To reduce the number of sockets created, the ice adapter will disable all host candidates after the first try and all reflective candidates after the second try. The only guaranteed method that should work for everybody is the relay connection, which is why that is kept enabled.
Do not merge this unless you want 5% of your player base to be unable to connect.
Do you still have a link to some kind of documentation of this bug so that we can look at it and see if there is a better mitigation today.
There is a bug on Windows / some AV causing sockets to be randomly closed when you open to many, which is more likely to occur with multiple network interfaces.
To reduce the number of sockets created, the ice adapter will disable all host candidates after the first try and all reflective candidates after the second try. The only guaranteed method that should work for everybody is the relay connection, which is why that is kept enabled.
Do not merge this unless you want 5% of your player base to be unable to connect.
Hey @Geosearchef!
Good to see you around :)
Do you happen to have any idea why (during reconnects) there doesn't seem to be any Relay candidates offered?
Currently indeed it seems to reject srflx (and hosts?) candidate after 1 failed attempt, but from some relatively simple but reproducible reconnection testing, we find that it never comes with any RELAY candidates. Thus we end up with an infinite sequence of empty candidate lists after the first attempt. During initial lobby connection it does have RELAY candidates
I call it Failure mode B. Here's an example of that:
So currently I had the following questions we'd ideally understand to fix reconnecting:
Do you still have a link to some kind of documentation of this bug so that we can look at it and see if there is a better mitigation today.
At the end of ice development, we had issues on a few windows systems that we couldn't explain. The system was randomly closing ice's sockets. This occurred on around 3-10% of windows systems.
I somehow managed to reproduce it on my win7 machine. Investigating the issue with a debugger lead me through the Java stdlib down to its native code. Going through Openjdk code, and watching it using a debugger, the socket close wasn't issued in any part of the JVM but instead occurred after the WinAPI syscall in Windows Code. The status is literally just "Socket closed", no reason at all.
Temporarily, I could fix the issue by enabling! a VPN connection. Other people have sometimes seen better results by disabling unused network adapters.
By disabling host and srflx, I got a connection in almost all of those instances, which is why this workaround exists. It regards the relay as the safest, always working option.
What is the current issue? To me it sounds like the problem is with the relay, not the connectivity establishment.
@Brutus5000 One more idea: Instead of completely disabling the fallback mechanism if host succeeded once, how about you do both, first all, if that fails, relay only, if that fails, host/srflx only, if that fails, back to the start.
(@Ravandel95 I will take a look later.)
Seems like it gives up HOST_CANDIDATE after one failed connection attempt and then don't try that again until 2 minutes have passed (same with SERVER_REFLEXIVE_CANDIDATE after 2 attempts)
I wonder why that logic is needed, shouldn't ice4j have the chance to try all methods at the time of reconnection, regardless of earlier attempts? For instance when someone temporarily lost their internet access but got it back.