debauchee / barrier

Open-source KVM software

WARNING: error in socket multiplexer: Unknown error causes Barrier server to hang #522

Open stiggy87 opened 4 years ago

stiggy87 commented 4 years ago

Operating Systems

Server: Win10 Ver 1803 (OS Build 17134.407)

Client: Win10 Ver 1803 (OS Build 17134.1130)

Barrier Version

2.3.2

Steps to reproduce bug

  1. Launch Barrier
  2. Configure to always be elevated
  3. Start server
  4. Wait for the crash

While the server is running, I get a flood of this message at random times:

WARNING: error in socket multiplexer: Unknown error

More than 10 of these messages are logged per second, and the server hangs. I then have to stop the server for a few seconds and restart it to get it working again.

Other info

I am currently running it at the DEBUG2 log level to pinpoint the last thing that happens before the error appears.

stiggy87 commented 4 years ago

Just ran into this and saw this before it hung the server:

NOTE: new client is unresponsive
DEBUG: Opening new socket: FC01DB20
NOTE: accepted client connection
DEBUG1: saying hello
DEBUG2: writef(Barrier%2i%2i)
WARNING: error in socket multiplexer: Unknown error

I noticed that the client kept going unresponsive and coming back (could be because of the network). It looks like it was trying to open several sockets to the client simultaneously.

ackstorm23 commented 4 years ago

I've been running into this too. The server works for about 60 seconds, then the server side (Windows 10) hangs and connectivity is lost.

Manually killing and restarting the Barrier server on Windows 10 resolves it temporarily, but the problem comes back about 60 seconds later.

IPWright83 commented 4 years ago

I'm seeing this regularly now too with a Windows 10 -> Ubuntu 18 setup. These errors are always on the Windows side (server) while the client reports the connection failed/timed out.

github-actions[bot] commented 4 years ago

Is this issue still an issue for you? Please do comment and let us know! Alternatively, you may close the issue yourself if it is no longer a problem.

drysart commented 3 years ago

Commenting on this because the issue hit me today, and I isolated a scenario that could help track down the cause. I have a network with two clients that (attempt to) connect to the Barrier server. I've had this setup for a while, and it was originally configured with the default setting of HTTPS encryption enabled. While diagnosing a performance issue some time back, I disabled HTTPS encryption and never got around to re-enabling it.

The server runs on Windows 10 2004.20277.1 and is configured not to use HTTPS encryption.

The first client is macOS Big Sur 11.1. It is also configured not to use HTTPS encryption. This client connected to the server successfully until, within a few minutes of being started, the server got stuck in a multiplexer error loop and stopped accepting new connections.

The second client is a Windows 10 machine that had been off the network for an extended period and so was still configured to use HTTPS encryption. When this machine was re-added to the network, it didn't need to rejoin the Barrier group, so its existing configuration was left in place even though it was non-functional. As a result, it was still configured to connect to the server, but every attempt failed because of the HTTPS setting mismatch. It sat in a retry loop, attempting to reconnect several times a minute.

This misconfigured client's stream of failed connection attempts is what was causing the server to fail. As soon as this machine was taken off the network again, the server was reliable again with no other configuration changes. It would seem there is a defect in the server's socket handling such that a failed connection, and the resulting dead socket, has a reasonably high chance of not being cleaned up fully. The multiplexer then gets stuck in an infinite error loop because that poisonous socket remains in the set of sockets being polled, which causes all polling to fail until the server is shut down.

Ideally, the proper solution is to identify whatever defect allows those poisonous sockets from failed connection attempts to remain in the socket collection. Short of that, I'd imagine this could be worked around by enhancing the error handling of the pollSocket call to iterate through the individual sockets in the socket array, try to determine which one is poisonous, and dump that socket. That might be tricky to do, so an alternative would be to add circuit breaker logic to the multiplexer error itself: if its frequency goes above a certain threshold, have the server dump all of its connections.
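
To make the two workarounds concrete, here is a rough sketch of what they might look like. It is written in Python rather than Barrier's C++ and does not use any of Barrier's actual code: `find_bad_sockets`, `ErrorCircuitBreaker`, and the thresholds are hypothetical names and values chosen for illustration, and `select.select` stands in for the server's own polling call.

```python
import select
import socket
import time
from collections import deque


def find_bad_sockets(socks):
    """Poll each socket on its own with a zero timeout; return the ones that fail."""
    bad = []
    for s in socks:
        try:
            select.select([s], [], [], 0)
        except (OSError, ValueError):
            # OSError: the OS rejected the descriptor.
            # ValueError: the Python socket is already closed (fileno() == -1).
            bad.append(s)
    return bad


class ErrorCircuitBreaker:
    """Trips when more than `max_errors` errors land within `window` seconds.

    The defaults are arbitrary assumptions, not values taken from Barrier.
    """

    def __init__(self, max_errors=10, window=1.0, clock=time.monotonic):
        self.max_errors = max_errors
        self.window = window
        self.clock = clock
        self.stamps = deque()

    def record_error(self):
        """Record one poll error; return True if all connections should be dropped."""
        now = self.clock()
        self.stamps.append(now)
        # Forget errors that have fallen out of the sliding window.
        while self.stamps and now - self.stamps[0] > self.window:
            self.stamps.popleft()
        if len(self.stamps) > self.max_errors:
            self.stamps.clear()  # start counting afresh after tripping
            return True
        return False
```

The idea would be to call `find_bad_sockets` only after a poll over the whole set has failed, drop whatever it returns, and fall back to the circuit breaker (dump every connection) if errors keep arriving faster than the threshold.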

p12tic commented 3 years ago

@drysart That's a great investigation, thanks a lot. This will make fixing this issue much easier.

drysart commented 3 years ago

I'll also add that, on further investigation, I determined it's not specifically due to the HTTPS/HTTP mismatch. I updated the second client so its HTTPS setting matched the server, then tested whether simply having no entry in the server's configuration matching the client's name was enough to keep the client out of the group. It showed the same behavior: the server rejected the client because of the name mismatch, and the client went into a loop retrying the connection to the server. Eventually the server's multiplexer hung in the same error loop. The only way I found to avoid hanging the server was to not have the unwelcome client trying to connect at all.

mckernanin commented 3 years ago

I'm experiencing this as well. I also have a Synergy license, and I get the same error from Synergy (though the desktop app hard crashes lol).

Server: latest Windows 10

Client: latest macOS

I have a ticket in with Synergy support; I'll post the outcome here.