Open ghandmann opened 4 years ago
The RFC is not defining what the other side should do, when Pong-Fames are not coming back
Like you said, there isn't anything we're required to be doing in this case. The WebSocket will be closed if the TCP packet from a Ping or write isn't ACKed.
You can implement your own KeepAlive logic with app level messages like SignalR does.
We could consider adding a configurable timeout in the future.
We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to have better expectation regarding different types of issues you can read our Triage Process.
As being resolved duplicate from #26117 I add a summary here: It is easy to reproduce, pull out network cable:-) ClientWebSocket does timeout (and throws WebSocketException) after not recieving Ping or Pong. Inconsistent. RFC does not say anything about this either, so why not let the client hang forever too:-D But seriously, RFC only says Pingpong can be used as a tool to implement heartbeat\timeout for endpoints (eg. both parties): "NOTE: A Ping frame may serve either as a keepalive or as a means to verify that the remote endpoint is still responsive." You may choose to not use the tools given, and it will work poorly. You may use them and it will work good. The RFC does not care, we do:-)
Here is exception from client side:
System.Net.WebSockets.WebSocketException (0x80004005): The remote party closed the WebSocket connection without completing the close handshake.
---> System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
---> System.Net.Sockets.SocketException (10060): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
--- End of inner exception stack trace ---
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination)
at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken)
at System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32 minimumRequiredBytes, CancellationToken cancellationToken, Boolean throwOnPrematureClosure)
at System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate[TWebSocketReceiveResultGetter,TWebSocketReceiveResult](Memory`1 payloadBuffer, CancellationToken cancellationToken, TWebSocketReceiveResultGetter resultGetter)
at System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate[TWebSocketReceiveResultGetter,TWebSocketReceiveResult](Memory`1 payloadBuffer, CancellationToken cancellationToken, TWebSocketReceiveResultGetter resultGetter)
It does not seem related to Pingpong, possibly the client dies because of other circumstances. By coincidence the server does not die in the same way. I guess I come to same conclusion as original poster, that .Net WebSocket does not implement any of timeout related to Pingpong.
Another way to reproduce that I found was running the WS client in a VM, just doing stuff in a tight loop (I think I used VirtualBox, but others may work similar). Then diconnect the NIC for the VM (in the VM gui, not in Windows). Wait. Enable again (or don't). Client will crash after a while but server hangs forever.
I think that adding a WebSocketOptions.Timeout property would be a good thing. As @BrennanConroy says above, the WebSocket is only closed when the TCP Retransmission times out. This timeout seems to be different on Windows and Linux. By default, on Windows it takes about 21 seconds and on Linux it takes about 15 minutes before the WebSocket is closed.
The ClientWebSocket today only sends "Pong" to the server, and therefore it does not wait for a "Pong" response to arrive. I think that the ClientWebSocket shall send a "Ping" instead, and then verify that a "Pong" is received within the Timeout set. If not received, the connection shall be closed. Or at least it should expose the "received "Pong" message in the WebSocketReceiveResult.MessageType. It would be good if an received "Ping" message from a server is also exposed. Then the application could use these Ping/Pong message types to implement its own timeout logic.
I think that adding a WebSocketOptions.Timeout property would be a good thing. As @BrennanConroy says above, the WebSocket is only closed when the TCP Retransmission times out. This timeout seems to be different on Windows and Linux. By default, on Windows it takes about 21 seconds and on Linux it takes about 15 minutes before the WebSocket is closed.
The ClientWebSocket today only sends "Pong" to the server, and therefore it does not wait for a "Pong" response to arrive. I think that the ClientWebSocket shall send a "Ping" instead, and then verify that a "Pong" is received within the Timeout set. If not received, the connection shall be closed. Or at least it should expose the "received "Pong" message in the WebSocketReceiveResult.MessageType. It would be good if an received "Ping" message from a server is also exposed. Then the application could use these Ping/Pong message types to implement its own timeout logic.
Not sure if the ping\pong logic necessarily needs to change. The main thing is last time data was recieved, either a ping or real data. A successful response to sending data may also count as last recieved (and here, perhaps waiting for Pong to succeed may count as last recieved too, allthou not sure if this can be completely trusted). Edit: in any case, I think the current WebSocket code does make sure that either a ping or real data is sent to other end within the KeepAliveInterval, so I thinks its just code missing that keep track of lastRecieved in other end and terminating connction if lastRecieved > Timeout.
This timeout seems to be different on Windows and Linux. By default, on Windows it takes about 21 seconds and on Linux it takes about 15 minutes before the WebSocket is closed.
At least we are not alone in this then.. Having a custom setting for this would be great, 15 min is way too long as it gives false positives for us on what connections are active.
Always collecting again, never in action
It would really nice to have possibility to track the last client's pong. I don't understand why Microsoft engineers haven't still implemented it. To solve this issue peaple have to implemed some custom message types or edit MS code manually. Lol.
Describe the bug
The Websocket RFC defines in Section 5.5.2 and 5.5.3 the Ping and Pong frames. ASP.net Core 3.1 is sending a Ping-Frame from the Server to the client in the defined interval from
WebSocketOptions.KeepAliveInterval
property. So far everything works as expected.But with a "malicious" client, which is not responding with a mandatory Pong-Frame, the Server just keeps sending Ping-Frames and nothing happens.
After quickly scanning through the repo, i am afraid that there is simply nothing implemented, that checks if the client responds with a pong frame in time. To make this clear: The RFC is not defining what the other side should do, when Pong-Fames are not coming back. But it clearly states that the receiver of a Ping-Frame MUST send a Pong-Frame back. Based on that i was expecting some server-side timeout for the websocket connection (like missing three Pong-Frames terminates the connection).
This behavior currently prevents the server from detecting broken Websocket-Connections where e.G. an intermediate Router has gone away and now WebSocket Close-Frame and/or TCP FIN packet has been transmitted.
To Reproduce
This is not that easily reproducable, since you need a WebSocket-Client which you can "freeze" without sending a Close-Frame to the server. I used Mojo::UserAgent in a perl script to connect a websocket and after a while i paused the process via CTRL+Z on the command line.
I watched the network traffic via Wireshark and verified that Ping-Frames are sent and no Pong-Frames are going back anymore. The red line marks the moment when i froze the client-websocket.
I could provide a reproducer repo. But since to my knowledge there is just no handling in the server anyway, a reproducer would be "useless".
Further technical details
dotnet --info
Runtime Environment: OS Name: ubuntu OS Version: 18.04 OS Platform: Linux RID: ubuntu.18.04-x64 Base Path: /usr/share/dotnet/sdk/3.1.301/
Host (useful for support): Version: 3.1.5 Commit: 65cd789777
.NET Core SDKs installed: 2.1.807 [/usr/share/dotnet/sdk] 2.2.402 [/usr/share/dotnet/sdk] 3.1.301 [/usr/share/dotnet/sdk]
.NET Core runtimes installed: Microsoft.AspNetCore.All 2.1.19 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All] Microsoft.AspNetCore.All 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All] Microsoft.AspNetCore.App 2.1.19 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 3.1.5 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App] Microsoft.NETCore.App 2.1.19 [/usr/share/dotnet/shared/Microsoft.NETCore.App] Microsoft.NETCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.NETCore.App] Microsoft.NETCore.App 3.1.5 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
To install additional .NET Core runtimes or SDKs: https://aka.ms/dotnet-download