TakahikoKawasaki / nv-websocket-client

High-quality WebSocket client implementation in Java.
Apache License 2.0

Errors when the server closes the connection (connection reset by peer) #192

Open joaopgrassi opened 5 years ago

joaopgrassi commented 5 years ago

Hi,

I'm opening this issue here because I feel I've exhausted all my options. Maybe someone can help, or at least point me toward identifying the underlying problem.

We have mobile apps (Android and iOS) that do WebSocket operations with our server app. Our server is an ASP.NET WebAPI hosted on Azure Cloud Services. Everything works as expected and we are using the latest version (2.9) of this library.

As part of a recent effort, we are moving our server app to Azure App Service. In case you are not familiar with Azure, App Service is its main hosting platform for web apps these days. It's more modern than Cloud Services (which is basically a wrapper OS around a VM).

Anyway, the server code is still the same; we just moved it to a different hosting environment (still IIS). We set the minimum TLS version to 1.2, as we had before, configured the SSL bindings and so on. But during our tests, we realized that the WebSocket features had stopped working.

As an initial debugging step, this is what we get in our Android app logs:

10-01 16:06:57.409 I/AndroidWebSocketFactory(23896): Web socket error.
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896): com.neovisionaries.ws.client.WebSocketException: Flushing frames to the server failed: Write error: ssl=0x70344abc80: I/O error during system call, Connection reset by peer
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896):    at com.neovisionaries.ws.client.WritingThread.doFlush(WritingThread.java:436)
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896):    at com.neovisionaries.ws.client.WritingThread.sendFrames(WritingThread.java:386)
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896):    at com.neovisionaries.ws.client.WritingThread.main(WritingThread.java:110)
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896):    at com.neovisionaries.ws.client.WritingThread.run(WritingThread.java:57)
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896): Caused by: javax.net.ssl.SSLException: Write error: ssl=0x70344abc80: I/O error during system call, Connection reset by peer
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896):    at com.android.org.conscrypt.NativeCrypto.SSL_write(Native Method)
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896):    at com.android.org.conscrypt.SslWrapper.write(SslWrapper.java:390)
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896):    at com.android.org.conscrypt.ConscryptFileDescriptorSocket$SSLOutputStream.write(ConscryptFileDescriptorSocket.java:618)
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896):    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896):    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896):    at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896):    at com.neovisionaries.ws.client.WritingThread.flush(WritingThread.java:273)
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896):    at com.neovisionaries.ws.client.WritingThread.doFlush(WritingThread.java:424)
10-01 16:06:57.409 I/AndroidWebSocketFactory(23896):    ... 3 more
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896): Web socket error.
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896): com.neovisionaries.ws.client.WebSocketException: Flushing frames to the server failed: Write error: ssl=0x70344abc80: I/O error during system call, Broken pipe
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896):    at com.neovisionaries.ws.client.WritingThread.doFlush(WritingThread.java:436)
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896):    at com.neovisionaries.ws.client.WritingThread.sendFrames(WritingThread.java:386)
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896):    at com.neovisionaries.ws.client.WritingThread.main(WritingThread.java:122)
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896):    at com.neovisionaries.ws.client.WritingThread.run(WritingThread.java:57)
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896): Caused by: javax.net.ssl.SSLException: Write error: ssl=0x70344abc80: I/O error during system call, Broken pipe
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896):    at com.android.org.conscrypt.NativeCrypto.SSL_write(Native Method)
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896):    at com.android.org.conscrypt.SslWrapper.write(SslWrapper.java:390)
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896):    at com.android.org.conscrypt.ConscryptFileDescriptorSocket$SSLOutputStream.write(ConscryptFileDescriptorSocket.java:618)
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896):    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896):    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896):    at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896):    at com.neovisionaries.ws.client.WritingThread.flush(WritingThread.java:273)
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896):    at com.neovisionaries.ws.client.WritingThread.doFlush(WritingThread.java:424)
10-01 16:06:57.416 I/AndroidWebSocketFactory(23896):    ... 3 more

Our app is quite complex, so the next thing I did was isolate the issue by creating a very simple Android app that connects to a very simple server via WebSocket. The scenario is: first the client connects and sends a message; then the server replies and closes the connection. With this setup, I was able to reproduce the problem. The logs show this:

I/mylogger: New state CONNECTING
I/mylogger: New state OPEN
I/mylogger: [SERVER] Says: Polo
I/mylogger: New state CLOSING
I/mylogger: Received close frame.
    �
E/mylogger: An exception happened during the websocket operation
E/mylogger: On Frame error javax.net.ssl.SSLException: Write error: ssl=0x7039b79000: I/O error during system call, Connection reset by peer
E/mylogger: An exception happened during the websocket operation
E/mylogger: On Frame error javax.net.ssl.SSLException: Write error: ssl=0x7039b79000: I/O error during system call, Broken pipe
I/mylogger: New state CLOSED
I/mylogger: On Disconnect called.

As soon as I point my Android app to my local machine, it works just fine:

I/mylogger: New state CONNECTING
I/mylogger: New state OPEN
I/mylogger: [SERVER] Says: Polo
I/mylogger: New state CLOSING
    Received close frame.
I/mylogger: �
I/mylogger: New state CLOSED
I/mylogger: On Disconnect called.

Next, we tried using Fiddler to inspect the network traffic. We configured the proxy on the Android phone to point to my local machine on the Fiddler proxy port, as demonstrated here. Then we changed the Android code to use the proxy. With the proxy in place, the app worked fine.

I also tried two other clients, one in JavaScript (as here: Creating your own test) and one in C#. Both worked just fine.

I also recorded the network traffic on the server (.cap file) and opened it with Wireshark, but I don't have much experience with it, so I couldn't get much out of it. I can share the file in case anyone wants to help.

The error mentions an I/O error during a system call, so I assume it is some low-level TCP/TLS issue, but I can't figure out what. 😕 It's even weirder because it works outside of Azure App Service.

Edit: The sample app is available at https://github.com/joaopgrassi/android-websocket-issue and points to the server running on Azure.

burzek commented 4 years ago

Hello, I have the same issue here. Have you found out what the problem was? Thanks

joaopgrassi commented 4 years ago

Not really, no. In our particular case, the issue appeared after we switched our back end from Azure Cloud Services to App Services. After several discussions, Azure support couldn't pinpoint the exact issue either; they said it was probably due to firewall rules, but they couldn't trace it to an actual cause. We did some hacks in the client code to deal with the disconnects. =/
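
For illustration only: one way a client could tolerate these errors is to treat a "Connection reset by peer" or "Broken pipe" write failure as harmless once the server's close frame has already been received, which is the order of events in the logs above. The sketch below is hypothetical and is not the workaround referred to in this thread. It uses only the JDK, and `CloseErrorFilter`, `isPeerClosedError`, and `isBenignCloseError` are made-up names that are not part of nv-websocket-client.

```java
import java.io.IOException;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of nv-websocket-client. */
final class CloseErrorFilter {

    // Message fragments copied from the stack traces in this issue.
    private static final String[] PEER_CLOSED_MARKERS = {
        "connection reset by peer",
        "broken pipe"
    };

    // Guard against pathological (cyclic) cause chains.
    private static final int MAX_CAUSE_DEPTH = 16;

    private CloseErrorFilter() {}

    /** Returns true if any throwable in the cause chain mentions a peer-closed marker. */
    static boolean isPeerClosedError(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < MAX_CAUSE_DEPTH; t = t.getCause(), depth++) {
            String message = t.getMessage();
            if (message == null) {
                continue;
            }
            String lower = message.toLowerCase(Locale.ROOT);
            for (String marker : PEER_CLOSED_MARKERS) {
                if (lower.contains(marker)) {
                    return true;
                }
            }
        }
        return false;
    }

    /**
     * Assumption: a failed write only counts as harmless if the server's
     * close frame was seen first. Before that, it is a real error.
     */
    static boolean isBenignCloseError(Throwable error, boolean closeFrameReceived) {
        return closeFrameReceived && isPeerClosedError(error);
    }

    public static void main(String[] args) {
        // Mimics the shape of the logged exception: a wrapper with an I/O cause.
        Exception cause = new IOException("Write error: I/O error during system call, Broken pipe");
        Exception wrapped = new RuntimeException("Flushing frames to the server failed", cause);

        System.out.println(isBenignCloseError(wrapped, true));
        System.out.println(isBenignCloseError(wrapped, false));
    }
}
```

In an app, `closeFrameReceived` would be a flag set at the point where "Received close frame." is logged, and the check would run inside whatever error callback currently logs "On Frame error". Matching on message text is fragile, since the wording comes from the platform's TLS stack and may differ between Android versions, so this only hides the symptom and does not explain why the reset happens on Azure App Service.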