an-tao / trantor

a non-blocking I/O tcp network lib based on c++14/17
Other
392 stars 134 forks source link

TLS: FATAL Transport endpoint is not connected (errno=107) sockets::shutdownWrite - Socket.cc:110 #290

Closed Mis1eader-dev closed 7 months ago

Mis1eader-dev commented 1 year ago

Related to this drogon issue, after some debugging it appears to occur on TcpConnectionImpl.cc:427 and Socket.cc:110

Opening and closing connections multiple times within 20 second intervals is the cause.

Mis1eader-dev commented 1 year ago

Commenting TcpConnectionImpl.cc:416

                {
                    thisPtr->closeOnEmpty_ = true;
                    return;
                }
/*| 416 |*/     // thisPtr->tlsProviderPtr_->close();
            }
            if (thisPtr->tlsProviderPtr_ == nullptr &&

Helps eliminate this issue, but it may have side effects.

Mis1eader-dev commented 1 year ago

If we uncomment the above, narrow it down and comment OpenSSLProvider.cc:567 SSL_shutdown function call, it is the one responsible, the issue goes away.

    virtual void close() override
    {
        if (!SSL_is_init_finished(ssl_))
            return;
        // SSL_shutdown(ssl_);
        sendTLSData();
    }
marty1885 commented 1 year ago

Thanks for reporting it. I'm working on it.

marty1885 commented 1 year ago

@Mis1eader-dev May you make a debug build of and enable Trace level logs? I tried your patch to confirm it does make the symptoms go away. I still get disconnected from WSS.

Interestingly I get an cert error (WTF?). I want to confirm we are looking at the same bug.

[processHandshake] SSL handshake error: error:0A000412:SSL routines::sslv3 alert bad certificate - OpenSSLProvider.cc:706
marty1885 commented 1 year ago

Update: I cross check OpenSSL with Botan. They produce the same error. Upon further looking into, I find that FireFox generated the error MOZILLA_PKIX_ERROR_SELF_SIGNED_CERT and Chrome shows net::ERR_CERT_AUTHORITY_INVALID.

According to SO. You can't use WSS with self signed cert.

https://stackoverflow.com/questions/27725989/firefox-disconnects-websockets-connection-for-a-self-signed-certificate

Mis1eader-dev commented 1 year ago

On my side it logs

20230827 04:37:57.415974 UTC 944 TRACE [processHandshake] SSL handshake error: error:0A000416:SSL routines::sslv3 alert certificate unknown - OpenSSLProvider.cc:705

But the websocket is open and is fine. It is only when we refresh the page multiple times where the issue occurs.

This is the branch with debug and trace log enabled.

Mis1eader-dev commented 1 year ago

The steps to reproduce are as follows

  1. Run the server
  2. Open the browser and navigate to https://localhost:8443
  3. Allow the certificate by pressing Advanced and proceed
  4. Connection establishes successfully
  5. Refresh the page every 2 seconds
  6. The socket error will appear eventually
Mis1eader-dev commented 1 year ago

The complete log after few refreshes


20230827 04:56:40.027232 UTC 944 TRACE [recvData] Received 30 bytes from lower layer - OpenSSLProvider.cc:533
20230827 04:56:40.027308 UTC 944 TRACE [processApplicationData] Received 8 bytes from SSL - OpenSSLProvider.cc:739
20230827 04:56:40.027352 UTC 944 TRACE [parse] data encoded! - WebSocketConnectionImpl.cc:291

20230827 04:56:40.027437 UTC 944 FATAL Transport endpoint is not connected (errno=107) sockets::shutdownWrite - Socket.cc:110

20230827 04:56:40.027481 UTC 944 TRACE [handleError] [127.0.0.1:8443--127.0.0.1:46654] - SO_ERROR = 32 Broken pipe - TcpConnectionImpl.cc:374
20230827 04:56:40.027510 UTC 944 TRACE [handleClose] connection closed, fd=14 - TcpConnectionImpl.cc:349
20230827 04:56:40.027546 UTC 944 TRACE [onConnection] conn disconnected! - HttpServer.cc:129
20230827 04:56:40.027571 UTC 944 TRACE [onConnection] connect!!!100000 num=2 - HttpAppFrameworkImpl.cc:697
20230827 04:56:40.027606 UTC 944 TRACE [handleClose] to call close callback - TcpConnectionImpl.cc:359
20230827 04:56:40.027613 UTC 944 TRACE [connectionClosed] connectionClosed - TcpServer.cc:201
20230827 04:56:40.027659 UTC 944 TRACE [~Socket] Socket deconstructed:14 - Socket.cc:245
20230827 04:56:40.079295 UTC 944 TRACE [newConnection] new connection:fd=14 address=127.0.0.1:38778 - TcpServer.cc:62
20230827 04:56:40.079362 UTC 944 TRACE [TcpConnectionImpl] new connection:127.0.0.1:38778->127.0.0.1:8443 - TcpConnectionImpl.cc:60
20230827 04:56:40.079453 UTC 944 TRACE [operator()] connectEstablished - TcpConnectionImpl.cc:335
20230827 04:56:40.079511 UTC 944 TRACE [processHandshake] SSL handshake wants to read - OpenSSLProvider.cc:691
20230827 04:56:40.079554 UTC 944 TRACE [recvData] Received 525 bytes from lower layer - OpenSSLProvider.cc:533
20230827 04:56:40.080005 UTC 944 TRACE [processHandshake] SSL handshake wants to read - OpenSSLProvider.cc:691
20230827 04:56:40.081062 UTC 944 TRACE [recvData] Received 30 bytes from lower layer - OpenSSLProvider.cc:533
20230827 04:56:40.081158 UTC 944 TRACE [processHandshake] SSL handshake error: error:0A000416:SSL routines::sslv3 alert certificate unknown - OpenSSLProvider.cc:705
20230827 04:56:40.081221 UTC 944 TRACE [handleClose] connection closed, fd=14 - TcpConnectionImpl.cc:349
20230827 04:56:40.081273 UTC 944 TRACE [onConnection] conn disconnected! - HttpServer.cc:129
20230827 04:56:40.081304 UTC 944 TRACE [onConnection] connect!!!100000 num=1 - HttpAppFrameworkImpl.cc:697
20230827 04:56:40.081313 UTC 944 TRACE [handleClose] to call close callback - TcpConnectionImpl.cc:359
20230827 04:56:40.081319 UTC 944 TRACE [connectionClosed] connectionClosed - TcpServer.cc:201
20230827 04:56:40.081522 UTC 944 TRACE [~Socket] Socket deconstructed:14 - Socket.cc:245
20230827 04:56:40.094918 UTC 944 TRACE [newConnection] new connection:fd=14 address=127.0.0.1:38794 - TcpServer.cc:62
20230827 04:56:40.095043 UTC 944 TRACE [TcpConnectionImpl] new connection:127.0.0.1:38794->127.0.0.1:8443 - TcpConnectionImpl.cc:60
20230827 04:56:40.095331 UTC 944 TRACE [operator()] connectEstablished - TcpConnectionImpl.cc:335
20230827 04:56:40.095600 UTC 944 TRACE [processHandshake] SSL handshake wants to read - OpenSSLProvider.cc:691
20230827 04:56:40.095804 UTC 944 TRACE [recvData] Received 525 bytes from lower layer - OpenSSLProvider.cc:533
20230827 04:56:40.096980 UTC 944 TRACE [processHandshake] SSL handshake wants to read - OpenSSLProvider.cc:691
20230827 04:56:40.097904 UTC 944 TRACE [recvData] Received 64 bytes from lower layer - OpenSSLProvider.cc:533
20230827 04:56:40.098111 UTC 944 TRACE [processHandshake] SSL handshake finished - OpenSSLProvider.cc:601
20230827 04:56:40.098199 UTC 944 TRACE [onConnection] connect!!!100000 num=1 - HttpAppFrameworkImpl.cc:697
20230827 04:56:40.098256 UTC 944 TRACE [recvData] Received 517 bytes from lower layer - OpenSSLProvider.cc:533
20230827 04:56:40.098296 UTC 944 TRACE [processApplicationData] Received 495 bytes from SSL - OpenSSLProvider.cc:739
20230827 04:56:40.098370 UTC 944 TRACE [isWebSocket] new websocket request - HttpServer.cc:721
20230827 04:56:40.098471 UTC 944 TRACE [renderToBuffer] reponse(no body):HTTP/1.1 101 Switching Protocols
content-length: 0
content-type: text/html; charset=utf-8
server: drogon/1.8.6
sec-websocket-accept: eoK2ucCgkY4FK8KRSIaB9z7fNKo=
connection: Upgrade
upgrade: websocket
date: Sun, 27 Aug 2023 04:56:40 GMT

 - HttpResponseImpl.cc:708
Mis1eader-dev commented 1 year ago

Update, I tested the server on a Raspberry PI, connected to it with a computer and a phone, and I didn't encounter the error, but frequent disconnects were happening throughout both devices and multiple connections, it is as if ping pongs aren't taking place.

Edit: There are 3 tests to conclude this issue:

  1. Put the self-signed certificate in OS-specific certs whitelist and see if this issue persists.
  2. See other web frameworks, if they have the same issue then it may be universal.
  3. Comment SSL_shutdown to see if it solves Raspberry PI server disconnections, as the external server didn't have the shutdown error log symptoms. (FAILED)
Mis1eader-dev commented 7 months ago

I believe it is not worth the trouble to track this down, WebSockets work fine on regular servers with TLS.