exasol / pyexasol

Exasol Python driver with low overhead, fast HTTP transport and compression
MIT License
72 stars 39 forks source link

🐞WebSocket connection isn't properly closed in case of process termination #108

Closed Nicoretti closed 10 months ago

Nicoretti commented 10 months ago

Summary

The WebSocket connection isn't properly closed using the disconnect command at the Exasol WebSocket protocol level when a process is terminated.

Reproducing the Issue

  1. Run the code snippet bellow:
import pyexasol

class ConnectionWrapper:

    def __getattr__(self, item):
        return getattr(self._connection, item)

    def __init__(self, **kwargs):
        self._connection = pyexasol.connect(**kwargs)

    def __del__(self):
        if self._connection:
            self._connection.close()

if __name__ == '__main__':
    dsn = "localhost:8888"
    user = 'sys'
    password = 'exasol'

    con = ConnectionWrapper(dsn=dsn, user=user, password=password)
    res = con.execute("SELECT 42;")
    print(f"{res.fetchall()}")

Expected Behaviour

[(42,)]

Actual Behaviour

[(42,)]
Exception ignored in: <function ConnectionWrapper.__del__ at 0x7fa7c9799990>
Traceback (most recent call last):
  File ".../test_case.py", line 14, in __del__
  File ".../lib/python3.10/site-packages/pyexasol/connection.py", line 456, in close
  File ".../lib/python3.10/site-packages/pyexasol/connection.py", line 524, in req
  File ".../lib/python3.10/site-packages/websocket/_core.py", line 285, in send
  File ".../lib/python3.10/site-packages/websocket/_core.py", line 313, in send_frame
  File ".../lib/python3.10/site-packages/websocket/_core.py", line 527, in _send
  File ".../lib/python3.10/site-packages/websocket/_socket.py", line 172, in send
  File ".../lib/python3.10/site-packages/websocket/_socket.py", line 149, in _send
  File "/usr/lib/python3.10/ssl.py", line 1206, in send
ssl.SSLError: Underlying socket connection gone (_ssl.c:2326)

Root Cause (optional)

## Context * The issue only occurs in case of process termination * The issue has always existed but has been suppressed, see [here](https://github.com/exasol/pyexasol/blob/9ed5b201f8f672e1dad8349317782e5b4fcebf0d/pyexasol/connection.py#L896). * The impact of the issue should be minor. * The underlying `tcp-connection` is disconnected with a `TCP [RST]`, informing the connection server about the disconnect. * It should be handled mostly the same, with the exception that the connection may be resumed. ### Hunches * The problem likely is caused by libraries used by `pyexasol` rather than `pyexasol` itself * At some point, lower-level code calls `sock.detach()`. It's possible that the underlying socket is being cleaned up by lower-level facilities that can't consider the fact that the socket is still indirectly used or closed (but not yet freed). ### System **Desktop:** - OS: `Linux (Ubuntu 22.10)` - Python Version: `3.10.7` - pyexasol `0.25.2` #### Related Issues * https://github.com/exasol/sqlalchemy-exasol/issues/390
Nicoretti commented 10 months ago

@littleK0i I was wondering if you've encountered this before. If so, could you provide additional context regarding this?

littleK0i commented 10 months ago

This error is supposed to be extremely rare. If it reproduces consistently, it means that something was changed on Exasol server side. Over these years I had multiple iterations on "closing" problem.

Initially I was just calling .close() on WebSocket client. It worked, but it produced some error messages in SQLProcess logs, and it caused sessions to linger in internal Exasol system views for some time.

So I've changed it to sending disconnect command first, followed by .close() on WebSocket client. It helped older versions of Exasol to clean up sessions properly.

Since the application can be in undefined state when calling __del__, the .close() call must be wrapped in to try ... except ... pass, just in case if socket is no longer available.

But normal calls to .close() should be just fine, if both client and server function correctly.


I guess, now when we send disconnect command from client, Exasol server terminates WebSocket connection immediately. But it should probably send CLOSE frame to client or wait to receive CLOSE frame from client, as per WebSocket specification. Socket should not be terminated prior to sending / receiving CLOSE frame.

I suggest to double-check which frames are actually being sent and received. And double-check current Exasol version vs older versions.

littleK0i commented 10 months ago

In my view, the best approach would be to remove the necessity to send explicit disconnect command and simplify things overall.

Sending CLOSE frame from client should be equal to disconnect. Client timing out on ping / pong frames should be equal to disconnect.

If server wants to terminate a process, it should send CLOSE frame to client properly. If it is not possible, and server process was terminated by SIGKILL, the client might see a random error message like this, but it should be a very rare occurrence.

Nicoretti commented 10 months ago

@littleK0i, thank you for all the insights; they have been very helpful, and I also agree with many of the points you made. Regarding the potential change in the WebSocket protocol requirement, I'm not certain if I can directly impact or influence that modification, despite acknowledging its beauty. I plan to set aside some time at the end of the month to thoroughly examine the lower layers of the WebSocket client library and see if the issue can be identified and addressed there.

I appreciate your feedback, thanks.