crate / crate-python

Python DB API client library for CrateDB, using HTTP.
https://cratedb.com/docs/python/
Apache License 2.0

Connection Timeout problems after 50 seconds #628

amotl closed this issue 11 months ago

amotl commented 1 year ago

Problem Report

We are using SQLAlchemy 2.0.20 and the CrateDB dialect 0.33.0, and are running into timeout problems. Using connect_args={"timeout": 3} does not help. For automated testing purposes, if a setup can't connect, it should time out as fast as possible.

Observations

The database client tries to connect for 50 seconds and then fails with multiple errors.

[...]
.venv\lib\site-packages\urllib3\util\connection.py:85: ConnectionRefusedError

During handling of the above exception, another exception occurred:
[...]
        except SocketError as e:
>           raise NewConnectionError(
                self, "Failed to establish a new connection: %s" % e
            )
E           urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x00000246CC1E0B20>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it

.venv\lib\site-packages\urllib3\connection.py:186: NewConnectionError

During handling of the above exception, another exception occurred:
[...]
>           raise ConnectionError(
                ("No more Servers available, "
                 "exception from last server: %s") % message)
E           sqlalchemy.exc.OperationalError: (crate.client.exceptions.ConnectionError) No more Servers available, exception from last server: HTTPConnectionPool(host='crate', port=4200): Max retries exceeded with url: /_sql?types=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001F4B50F0AF0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
E           [SQL: SELECT name FROM sys.nodes]
E           (Background on this error at: https://sqlalche.me/e/20/e3q8)

.venv\lib\site-packages\crate\client\http.py:601: OperationalError
amotl commented 1 year ago

Analysis

The observed error does not indicate a TCP timeout situation per se. Instead, it clearly reports a "connection refused" error. See crate/crate-python#631 for a more detailed description of it.
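To illustrate the difference at the socket level, here is a minimal sketch using only Python's standard library, not the CrateDB driver. The helper name `probe` is hypothetical. A refused connection fails with `ConnectionRefusedError` as soon as the peer answers with a TCP RST, whereas a timeout only occurs when no answer arrives at all within the given number of seconds.

```python
import socket


def probe(host: str, port: int, timeout: float) -> str:
    """Classify one TCP connection attempt (hypothetical helper, stdlib only)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # The peer replied with TCP RST: the host is up, but nothing listens on that port.
        return "refused"
    except socket.timeout:
        # No reply at all within `timeout` seconds, e.g. packets silently dropped.
        return "timeout"
```

For example, `probe("127.0.0.1", 4200, timeout=3)` against a machine where CrateDB is not running on its default HTTP port 4200 would be expected to return "refused" quickly rather than waiting for the timeout. Other errors, such as an unreachable network, are deliberately left to propagate.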

Using timeout settings

The "timeout" argument is the right way to explicitly configure the connection timeout, in seconds.

DBAPI

With the DBAPI, pass it directly as a keyword argument to connect().

time python -c 'from crate.client import connect; connect(["192.168.178.100:1234"], timeout=5)'

SQLAlchemy

With SQLAlchemy, pass it inside the connect_args dictionary.

time python -c 'import sqlalchemy as sa; sa.create_engine("crate://192.168.178.101:1234", connect_args={"timeout": 5}).connect()'

In both cases, the connection attempt will be terminated within about five seconds.

real    0m5.169s

We confirmed that this works on our machine, running macOS Catalina and connected to a stock Fritz!Box Wi-Fi access point.
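The `time` measurement above can also be reproduced inside a test suite. The following is a minimal sketch with a hypothetical helper `timed_attempt`, which runs any connection callable once and reports how long it took and which exception, if any, it raised. It does not depend on the CrateDB driver itself; the callable is where the DBAPI or SQLAlchemy call would go.

```python
import time
from typing import Callable, Optional, Tuple


def timed_attempt(attempt: Callable[[], object]) -> Tuple[float, Optional[BaseException]]:
    """Run `attempt` once and return (elapsed seconds, raised exception or None).

    Hypothetical helper: it only measures; it neither retries nor enforces a limit.
    """
    start = time.monotonic()
    try:
        attempt()
    except Exception as exc:  # capture the error so the caller can assert on it
        return time.monotonic() - start, exc
    return time.monotonic() - start, None
```

In a real test, `attempt` could wrap the SQLAlchemy one-liner from above, e.g. `lambda: sa.create_engine("crate://192.168.178.101:1234", connect_args={"timeout": 5}).connect()`. In the setup described above, that would be expected to take roughly five seconds and return the raised error. Using `time.monotonic()` keeps the measurement unaffected by wall-clock adjustments.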

amotl commented 1 year ago

Further Analysis and Guessing

For automated testing purposes [...] The database client tries to connect for 50 seconds and then fails.

I believe what may be happening here is that a CrateDB instance is about to be provisioned on a virtual machine for integration testing purposes, while the client is already attempting to connect to it. The VM manager or networking subsystem may be holding back the network packet(s) until the guest machine is up, and then dispatch them to the guest's IP stack.

When the guest machine's IP stack comes up and receives them, CrateDB has not been started yet, so the peer responds with a TCP RST packet (WinError 10061), which terminates the connection attempt, as outlined in crate/crate-python#631.
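If this hypothesis holds, one common way for a test harness to avoid racing the database startup is to poll the port until it accepts TCP connections, within an overall deadline, before issuing any queries. This is a minimal stdlib-only sketch; `wait_for_port` and its parameters are hypothetical and not part of crate-python. Note that a successful TCP connect only shows that something listens on the port, not that CrateDB is ready to serve queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, deadline: float = 30.0, interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or `deadline` seconds pass.

    Hypothetical helper. Returns True once something accepts connections, else False.
    """
    end = time.monotonic() + deadline
    while True:
        remaining = end - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Bound each attempt by the time left, so the overall deadline holds.
            with socket.create_connection((host, port), timeout=min(interval, remaining)):
                return True
        except OSError:
            # Refused (RST), timed out, or unreachable: the service is not up yet.
            time.sleep(min(interval, max(end - time.monotonic(), 0)))
```

Catching `OSError` covers refused connections, per-attempt timeouts, and unreachable hosts alike, which is the intent here: any of them just means "not ready yet".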

Outlook

Once we know more about the environment where this problem occurs, and whether the scenario matches this hypothesis, we can discuss possible solutions.

amotl commented 1 year ago

Recommendations about building test harnesses for/with CrateDB and Python

Hi again,

For automated testing purposes [...]

for Python programs, we currently recommend using the Testcontainers for Python implementation for CrateDB, together with pytest.

It has not been contributed to the upstream project yet, but it is easy to vendorize. https://github.com/crate/cratedb-examples/issues/72 has corresponding pointers and guidelines, demonstrating its usage in actual projects.

With kind regards, Andreas.

amotl commented 11 months ago

Hi again,

this report, together with crate/crate-python#631, came from a scenario where a user was building a test case that deliberately connects to an invalid destination, in order to validate how the Python driver behaves in that situation.

The improvement from crate/crate-python#571 resolved the problem well:

I've implemented the timeout with the new version and it seems to work on both Linux and Windows. With a timeout of 0.1s, connecting to an invalid host takes about 2.5s to effectively time out, which is fine. When using an invalid port, the connection aborts immediately. Thanks for your effort!
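A self-contained test in the spirit of the scenario described above, checking that an attempt against an invalid port aborts quickly, could look roughly like this sketch. It uses a raw socket as a stand-in for the driver so that it runs without a CrateDB installation; the function names are hypothetical, and in a real suite the connection call would be the DBAPI or SQLAlchemy call with the `timeout` option.

```python
import socket
import time


def unused_local_port() -> int:
    """Ask the OS for a free port, then release it so nothing listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def test_invalid_port_fails_fast(timeout: float = 3.0) -> None:
    """Connecting to a port without a listener should fail well within `timeout`."""
    port = unused_local_port()
    start = time.monotonic()
    try:
        socket.create_connection(("127.0.0.1", port), timeout=timeout).close()
    except ConnectionRefusedError:
        pass  # expected: the peer answers with TCP RST
    else:
        raise AssertionError("connection unexpectedly succeeded")
    # A refused connection should not consume the whole timeout budget.
    assert time.monotonic() - start < timeout
```

pytest would collect `test_invalid_port_fails_fast` as a plain test function. Testing against a local unused port keeps the result deterministic, whereas an invalid remote host depends on how the network drops or rejects the packets.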

With kind regards, Andreas.