crate / crate-pdo

CrateDB PHP PDO adapter
https://crate.io/docs/pdo/
Apache License 2.0
33 stars 19 forks source link

High number of database connections #138

Open amotl opened 1 year ago

amotl commented 1 year ago

Hi there,

users of this library reported about a high number of database connections to CrateDB. They are using it in a high-volume data ingest environment.

There seems to be a ATTR_PERSISTENT attribute [^1], which, under normal circumstances, would keep a connection open. But in the CrateDB PHP PDO adapter, it dumps that option if it finds it configured.

What we want ideally is connection pooling, that is, for connections to be reused to save all the overhead of creating new connections for each and every database conversation.

With kind regards, Andreas.

[^1]: PHP PDO » Connections and Connection management

/cc @hlcianfagna, @hammerhead

amotl commented 1 year ago

Hi again,

while the best solution for limiting database connections is by using PgBouncer or SQL Relay, this would be a concern separately from this driver, as this will mean to use CrateDB's PostgreSQL interface instead. It is a viable option, and we should explore that route.

With respect to the current implementation of the CrateDB PHP PDO driver, which uses CrateDB's HTTP interface through the Guzzle HTTP client library, we should at least provide a way to limit concurrency by following the corresponding Guzzle » Concurrent requests guideline, i.e. using GuzzleHttp\Pool instead of GuzzleHttp\Client.

With kind regards, Andreas.

amotl commented 1 year ago

I was wrong on my assumption GuzzleHttp\Pool would be some kind of solution here. Despite it's called Pool, it is apparently about batching multiple requests, where the number of requests is known in advance.

Obviously, this can't be applied in situations when an unknown amount of requests from any clients has to be processed.

amotl commented 1 year ago

I am now following up on the HTTP connection keep-alive / connection re-use aspects, with respect to the current HTTP client implementation based on Guzzle and the PHP cURL binding, if I am correct.

-- https://stackoverflow.com/questions/972925/persistent-keepalive-http-with-the-php-curl-library -- https://curl.se/docs/faq.html#Can_I_perform_multiple_requests

I've identified the following settings from ^1 might be interesting candidates to explore. However, it's a different thing if they can actually be used through Guzzle.

amotl commented 1 year ago

Quickly searching my vendor/guzzlehttp directory, it looks like none of the different CURLOPT settings listed above is used by Guzzle's CurlFactory at all.

amotl commented 1 year ago

GH-53 adjusted the default timeout to 0, which apparently means "indefinitely", mostly. After reading the discussion(s) referenced below, I wonder if it was a good decision? It's not completely clear to me yet.