Closed jeteon closed 7 years ago
Hi @jeteon - we are excited to hear that you are running CrateDB in production! However, we are less excited about the issue(s) you are having. Could you provide a short code sample where the driver blocks when a node is unresponsive?
Also, could you find out why the node was unresponsive? Apart from the driver issue, there may be other things that could be done to avoid that problem in the future :)
Cheers, Claus
Hi @celaus. The server itself remains responsive but the application server will hang on that particular request. It times out eventually, but by then our front-end server has timed out the request. There is a mitigation currently in place (basically, I test the connectivity separately) but it's not ideal. The code below is a single-file example that demonstrates the issue:
<?php
require 'vendor/autoload.php';
use Crate\PDO\PDO as PDO;
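// 192.0.2.0 is in the TEST-NET-1 documentation range, so it stands in for an unreachable first server.
$pdo = new PDO('crate:192.0.2.0:4200,127.0.0.1:4200', null, null, null);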
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$stmt = $pdo->prepare('SELECT * FROM a_schema.a_table');
$stmt->execute();
echo "Query executed.";
The code assumes you have a Crate instance running on localhost, port 4200. Put it in a file test.php in a directory, then run composer require crate/crate-pdo:~0.3.1 in that directory. Executing php test.php should demonstrate the issue.
On my system (running PHP 7) the connection hangs and then the script execution will end with a fatal error after over a minute:
PHP Fatal error: Uncaught GuzzleHttp\Ring\Exception\ConnectException: cURL error 7: Failed to connect to 192.0.2.0 port 4200: Connection timed out in /tmp/test/vendor/guzzlehttp/ringphp/src/Client/CurlFactory.php:126
My expectation would be that this would time out (in about 5 seconds going by the source) and then proceed to run the query on the next server in line.
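Until the driver behaves that way, the expected behaviour can be approximated in application code. The sketch below is not part of crate-pdo: the helper name reachableServers and the one-second timeout are hypothetical choices for illustration. It probes each host:port with a short TCP connect timeout and builds the DSN only from the servers that answered.

```php
<?php
/**
 * Hypothetical helper (not part of crate-pdo): keep only the servers that
 * accept a TCP connection within $timeout seconds.
 *
 * @param string[] $servers List of "host:port" strings.
 * @param float    $timeout Connect timeout per server, in seconds (assumed value).
 * @return string[] The reachable subset, in the original order.
 */
function reachableServers(array $servers, float $timeout = 1.0): array
{
    $reachable = [];
    foreach ($servers as $server) {
        // Split on the last colon to separate host and port.
        $pos = strrpos($server, ':');
        if ($pos === false) {
            continue; // Entry has no port, skip it.
        }
        $host = substr($server, 0, $pos);
        $port = (int) substr($server, $pos + 1);

        // "@" silences the warning fsockopen raises when the connection fails.
        $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
        if ($socket !== false) {
            fclose($socket);
            $reachable[] = $server;
        }
    }
    return $reachable;
}

// Usage: probe the same server list as in the example above.
$servers = reachableServers(['192.0.2.0:4200', '127.0.0.1:4200']);
// The caller still has to handle the case where no server was reachable.
$dsn = 'crate:' . implode(',', $servers);
```

The probe costs one extra TCP handshake per server on every request, so it is a stopgap rather than a fix; a server can also go away between the probe and the actual query.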
I discovered that if I leave out the version in the composer line then I get version 0.6.0 instead and the test runs as expected. I'm going to try to move the code base to that version. Is there any reason the documentation recommends version 0.3.1?
Confirmed things work properly on the current release 0.5.1 as well. Sorry about the hassle. Seems like a documentation thing more than a code thing.
If one of the first servers specified in the list of servers to try is on an unreachable IP address, then the PDO driver hangs on this first connection attempt and doesn't proceed to try the other servers as would be expected. To be fair, the connection does eventually time out after something like a minute but upstream timeouts have long given up on the request by that point.
This might seem far-fetched, but it happened to me recently on a production deployment where an interface on the server (needed to reach the IP space of the Crate server) failed. This hung the PHP application rather than moving on to other accessible Crate servers in the list, as I would have expected. Part of the reason for using Crate was this fail-over potential, so this was a big deal to us.
I think a connection timeout may be set to a very high number somewhere in the code base without being configurable via the API, but I'm not sure where. I noticed you aren't setting the connect_timeout key anywhere in the code base, and this defaults to "forever". However, changing this didn't seem to help in my case.
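For reference, this is roughly what a bounded connection attempt with fail-over looks like at the plain cURL level. It is only a sketch under assumptions: it uses the PHP curl extension directly rather than the driver or Guzzle, the helper name firstRespondingServer and the timeout values are hypothetical, and it does not explain why changing connect_timeout had no effect inside the driver. It needs PHP 7.1 or later for the nullable return type.

```php
<?php
/**
 * Hypothetical helper for illustration: return the first server that sends
 * any HTTP response within the given limits, or null if none does.
 *
 * @param string[] $servers        List of "host:port" strings.
 * @param int      $connectTimeout Seconds allowed for the connection phase (assumed value).
 * @param int      $totalTimeout   Seconds allowed for the whole request (assumed value).
 */
function firstRespondingServer(array $servers, int $connectTimeout = 2, int $totalTimeout = 5): ?string
{
    foreach ($servers as $server) {
        $handle = curl_init('http://' . $server . '/');
        curl_setopt_array($handle, [
            CURLOPT_RETURNTRANSFER => true,            // Return the body instead of printing it.
            CURLOPT_CONNECTTIMEOUT => $connectTimeout, // Bound the connection attempt.
            CURLOPT_TIMEOUT        => $totalTimeout,   // Bound the request as a whole.
        ]);
        // curl_exec returns false on connection failure or timeout.
        if (curl_exec($handle) !== false) {
            return $server;
        }
        // Otherwise fall through and try the next server in the list.
    }
    return null;
}
```

With the server list from the example, firstRespondingServer(['192.0.2.0:4200', '127.0.0.1:4200']) should give up on the unreachable address after about two seconds and move on to localhost, instead of waiting for the operating system's own connect timeout.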