cloudamqp / amqproxy

An intelligent AMQP proxy, with connection and channel pooling/reusing
https://www.cloudamqp.com
MIT License
350 stars 54 forks source link

hung connection #135

Closed fernandomacho closed 1 year ago

fernandomacho commented 1 year ago

Hi I try to update to last version 0.8.13 from 0.8.12

After install some times receive this error

Oct 15 00:39:59 server-name amqproxy[3491300]: Upstream closed connection: CHANNEL_ERROR - expected 'channel.open' Oct 15 00:39:59 server-name amqproxy[3491300]: Upstream connection closed when returned Oct 15 00:46:01 server-name amqproxy[3491300]: Upstream closed connection: CHANNEL_ERROR - expected 'channel.open' Oct 15 00:46:01 server-name amqproxy[3491300]: Upstream connection closed when returned Oct 15 01:17:24 server-name amqproxy[3491300]: Unhandled exception in spawn(name: upstream read loop localhost:5672): AMQ::Protocol::Frame::Body-end was 123, expected 206 (AMQ::Protocol::Error::InvalidFrameEnd) Oct 15 01:17:25 server-name amqproxy[3491300]: from tmp/amqproxy/lib/amq-protocol/src/amq/protocol/frames.cr:43:13 in 'read_loop' Oct 15 01:17:25 server-name amqproxy[3491300]: from tmp/amqproxy/src/amqproxy/upstream.cr:34:7 in '->' Oct 15 01:17:25 server-name amqproxy[3491300]: from usr/share/crystal/src/fiber.cr:146:11 in 'run' Oct 15 01:17:25 server-name amqproxy[3491300]: from usr/share/crystal/src/fiber.cr:98:34 in '->' Oct 15 01:17:25 server-name amqproxy[3491300]: from ???

and in other times get

Oct 15 02:53:00 server-name amqproxy[3491300]: Receiving AMQ::Protocol::Frame::Basic::Deliver(@channel=1, @bytesize=37, @consumer_tag="consumer", @delivery_tag=882, @redelivered=false, @exchange="logger", @routing_key="unir-ia") but no client to delivery to Oct 15 02:53:00 server-name amqproxy[3491300]: Receiving AMQ::Protocol::Frame::Header(@channel=1, @bytesize=15, @class_id=60, @weight=0, @body_size=483, @properties=AMQ::Protocol::Properties(@timestamp_raw=nil, @content_type=nil, @content_encoding=nil, @headers=nil, @delivery_mode=2, @priority=nil, @correlation_id=nil, @reply_to=nil, @expiration=nil, @message_id=nil, @type=nil, @user_id=nil, @app_id=nil, @reserved1=nil)) but no client to delivery to Oct 15 02:53:00 server-name amqproxy[3491300]: Receiving AMQ::Protocol::Frame::Body(@channel=1, @bytesize=483, @body_size=483, @body=#<TCPSocket:fd 47>) but no client to delivery to Oct 15 03:15:33 server-name amqproxy[3491300]: Upstream closed connection: CHANNEL_ERROR - expected 'channel.open'

But most of the time the connection simply was hangs.

I am using php-amqplib version v3.5.4. Once the connection is opened, it remains "hanging" on any operation performed on the channel, for example exchange_declare.

I try debug amqproxy running:

./amqproxy --debug -l 135.125.87.151 -p 5674 amqp://localhost:5672 but not get any error only connect and disconnect (when it works)

The rabbit version I am using is 3.12.6

I have reverted to version 0.8.12 which works correctly.

alupher commented 1 year ago

We're seeing this issue too, it makes amqproxy unusable for us. We also had to downgrade to v0.8.12 to fix it.

dentarg commented 1 year ago

Thanks for reporting, we will take a look.

dentarg commented 1 year ago

@alupher @fernandomacho Feel free to share any code snippets that triggers the problem, as that would help making quicker progress on this

dentarg commented 1 year ago

Might be that we're missing a call to client_disconnected

https://github.com/cloudamqp/amqproxy/blob/ace87f82156d40a9ccfdd143a3f712606970f8e0/src/amqproxy/upstream.cr#L144-L156

in the ensure block

https://github.com/cloudamqp/amqproxy/blob/ace87f82156d40a9ccfdd143a3f712606970f8e0/src/amqproxy/upstream.cr#L86-L89

(or should we only disconnect clients when IO::Error / OpenSSL::SSL::Error happens?)

fernandomacho commented 1 year ago

Hi I not use SSL

For example this code:

$connection = new AMQPStreamConnection( $host, $port, $username, $password, $vhost, false, /*insists*/ 'AMQPLAIN', /*login method*/ null, /*login response*/ null, /*locale*/ 602.0, /*connection timeout*/ 602.0, /*read write timeout */ null, /* context */ true, /* keepalive */ 60, /* hearbeat */ 602.0, /*channel_rpc_timeout*/ null /* SSL */ ); $channel = $connection->channel(); $channel->basic_qos(null, $prefetch, null);

This problem actually appears when any operation is performed on the channel

In some cases it works correctly, but in most cases the connection hangs and you have to kill the process that opens the connection

dentarg commented 1 year ago

should we only disconnect clients when IO::Error / OpenSSL::SSL::Error happens?

Maybe? with v0.8.13 (https://github.com/cloudamqp/amqproxy/pull/128) we started to close the client socket also when this happened

https://github.com/cloudamqp/amqproxy/blob/ace87f82156d40a9ccfdd143a3f712606970f8e0/src/amqproxy/upstream.cr#L73-L80

dentarg commented 1 year ago

@fernandomacho (or anyone else) do you have more complete logs to share? You can share via https://www.cloudamqp.com/support.html if you don't want to share them here (mention this issue)

fernandomacho commented 1 year ago

Hi, no more log, I try run amqproxy with -d param and not log anything If you generate debug version I would try reproduce this error, with debug version

TristanPouliquen commented 1 year ago

Hi, following up on this issue as we also encountered issues where the update to 0.8.13 put our production environment down.

The symptoms were:

Sadly we do not have detailed logs around that as we had to act quickly to restore services.

dentarg commented 1 year ago

Okay, I'm able to reproduce with v0.8.13

include(__DIR__ . '/config.php');

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Exchange\AMQPExchangeType;
use PhpAmqpLib\Message\AMQPMessage;

$exchange = 'router';
$queue = 'msgs';
$i = 0;

while ($i <= ATTEMPTS) {
    $connection = new AMQPStreamConnection(HOST, PORT, USER, PASS, VHOST);
    echo "Connection created" . PHP_EOL;
    $channel = $connection->channel();
    $channel->queue_declare($queue, false, true, false, false);
    $channel->exchange_declare($exchange, AMQPExchangeType::DIRECT, false, true, false);
    $channel->queue_bind($queue, $exchange);

    $msg = new AMQPMessage($i++, array('content_type' => 'text/plain'));
    $channel->basic_publish($msg, $exchange);

    echo "Message published: " . $msg->body . PHP_EOL;

    usleep(500000); // 0.5 seconds

    $channel->close();
    $connection->close();
    echo "Connection closed" . PHP_EOL;
}
integration-php-amqplib-amqproxy-1   | 2023-10-17 09:05:33 UTC: Proxy upstream: rabbitmq:5672
integration-php-amqplib-amqproxy-1   | 2023-10-17 09:05:33 UTC: Proxy listening on 0.0.0.0:5673
integration-php-amqplib-amqproxy-1   | 2023-10-17 09:05:34 UTC: Client connected: 172.26.0.5:60272
integration-php-amqplib-conns-1      | Connection created
integration-php-amqplib-rabbitmq-1   | 2023-10-17 09:05:34.006434+00:00 [info] <0.703.0> accepting AMQP connection <0.703.0> (172.26.0.4:33100 -> 172.26.0.2:5672)
integration-php-amqplib-rabbitmq-1   | 2023-10-17 09:05:34.008802+00:00 [info] <0.703.0> connection <0.703.0> (172.26.0.4:33100 -> 172.26.0.2:5672) has a client-provided name: AMQProxy 0.8.13
integration-php-amqplib-rabbitmq-1   | 2023-10-17 09:05:34.008802+00:00 [info] <0.703.0>
integration-php-amqplib-rabbitmq-1   | 2023-10-17 09:05:34.009979+00:00 [info] <0.703.0> connection <0.703.0> (172.26.0.4:33100 -> 172.26.0.2:5672 - AMQProxy 0.8.13
integration-php-amqplib-rabbitmq-1   | 2023-10-17 09:05:34.009979+00:00 [info] <0.703.0> ): user 'guest' authenticated and granted access to vhost '/'
integration-php-amqplib-conns-1      | Message published: 0
integration-php-amqplib-amqproxy-1   | 2023-10-17 09:05:34 UTC: Client disconnected: 172.26.0.5:60272
integration-php-amqplib-conns-1      | Connection closed
integration-php-amqplib-amqproxy-1   | 2023-10-17 09:05:34 UTC: Client connected: 172.26.0.5:60282
integration-php-amqplib-conns-1      | Connection created
<hung here>

with amproxy v0.8.12 the code above continues to publish messages

akhomyakov-marfatech commented 1 year ago

I also faced this problem. Fortunately, it occurred when setting up new application nodes. Spent 40 minutes debugging. If we had updated on all nodes at once, we would have stopped production.

Anyway, guys, thank you for this app. I really like amqproxy. It has surprisingly improved the performance of the application greatly.

dentarg commented 1 year ago

Sorry about that @akhomyakov-marfatech

v0.8.14 has been released addressing this issue, please give it a try and let us know how it goes