Ja7ad / amqp

A wrapper around amqp091-go focused on stable, safe connection handling (Production Ready)
MIT License

close channel manager on re connection in publisher #1

Open Ja7ad opened 10 months ago

Ja7ad commented 10 months ago

I am seeing an unexplained channel close while the publisher is reconnecting.

message 0 publised
message 1 publised
message 2 publised
message 3 publised
[2023/11/28 09:51:14] WARN attempting to reconnect to amqp server after close with error: Exception (501) Reason: "read tcp 192.168.0.10:54974->65.109.234.125:5672: i/o timeout" {"source":{"file":"/home/javad/Projects/go/personal/amqp/channel.go","function":"github.com/Ja7ad/amqp.(*channel).startNotifyCancelOrClosed","line":63}}
[2023/11/28 09:51:14] ERROR attempting to reconnect to amqp server after connection close with error: Exception (501) Reason: "read tcp 192.168.0.10:54974->65.109.234.125:5672: i/o timeout" {"source":{"file":"/home/javad/Projects/go/personal/amqp/connection.go","function":"github.com/Ja7ad/amqp.(*connection).startNotifyClose","line":51}}
[2023/11/28 09:51:15] WARN waiting 5s seconds to attempt to reconnect to amqp server {"source":{"file":"/home/javad/Projects/go/personal/amqp/channel.go","function":"github.com/Ja7ad/amqp.(*channel).reconnectLoop","line":93}}
2023/11/28 09:51:24 Exception (504) Reason: "channel/connection is not open"

The channel manager's reconnection logic works fine for the consumer, which keeps running in the background and returns to a normal state. The publisher, however, fails when the channel is closed during reconnection.

farmani commented 10 months ago

I have struggled with the same issue in other programming languages, specifically because RabbitMQ tries to close dead TCP connections. IIRC the TCP timeout or keepalive is set to 30 seconds by default in most cases, so you need a heartbeat mechanism to keep the connection alive.

I haven't had time to read your code, but I assume:

  1. you are using round robin to choose a live connection from your pool
  2. your consumer connections have no delay, but your publisher does, so some connections in the pool stay idle for more than 30 seconds and time out.

You can try reducing the sleep delay and/or reducing the number of live connections in your pool to fewer than 5, so that the round-robin algorithm uses every connection in less than 30 seconds.

I hope this explanation helps you find the cause.

https://www.rabbitmq.com/heartbeats.html