atheriel / longears

The RabbitMQ client for R
https://atheriel.github.io/longears/

No automatic reconnects #14

Closed stefanfritsch closed 3 years ago

stefanfritsch commented 3 years ago

Hi,

I always have to reconnect manually if the connection has been idle for more than a minute or so.

> amqp_publish(conn, message.raw, exchange = "run.function", routing_key = "#")
Error in amqp_publish(conn, message.raw, exchange = "run.function", routing_key = "#") : 
  Failed to publish message. Disconnected from server

I then have to reconnect and try again:

> amqp_reconnect(conn)
> amqp_publish(conn, message.raw, exchange = "run.function", routing_key = "#")
>

That's not a problem per se but ?amqp_reconnect says:

When possible, we automatically recover from connection errors, so manual reconnection is not usually necessary.

So before I write tryCatch() wrappers I wanted to ask if there's something I'm doing wrong. I followed the Basic Usage example just with a remote server. Is this related to the timeout= parameter in amqp_connect? Do I have to set some other parameter to enable automatic reconnects?

Thank you.

Best regards, Stefan

atheriel commented 3 years ago

If you have a publish-only workload, you will always encounter this issue when you publish infrequently. The underlying reason is that we need to send heartbeats to the RabbitMQ server every 30s for the connection to remain active, and those heartbeats aren't sent unless you call amqp_publish() (or anything else that interacts with the connection).

As an aside: originally this package did not use heartbeats, but we discovered that this causes extremely brittle network connections that can crash R or make it time out for periods of 15 minutes, so they are now enabled with no option to turn them off. It's the lesser of two evils.

The fundamental problem here is that R (and the underlying librabbitmq) are single-threaded, and so we can't really do stuff "in the background" like send heartbeats.

I have a few suggestions, based on our experience:

Unfortunately there is no way to indicate whether we got disconnected due to missed heartbeats, which is why the error message won't tell you. And we do try to recover from errors, but this is not actually an error -- it is intentional behaviour.
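
For the tryCatch() approach mentioned in the question, a minimal sketch might look like the following. Note that with_reconnect() is a hypothetical helper, not part of longears; it only assumes that amqp_reconnect(conn) restores the existing connection object in place, as in the console session above. Because the error does not say why the connection dropped, the sketch reconnects after any failure and retries exactly once.

```r
# Hypothetical helper (not part of longears): run `action(conn)` and, if it
# fails, reconnect once and retry. The `reconnect` argument defaults to
# longears::amqp_reconnect but can be swapped out, e.g. for testing.
with_reconnect <- function(conn, action, reconnect = longears::amqp_reconnect) {
  tryCatch(
    action(conn),
    error = function(e) {
      # We cannot tell a missed-heartbeat disconnect apart from other
      # failures, so reconnect unconditionally and try one more time.
      # If the retry fails as well, that error propagates to the caller.
      reconnect(conn)
      action(conn)
    }
  )
}

# Usage with the call from the question (needs a live connection, so it is
# left commented out here):
# with_reconnect(conn, function(c) {
#   longears::amqp_publish(c, message.raw, exchange = "run.function", routing_key = "#")
# })
```

Retrying only once keeps a genuinely unreachable server from turning into an endless loop; a publish-only job that sits idle for long stretches could equally call amqp_reconnect(conn) up front before each publish.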