eandersson / amqpstorm

Thread-safe Python RabbitMQ Client & Management library
https://www.amqpstorm.io/
MIT License
186 stars 36 forks source link

Heartbeat vs. hearbeat timeout confussion #127

Closed vanaoff closed 9 months ago

vanaoff commented 10 months ago

Hello there :wave:

I'm a bit confused of heartbeat parameter. It seems it's value is being used on two places - amqpstorm.channel0.Channel0._send_tune_ok and amqpstorm.heartbeat.Heartbeat._start_new_timer. As far as I understand, the first usage negotiates heartbeat_timeout with the server and the second usage sets the interval to send heartbeat requests.

I would expect the interval should be shorter than timeout, eg. heartbeat_timeout / 2 as described eg. in RabbitMQ documentation here but the amqpstorm uses the same value. Please, what I'm missing here?

Thanks for the answer and all the work on this library :pray:

Ivan

eandersson commented 10 months ago

I don't remember the exact details, but you are essentially setting the interval meaning how often to send heartbeats, not the actual timeout and the code will check if you missed the heartbeat X number of times.

e.g. if you look at the client side code for amqpstorm it checks if you missed two or more heartbeats (meaning interval * 2) before it raises an issue. https://github.com/eandersson/amqpstorm/blob/2.x/amqpstorm/heartbeat.py#L88

Also, worth nothing that on the client (and maybe server) we count any type of incoming data as a heartbeat. It does not necessarily need to be an actual heartbeat frame.

vanaoff commented 10 months ago

Thank you for the answer :+1: I understand the client part. What confuses me is server side of heartbeat. It's not very clear to me what server does in case it misses heartbeat.

What we experience is that some "long-time" (=longer than heartbeat) processing messages get requeued before acknowledgement after time period suspiciously similar to heartbeat.

So here's my suspicion. As described here, there could be mis-alignment between heartbeat_interval and heartbeat_timeout, where heartbeat_interval should be equal to heartbeat_timeout / 2. According to RabbitMQ documentation, it seems RabbitMQ (and official libraries) server exposes heartbeat_timeout, which I think is configured via negotiation in amqpstorm.channel0.Channel0._send_tune_ok... However, the ampqstorm sets the same value to what I think is more like heartbeat_interval. So, when the value is set to eg. 60s, ampqstorm sends hearbeat every 60s and it's correctly handling one miss which makes the actual heartbeat_timeout 120s. On the other hand server closes the connection after 60s of inactivity and requeues the message.

I can try to create reproducible example to prove/disprove this hypothesis...

eandersson commented 10 months ago

I see what you mean. I think the terminology is confusing here. I prepared a PR. If you are able can you review it and try it out?

vanaoff commented 10 months ago

Agree, it's confusing. Thanks :pray: I'll get to test probably sometime next week. Btw, aren't you worried about backward compatibility?

eandersson commented 10 months ago

Agree, it's confusing. Thanks 🙏 I'll get to test probably sometime next week. Btw, aren't you worried about backward compatibility?

So there shouldn't be any changes that aren't backwards compatible here since we are keeping the variable names the same. The only risk is that there will be a very minor increase in CPU and bandwidth usage since the background thread is run twice as often now with the same heartbeat value, but we are looking at a very minor increase, since the server already enforced the same current heartbeat value and behaviour.

vanaoff commented 9 months ago

Hello Eric, please, are you planning to release this fix anytime soon?

eandersson commented 9 months ago

Hello Eric, please, are you planning to release this fix anytime soon?

Thanks. It should be live now.

vanaoff commented 9 months ago

Thanks once again 👍