0x4b53 / amqp-rpc

🐰 Framework to use RabbitMQ as RPC
MIT License
45 stars 8 forks source link

Add support to simpler restart server by using a restart channel #109

Closed bombsimon closed 5 months ago

bombsimon commented 5 months ago

This will add support to keep a restart channel on the server that can be triggered (send a struct{}{} or close) to restart the server. By doing this the user can trigger a graceful restart when needed without having to break out of the ListenAndServe loop.


Related to #107 and #108. This will introduce a new channel that can trigger server restarts without breaking out of ListenAndServe.

akarl commented 5 months ago

Awesome!

In the case of the Ack issues, there is usually multiple failures simultaneously, where an entire sequence of messages fails to Ack.

Technically the Acks should begin working again once the Delivery is delivered on a new Channel. Which only happens after the graceful restart has completed.

A potential improvement could be the introduction of a Restart() method. This method would function like the current Stop() method where it includes an atomic bool to prevent overlapping restarts.

EDIT: We would need to ensure that RabbitMQ actually re-delivers messages even if we gracefully shut down.

bombsimon commented 5 months ago

In the case of the Ack issues, there is usually multiple failures simultaneously, where an entire sequence of messages fails to Ack.

Technically the Acks should begin working again once the Delivery is delivered on a new Channel. Which only happens after the graceful restart has completed.

This PR doesn't really change anything in that regard, it would just replace calling Stop -> ListenAndServe. But are there any improvements you see that would be needed to make this useful? Right now it would work if there's no other issue but I guess this would only make sense if there's any potential problems and if so do we need to take more actions for this to make sense?

I'd imagine regarding Ack that unless the Ack ever succeeds it will be left at the broker and eventually be picked up again. I don't remember what will happen if we start processing the message but never Ack and then call Stop() - will we hang forever?

EDIT: We would need to ensure that RabbitMQ actually re-delivers messages even if we gracefully shut down.

Do you know from the top of your head what would be needed for this?

A potential improvement could be the introduction of a Restart() method. This method would function like the current Stop() method where it includes an atomic bool to prevent overlapping restarts.

Great idea! That also made me realize we might want a default channel that this could trigger if you want to restart the server without owning the channel. I added it!

bombsimon commented 5 months ago

Seems like I potentially can answer my own question. Reading here:

If a consumer dies (its channel is closed, connection is closed, or TCP connection is lost) without sending an ack, RabbitMQ will understand that a message wasn't processed fully and will re-queue it. If there are other consumers online at the same time, it will then quickly redeliver it to another consumer. That way you can be sure that no message is lost, even if the workers occasionally die.

There's obviously a lot to read in the docs about reliability but for this PR I think it's enough to just confirm that we're not losing any messages that could be saved easily. Even if we currently use defaults that causes this but could be fixed by a user through a public API I think any further changes is out of scope for this one. And finally if any such behaviour already exist in the shutdown process I also think improving those would be out of scope this this is mostly a way to be able to trigger a restart as a convenience without having a long loop as in the example in https://github.com/0x4b53/amqp-rpc/issues/107#issuecomment-2102401586.

codeclimate[bot] commented 5 months ago

Code Climate has analyzed commit 91a17681 and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 86.9% (50% is the threshold).

This pull request will bring the total coverage in the repository to 79.1% (0.1% change).

View more on Code Climate.

akarl commented 5 months ago

This is a really nice change! Good job! I'm happy with how Restart() will gracefully restart the server, ensuring that all in-flight messages are completed. The code is clean and the new tests effectively cover the restart functionality.

LGTM!