Open ignaciogoldchluk-yolo opened 1 month ago
Coney stops processing new RabbitMQ messages upon receiving a shutdown message from the BEAM.
I agree, this looks like the correct approach 👍
Existing RabbitMQ messages are processed and acked. Once all RabbitMQ messages have been processed, Coney shuts down.
This would be an ideal and valid solution. However, I lean more towards the alternative approach, and here is why:
I believe a correct shutdown process is a must-have, and handling idempotency issues should be the responsibility of the application, because it can handle such issues better.
Also, it could be useful to support passing additional options to the consume function, like no_ack. This should give applications more flexibility in message processing.
What do you think?
You are correct. We ended up going with the alternative approach. The main issue was that ApplicationSupervisor started ConsumerSupervisor first, with all the ConsumerServer processes, and then ConnectionServer. Since processes are terminated in reverse start order, ConnectionServer was closed first, which meant the ConsumerServer processes were still processing messages and had ACKs to deliver but no connection/channel to send them through, raising errors when the application shut down.
It will be up to the application to handle redeliveries when using Coney.
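The ordering problem described in this comment can be reproduced without RabbitMQ. The following is a minimal sketch using only the standard library; the OrderSketch modules are hypothetical stand-ins, not Coney code, and each child only records when it terminates.

```elixir
defmodule OrderSketch.Recorder do
  # Hypothetical helper that remembers the order in which children terminate.
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> [] end, name: __MODULE__)
  def record(event), do: Agent.update(__MODULE__, &(&1 ++ [event]))
  def flush, do: Agent.get_and_update(__MODULE__, &{&1, []})
end

defmodule OrderSketch.Child do
  # Stand-in for ConnectionServer or ConsumerSupervisor; it only records its own shutdown.
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name)

  # One child spec per name, so the same module can be started several times.
  def child_spec(name), do: %{id: name, start: {__MODULE__, :start_link, [name]}}

  @impl true
  def init(name) do
    # Trap exits so terminate/2 runs when the supervisor shuts this process down.
    Process.flag(:trap_exit, true)
    {:ok, name}
  end

  @impl true
  def terminate(_reason, name), do: OrderSketch.Recorder.record(name)
end

defmodule OrderSketch do
  # Starts the named children in the given order, stops the supervisor,
  # and returns the order in which the children terminated.
  def shutdown_order(names) do
    children = Enum.map(names, &{OrderSketch.Child, &1})
    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
    :ok = Supervisor.stop(sup)
    OrderSketch.Recorder.flush()
  end
end

{:ok, _pid} = OrderSketch.Recorder.start_link([])

# Start order described in the comment: the connection is closed first.
old_order = OrderSketch.shutdown_order([:consumer_supervisor, :connection_server])

# Reversed start order: the consumers stop before the connection closes.
new_order = OrderSketch.shutdown_order([:connection_server, :consumer_supervisor])

IO.inspect(old_order, label: "consumers started first")
IO.inspect(new_order, label: "connection started first")
```

In both runs the children terminate in the reverse of their start order, so whichever child holds the connection should be listed first in the supervisor.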
Current behaviour
ConnectionServer.terminate/2 immediately closes the connection.
Expected behaviour
This is one of the possible solutions:
Another solution could be to leave it up to the application code (workers) to deal with redelivered messages. Increasing the heartbeat timeout is not a solution, since in a rolling deployment the connection might be reopened from another host.
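As an illustration of what dealing with redelivered messages could look like in a worker, here is a minimal sketch. It assumes every message carries a unique id set by the publisher; RedeliverySketch, the message shape, and the handler are hypothetical, and the in-memory MapSet stands in for shared storage, which would be needed when the redelivery arrives on another host.

```elixir
defmodule RedeliverySketch do
  # Hypothetical worker-side guard. `seen` is a MapSet of message ids that were
  # already processed; a real deployment would keep this in a database or cache.

  # Returns {result, updated_seen}. The handler runs only for ids not seen before.
  def handle(seen, %{id: id} = message, handler) when is_function(handler, 1) do
    if MapSet.member?(seen, id) do
      {:skipped, seen}
    else
      {{:processed, handler.(message)}, MapSet.put(seen, id)}
    end
  end
end

seen = MapSet.new()
message = %{id: "order-42", payload: "charge customer"}
handler = fn msg -> String.upcase(msg.payload) end

# First delivery: the handler runs and the id is remembered.
{first, seen} = RedeliverySketch.handle(seen, message, handler)

# Redelivery of the same message, e.g. because the connection closed before the ack.
{second, _seen} = RedeliverySketch.handle(seen, message, handler)

IO.inspect(first, label: "first delivery")
IO.inspect(second, label: "redelivery")
```

The point of the design is that the side effect happens at most once per id, so a redelivery after an unacknowledged shutdown is harmless.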
Technical information
ConnectionServer should start first, and then ConsumerSupervisor. Since they are terminated in reverse order, ConsumerSupervisor first terminates all the ConsumerServer processes, and finally ConnectionServer can close the connection.
ConsumerSupervisor does not have to be a DynamicSupervisor, since the full list of consumers is known ahead of time. It could be a regular Supervisor.
ConnectionServer can keep a map of {consumer, channel} so that ConsumerServer does not keep any connection (channel) state. That way, when ConnectionServer receives a {:DOWN, _, _, _, _} message, it only has to update its {consumer, channel} map, and all the ConsumerServer processes are unaffected.
ConsumerServer can be responsible only for processing the messages and creating the response, while ConnectionServer communicates with RabbitMQ.
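The {consumer, channel} bookkeeping proposed above could look roughly like the following sketch. It is written as plain functions over a map rather than a full GenServer, the channel is a stand-in process rather than a real AMQP channel, and reopening the channel on :DOWN is an assumption (the proposal only says the map gets updated). ChannelMapSketch and open_channel are hypothetical names.

```elixir
defmodule ChannelMapSketch do
  # State shape: %{consumer_name => {channel_pid, monitor_ref}}.

  # Opens a channel for the consumer and monitors it.
  def register(state, consumer, open_channel) when is_function(open_channel, 0) do
    channel = open_channel.()
    ref = Process.monitor(channel)
    Map.put(state, consumer, {channel, ref})
  end

  # Handles a monitor message: finds the consumer whose channel died and
  # replaces its entry. The consumer process itself is never touched.
  def handle_down(state, {:DOWN, ref, :process, _pid, _reason}, open_channel) do
    case Enum.find(state, fn {_consumer, {_channel, r}} -> r == ref end) do
      {consumer, _entry} -> register(state, consumer, open_channel)
      nil -> state
    end
  end
end

# Stand-in for opening an AMQP channel: a process that waits until told to stop.
open_channel = fn -> spawn(fn -> receive do :stop -> :ok end end) end

state = ChannelMapSketch.register(%{}, :orders_consumer, open_channel)
{old_channel, _ref} = state[:orders_consumer]

# Simulate the channel crashing.
Process.exit(old_channel, :kill)

state =
  receive do
    {:DOWN, _ref, :process, _pid, _reason} = down ->
      ChannelMapSketch.handle_down(state, down, open_channel)
  after
    1_000 -> raise "no :DOWN message received"
  end

{new_channel, _ref} = state[:orders_consumer]
reopened = new_channel != old_channel and Process.alive?(new_channel)
IO.inspect(reopened, label: "channel replaced")
```

In a GenServer, handle_down/3 would be called from handle_info/2, with the map as the server state.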