Closed salvatorecordiano closed 5 years ago
This is the result of the changes made in #19. Prior to that, the consumer did not handle a connection shutdown initiated by the server, but now it does.
When a RabbitMQ server dies, there are two cases I am aware of.
The server crashes unexpectedly. If this happens, all connections are dropped and the consumer stops with exit code 1.
The server initiates a shutdown, which includes sending notifications to all consumers. If the consumer receives such a notification, it finishes processing the current job (if any) and then shuts down with exit code 0. The reasoning is that a shutdown signal from the server can be treated the same as a TERM signal from the shell.
From what I can tell from your (@salvatorecordiano) description, the second thing happened.
Your code worked so far because the rabbitmq-cli-consumer binary did not have any execution path that resulted in an exit with code 0.
It did not occur to me that somebody would use the exit code to determine whether a consumer needs to be restarted. In my model of operating the rabbitmq-cli-consumer, a supervision process like systemd or Supervisor is used.
I am inclined to consider this as working as designed. If anybody has good arguments why this behaviour should be considered wrong, I am open to a discussion.
As discovered through #42, there is currently a race condition in the way asynchronous events get passed along. This resulted in the consumer exiting before the event handlers had the chance to do their work. As a matter of fact, the implementation was intended to exit with code 10 when the server closes the connection.
I am really sorry for having messed this up in my previous comment.
I will provide a fix.
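To keep the exit codes mentioned in this thread apart, here is a minimal shell sketch that maps a code to its meaning. The function name describe_exit is hypothetical, and the mapping only restates what is said in this discussion (0, 1 and 10) plus the common shell convention that a status above 128 means "terminated by signal N - 128". It is not taken from the consumer's code.

```shell
# Hypothetical helper: translate an exit code into the meaning
# discussed in this thread. Not part of rabbitmq-cli-consumer.
describe_exit() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "clean shutdown"
  elif [ "$code" -eq 1 ]; then
    echo "unexpected failure"
  elif [ "$code" -eq 10 ]; then
    echo "server closed the connection"
  elif [ "$code" -gt 128 ]; then
    # Shell convention: 128 + signal number.
    echo "killed by signal $((code - 128))"
  else
    echo "unknown"
  fi
}

describe_exit 0
describe_exit 10
describe_exit 137
```

A wrapper script could call such a helper with $? right after the consumer exits and decide from the result whether to restart.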
@salvatorecordiano Would you mind trying out #47? Does it fix your issue?
Hi @corvus-ch, thanks. At the moment I'm not able to build the binary for Linux. Can you publish it, please? I will test your consumer immediately.
@salvatorecordiano I have built and published a pre-release version. See https://github.com/corvus-ch/rabbitmq-cli-consumer/releases/tag/2.3.1-alpha1.
@corvus-ch Now it works properly. I will wait for your new release to deploy it. Thank you.
@salvatorecordiano I just released a new version: https://github.com/corvus-ch/rabbitmq-cli-consumer/releases/tag/2.3.1.
After your last release, I encountered a new issue. When I send SIGTERM to the consumer, the output is:
2018/11/07 09:31:51 Cancel consumption of messages.
2018/11/07 09:32:58 Processed!
It receives my signal but does not act on it, so the process keeps running forever.
In the previous release, when we send SIGTERM to the consumer, the process exit code is 0.
When we send SIGKILL, the exit code is 137.
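The value 137 follows the shell convention of reporting 128 plus the signal number (SIGKILL is 9). The sketch below illustrates this with a plain sleep as a stand-in for the consumer, which is an assumption made for the example. Unlike the consumer, sleep does not handle SIGTERM, so it reports 143 (128 + 15) instead of exiting with a code of its own choosing.

```shell
# Stand-in process: sleep installs no signal handlers.
sleep 30 &
pid=$!
kill -KILL "$pid"
kill_status=0
wait "$pid" || kill_status=$?
echo "after SIGKILL: $kill_status"   # 128 + 9

sleep 30 &
pid=$!
kill -TERM "$pid"
term_status=0
wait "$pid" || term_status=$?
echo "after SIGTERM: $term_status"   # 128 + 15, because sleep does not handle TERM
```

A process that catches SIGTERM and shuts down gracefully, as the consumer did in the previous release, decides its own exit code, which is why 0 was observed there.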
Hi @salvatorecordiano,
I tried to reproduce your issue and was not able to do so. Can you please open a new issue so we can investigate and track this new topic?
Hi @corvus-ch, I opened #51. In the issue description, I'm able to prove that your last release introduces this bug.
I'm using your consumer in a production environment with success.
Last night our RabbitMQ cluster went down, so most of our consumers died. Every consumer was terminated with an unexpected exit code 0, but this exit code is wrong because the process did not terminate successfully.
We relied on the expected behaviour of the consumer process and wrote the following chunk in a bash script:
That way, we make sure that consumers are always up and running.
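A restart loop of the kind described might look like the following minimal sketch. It is an illustration only and not the script from this report: run_consumer is a hypothetical stand-in that fails twice and then exits with code 0, so the example terminates on its own. In a real script it would be replaced by the actual consumer invocation.

```shell
# Hypothetical stand-in for the consumer command: fails twice with
# exit code 1, then exits cleanly with exit code 0.
attempts=0
run_consumer() {
  attempts=$((attempts + 1))
  if [ "$attempts" -lt 3 ]; then
    return 1
  fi
  return 0
}

# Restart for as long as the consumer reports a failure. Exit code 0 is
# read as "stopped on purpose", so the loop ends and nothing restarts it.
# A real script would probably pause between attempts.
until run_consumer; do
  echo "consumer failed, restarting (attempt $attempts)"
done
echo "consumer exited with code 0 after $attempts attempts, not restarting"
```

With a loop like this, a consumer that exits with code 0 after the server goes away is never restarted, which matches the behaviour reported here.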
The previous trick worked with ricbra/rabbitmq-cli-consumer.
To reproduce this bug you can follow this procedure:
In a new terminal window:
On the first terminal you will see that the consumer has stopped, and you can check the exit code (echo $?).
Can you help me? Thanks