corvus-ch / rabbitmq-cli-consumer

Consume RabbitMQ messages into any cli program
MIT License
236 stars 36 forks source link

Graceful shutdown doesn't work properly #51

Closed salvatorecordiano closed 5 years ago

salvatorecordiano commented 5 years ago

After solving #40, when we send to the consumer SIGTERM, it doesn't handle the signal in the right way.

When SIGTERM is received, it stops consuming new messages, but the consumer process will not terminate.

How to reproduce the bug

RabbitMQ server is up and running and the queue my-queue is always empty.

Step 1 - Check if consumer is running

$ ps -ef | grep "rabbitmq-cli-consumer" | grep -v grep | wc -l
0

No consumer is running here.

Step 2 - Start consumer

/var/www/project/bin/rabbitmq-cli-consumer -e /var/www/project/bin/my-executable -q my-queue -c /var/www/project/consumer.conf -i -o

Step 3 - Send SIGTERM to the consumer process

In a new terminal window:

ps -ef | grep "rabbitmq-cli-consumer" | grep -v grep | awk '{print $1}' | xargs --no-run-if-empty kill
sleep 10

Step 4 - Check if consumer is running

$ ps -ef | grep "rabbitmq-cli-consumer" | grep -v grep | wc -l
1

The consumer is running here.

If I do the previous procedure with consumer version 2.2.0 the step 4 result is 0, as expected.

salvatorecordiano commented 5 years ago

@corvus-ch are you able to reproduce my bug?

corvus-ch commented 5 years ago

@salvatorecordiano Yes, I am. To be honest, I am a little bit stuck on this one, both time and idea wise. It is definitely a side effect introduced by #47 which was not caught by any test. And on first sight, it looks like some kind of chicken and egg problem.

I will have a closer look as soon as I can spare the time.

corvus-ch commented 5 years ago

I found something that looks like a really simple solution to this problem.

diff --git a/consumer/consumer.go b/consumer/consumer.go
index e9ba9b3..7f1389e 100644
--- a/consumer/consumer.go
+++ b/consumer/consumer.go
@@ -106,7 +106,9 @@ func (c *Consumer) consume(msgs <-chan amqp.Delivery, done chan error) {
            return
        }
    }
-   c.wg.Wait()
+   if !c.canceled {
+       c.wg.Wait()
+   }
    done <- nil
 }

But before I will break things again I prefer to do some thorough testing.

salvatorecordiano commented 5 years ago

Well done @corvus-ch! I suggest you to delete the last release on GitHub, so we don't break things to other users. Furthermore, if you publish an alpha release i will do a test for you.

corvus-ch commented 5 years ago

A pre release build is available at https://github.com/corvus-ch/rabbitmq-cli-consumer/releases/tag/2.3.2-alpha1. Work on a fix is happening at #52.

salvatorecordiano commented 5 years ago

@corvus-ch I tested your build in my local environment and it works in the right way.

corvus-ch commented 5 years ago

I am glad the problem got fixed. Meanwhile, I am still looking for a way to get this appropriately covered by automated tests.

corvus-ch commented 5 years ago

I did some refactoring to remove a troublesome design decision I previously made. I am now confident, the issue in question is captured by tests. There is a new pre release build available: https://github.com/corvus-ch/rabbitmq-cli-consumer/releases/tag/2.3.2-alpha2.

@salvatorecordiano Do you mind to give the alpha2 a try?

If that one works for you. I will finish this up and make a new regular release.

salvatorecordiano commented 5 years ago

@corvus-ch I tried it and it works in the right way

corvus-ch commented 5 years ago

New release is available: https://github.com/corvus-ch/rabbitmq-cli-consumer/releases/tag/2.3.2.