BlackZork / mqmgateway

MQTT gateway for modbus networks
GNU Affero General Public License v3.0

modmqttd does not shut down on SIGTERM in Alpine Linux Docker container #33

Closed by git-developer 6 months ago

git-developer commented 8 months ago

Observed behavior

When a SIGTERM is sent to modmqttd, the process seems to be shutting down properly:

# pkill modmqttd
2024-Jan-13 13:51:29.796627: [INFO]     Got SIGTERM, exiting....
/etc/modmqttd # 2024-Jan-13 13:51:29.796924: [INFO]     Stopping modbus clients
2024-Jan-13 13:51:29.797439: [INFO]     Publishing avaiability status 0 for all registers
2024-Jan-13 13:51:29.798028: [INFO]     Disconnecting from mqtt broker
2024-Jan-13 13:51:29.798496: [INFO]     Disconnected from mqtt broker, code:No error.
2024-Jan-13 13:51:29.798662: [INFO]     Stopping mosquitto message loop

But after that, the process keeps running:

# ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 /sbin/docker-init -- /bin/sh
    8 root      0:00 /bin/sh
   23 root      0:00 modmqttd
   27 root      0:00 ps aux

A second pkill finally terminates the process.

Expected behavior

I expect that modmqttd is terminated when a SIGTERM is sent.

Steps to reproduce

This behavior can be reproduced with the following command (where some-demo-config.yaml contains the config):

$ docker run --rm -ti --entrypoint /bin/sh -v "$PWD/some-demo-config.yaml:/etc/modmqttd/config.yaml" --workdir /etc/modmqttd ckware/mqmgateway:v1.2.0

Within the container:

# modmqttd &
# pkill modmqttd
# ps aux

Remarks

When a SIGHUP is sent instead of a SIGTERM (pkill -HUP modmqttd), the process terminates immediately with no log output (looks like no cleanup is done then).
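One plausible explanation for the difference, assuming the daemon installs a handler for SIGTERM only: SIGHUP then keeps its default disposition, which on POSIX terminates the process immediately, so no cleanup code runs. The C++ sketch below illustrates that pattern; gStopRequested, onTerminate and installTermHandler are hypothetical names, not taken from the modmqttd sources.

```cpp
#include <csignal>

// Hypothetical flag: set by the handler, polled by the main loop,
// which would then run the orderly shutdown sequence.
volatile std::sig_atomic_t gStopRequested = 0;

// Signal handlers should only do async-signal-safe work, such as setting a flag.
extern "C" void onTerminate(int) { gStopRequested = 1; }

// Installs the handler for SIGTERM only. SIGHUP is left untouched, so it
// keeps its default action (immediate termination, no cleanup).
bool installTermHandler() {
    return std::signal(SIGTERM, onTerminate) != SIG_ERR;
}

// Small self-check: raising SIGTERM runs the handler instead of killing us.
bool demo_sigterm_sets_flag() {
    gStopRequested = 0;
    if (!installTermHandler()) return false;
    std::raise(SIGTERM);  // returns after the handler has run
    return gStopRequested == 1;
}
```

If this assumption holds, registering the same handler for SIGHUP would make both signals follow the same shutdown path.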

BlackZork commented 8 months ago

Works as expected on the master branch without Docker. My "production" instance does not have this problem either.

It looks like the modbus threads finish properly, but there is a problem with the mqtt shutdown. I suspect a race condition between ModMqtt::notifyQueues and ModMqtt::waitForQueues.

Does adding a 5-second wait in ModMqtt::notifyQueues() help?

BlackZork commented 8 months ago

I spent some time trying to reproduce this problem, without success. You could build a debug version and, if the deadlock occurs, attach a debugger to see where it is stuck.

Unfortunately, there is a chance that this will not happen at all in a debug build :(

git-developer commented 8 months ago

Thanks for investigating this issue, I really appreciate that.

It's helpful to know that you can't reproduce the issue in your environment. That could mean the problem is introduced by my build and/or runtime environment (Docker). I will try to find out whether it is related to one of the container's base components (Alpine Linux / musl).

git-developer commented 8 months ago

The problem does not occur when Debian is used as base image instead of Alpine, so the problem is probably not caused by the code of this project.

BlackZork commented 7 months ago

I replaced the while loop with a predicate in waitForQueues in 1.4.0. I still have no idea what the problem is, so it may or may not fix it :-)
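For context, a predicate-based wait guards against a notification that arrives before the waiter starts blocking, because the state is checked under the lock before sleeping. The following is a minimal C++ sketch of that general technique only; QueueSignal, notify and waitFor are hypothetical names and do not reproduce the project's waitForQueues or notifyQueues.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for queue signalling between two threads.
class QueueSignal {
public:
    // Producer side: record the event under the lock, then wake the waiter.
    void notify() {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mPending = true;
        }
        mCondition.notify_one();
    }

    // Consumer side: returns true if an event was signalled, false on timeout.
    // The predicate is evaluated before blocking, so an earlier notify()
    // is not lost, and spurious wakeups are filtered out.
    template <class Rep, class Period>
    bool waitFor(std::chrono::duration<Rep, Period> timeout) {
        std::unique_lock<std::mutex> lock(mMutex);
        bool signalled = mCondition.wait_for(lock, timeout,
                                             [this] { return mPending; });
        mPending = false;  // consume the event
        return signalled;
    }

private:
    std::mutex mMutex;
    std::condition_variable mCondition;
    bool mPending = false;
};

// Notification arrives before anyone waits: the wait still succeeds.
bool demo_notify_before_wait() {
    QueueSignal signal;
    signal.notify();
    return signal.waitFor(std::chrono::milliseconds(100));
}

// Notification arrives from another thread while (or before) we wait.
bool demo_notify_from_thread() {
    QueueSignal signal;
    std::thread producer([&signal] { signal.notify(); });
    bool ok = signal.waitFor(std::chrono::seconds(5));
    producer.join();
    return ok;
}
```

A wait without the shared flag would hang in demo_notify_before_wait, which is the kind of lost wakeup suspected earlier in this thread.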

git-developer commented 7 months ago

Thanks for releasing 1.4.0, I just tried it. The behavior is unchanged. The last message in the log is:

Stopping mosquitto message loop

This happens only on Alpine (based on musl); there is no problem on Debian (based on glibc). It seems to be related to the queue lock. I don't think this is something we can fix here.

BlackZork commented 7 months ago

Could you share the Dockerfile for the Alpine build so I can run it from within the source code tree?

git-developer commented 7 months ago

Of course, the Dockerfile is from the Docker PR.

BlackZork commented 7 months ago

Thanks. This is a problem with mosquitto. I reported it as Issue 2981. Before thinking about workarounds, I plan to wait for a while and see what happens with that issue.