Noschvie closed this issue 8 months ago.
See if this helps.
Checked the wiki, but no success. Did an installation from scratch: RPi 4, Raspberry Pi OS with desktop, 64-bit, using PiBuilder (works perfectly, thanks), then IOTstack with Portainer, Mosquitto and Node-RED. Using a simple Node-RED flow, publishing and subscribing to "iotstack/mosquitto/node-red" works. But subscribing to "iotstack/mosquitto/healthcheck" gets no response.
The script iotstack_healthcheck.sh does not appear to be started / running.
docker exec mosquitto iotstack_healthcheck.sh
leads to a result in Node-Red
9.10.2023, 01:38:23 node: iotstack/mosquitto/healthcheck
iotstack/mosquitto/healthcheck : msg.payload : string[28] "Sun Oct 8 23:38:23 UTC 2023"
I think it's working "as advertised".
I'm about to demonstrate two options:
---
version: '3.6'
networks:
default:
driver: bridge
ipam:
driver: default
services:
mosquitto:
container_name: mosquitto
build:
context: ./.templates/mosquitto/.
args:
- MOSQUITTO_BASE=eclipse-mosquitto:latest
restart: unless-stopped
environment:
- TZ=${TZ:-Etc/UTC}
ports:
- "1883:1883"
volumes:
- ./volumes/mosquitto/config:/mosquitto/config
- ./volumes/mosquitto/data:/mosquitto/data
- ./volumes/mosquitto/log:/mosquitto/log
- ./volumes/mosquitto/pwfile:/mosquitto/pwfile
Tests:
Start the container:
$ UP
[+] Running 2/2
✔ Network iotstack_default Created 0.2s
✔ Container mosquitto Started 0.1s
Give health check time to go "healthy" and then report status:
$ sleep 30 ; DPS
NAMES CREATED STATUS SIZE
mosquitto 36 seconds ago Up 34 seconds (healthy) 0B (virtual 19.1MB)
Prove the container will pass messages bi-directionally:
$ mosquitto_sub -v -h 127.0.0.1 -t "hello" -F "%I %t %p" -C 1 &
[1] 805891
$ mosquitto_pub -h 127.0.0.1 -t "hello" -m "test $(date)"
2023-10-09T13:04:01+1100 hello test Mon 09 Oct 2023 01:04:01 PM AEDT
[1]+ Done mosquitto_sub -v -h 127.0.0.1 -t "hello" -F "%I %t %p" -C 1
In other words, there is no security. That's the default.
Terminate the container:
$ DOWN
[+] Running 2/2
✔ Container mosquitto Removed 0.4s
✔ Network iotstack_default Removed 0.3s
Add these lines to the mosquitto service definition:
    healthcheck:
      disable: true
Test:
$ tail -4 docker-compose.yml
      - ./volumes/mosquitto/pwfile:/mosquitto/pwfile
    healthcheck:
      disable: true
$ UP
[+] Running 2/2
✔ Network iotstack_default Created 0.2s
✔ Container mosquitto Started 0.1s
$ DPS
NAMES CREATED STATUS SIZE
mosquitto 3 seconds ago Up 2 seconds 0B (virtual 19.1MB)
$ DOWN
[+] Running 2/2
✔ Container mosquitto Removed 0.4s
✔ Network iotstack_default Removed 0.3s
Note how references to health check have been removed from the STATUS column.
Return to baseline (ie remove the two lines added in option 1).
Start Mosquitto:
$ UP mosquitto
[+] Running 2/2
✔ Network iotstack_default Created 0.2s
✔ Container mosquitto Started 0.1s
Define username and password:
$ docker exec mosquitto mosquitto_passwd -b /mosquitto/pwfile/pwfile someuser somepassword
Warning: File /mosquitto/pwfile/pwfile has world readable permissions. Future versions will refuse to load this file.
To fix this, use `chmod 0700 /mosquitto/pwfile/pwfile`.
Warning: File /mosquitto/pwfile/pwfile owner is not root. Future versions will refuse to load this file.
To fix this, use `chown root /mosquitto/pwfile/pwfile`.
Warning: File /mosquitto/pwfile/pwfile group is not root. Future versions will refuse to load this file.
Note to self: that will have to be fixed in the IOTstack template structure.
Fix the problem reported:
$ sudo chmod 700 ./volumes/mosquitto/pwfile/pwfile
$ sudo chown root:root ./volumes/mosquitto/pwfile/pwfile
$ ls -l ./volumes/mosquitto/pwfile/pwfile
-rwx------ 1 root root 122 Oct 9 12:36 ./volumes/mosquitto/pwfile/pwfile
Provide credentials to health-check script by adding these environment variables:
- HEALTHCHECK_USER=someuser
- HEALTHCHECK_PASSWORD=somepassword
Proof:
$ grep -A 4 "environment:" docker-compose.yml
environment:
- TZ=${TZ:-Etc/UTC}
- HEALTHCHECK_USER=someuser
- HEALTHCHECK_PASSWORD=somepassword
ports:
Enable security:
$ sudo sed \
    -i.bak \
    -e 's/^#password_file/password_file/' \
    -e 's/^allow_anonymous true/allow_anonymous false/' \
    ./volumes/mosquitto/config/mosquitto.conf
Proof:
$ diff ./volumes/mosquitto/config/mosquitto.conf.bak ./volumes/mosquitto/config/mosquitto.conf
32,33c32,33
< #password_file /mosquitto/pwfile/pwfile
< allow_anonymous true
---
> password_file /mosquitto/pwfile/pwfile
> allow_anonymous false
UP the container. It's already running, but the UP causes docker-compose to notice that the environment variables have changed; docker-compose re-creates the container, which then picks up the altered config and password file.
$ UP
[+] Running 1/1
✔ Container mosquitto Started 0.4s
$ sleep 30 ; DPS
NAMES CREATED STATUS SIZE
mosquitto 36 seconds ago Up 35 seconds (healthy) 0B (virtual 19.1MB)
Prove that the container is enforcing security:
$ mosquitto_sub -v -h 127.0.0.1 -t "#" -F "%I %t %p" -C 1
Connection error: Connection Refused: not authorised.
$ mosquitto_pub -h 127.0.0.1 -t "hello" -m "test $(date)"
Connection error: Connection Refused: not authorised.
Error: The connection was refused.
Screen shot is with changes proposed by #732 and #733:
Working directory is ~/IOTstack.
Erase persistent store.
UP the container.
Show logs - interesting lines are:
changed ownership of '/mosquitto/pwfile/pwfile' to 0:0
mode of '/mosquitto/pwfile/pwfile' changed to 0600 (rw-------)
Create a username and password. No errors returned.
Show username and hash made it into the password file.
Currently I'm testing MQTT without security; no username and password defined. I assume the health check should publish the timestamp to "iotstack/mosquitto/healthcheck" at an interval of 30 seconds, correct? If so, I'm missing this periodic publish.
Wow! Well, I agree. And, what's more, now I understand why it isn't working.
Now, granted, those dates are incredibly rubbery because when someone updates their local clone against GitHub, and when someone rebuilds their local container, are all unknowns. It's entirely possible that someone would still be running a container built between 2021-05-24 and 2022-04-06, in which case the health-check would all be working as originally intended.
It's really only someone who has built Mosquitto since 2022-04-06 who has a health-check that isn't actually working.
That's an important qualification on "not working" because, as I'm sure you'll point out, here we are in October 2023 and a Mosquitto container built today will happily report "(health: starting)" for the first 30 seconds, and then report "(healthy)".
But it is, as they say, being loose with the truth.
Before I dive into the intricacies, I'll declare that this is all my own work (both adding the health-check and then breaking it). Mea culpa on steroids. Doh!
Anyway, to go back to taws, the iotstack_healthcheck.sh script gets added to the image by the Dockerfile:
# copy the health-check script into place
ENV HEALTHCHECK_SCRIPT "iotstack_healthcheck.sh"
COPY ${HEALTHCHECK_SCRIPT} /usr/local/bin/${HEALTHCHECK_SCRIPT}
The Dockerfile also sets up the health-check scaffolding:
# define the health check
HEALTHCHECK \
    --start-period=30s \
    --interval=30s \
    --timeout=10s \
    --retries=3 \
    CMD ${HEALTHCHECK_SCRIPT} || exit 1
All by itself, that works (#350). But, then, the Grand Nitwit From Down-Under (ie me) comes along in #521 and decides to "be tidy" by cleaning-up "unused" environment variables:
# don't need these variables in the running container
ENV MOSQUITTO_BASE=
ENV HEALTHCHECK_SCRIPT=
ENV IOTSTACK_ENTRY_POINT=
The problem is that the mechanism that triggers the health-check script evaluates ${HEALTHCHECK_SCRIPT} inside the container each time the health check runs, and I've helpfully set that variable to null.
Which means it's the equivalent of executing:
$ sh -c ""
Which has a return code of zero.
Which Docker interprets as meaning "healthy".
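You can see the failure mode on any host with a shell; nothing here is specific to the container:

```shell
# an empty command string runs "successfully" and exits 0,
# which Docker's health-check machinery reads as "healthy"
sh -c ""
echo "exit status: $?"    # prints: exit status: 0
```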
So, the solution is to be a bit less tidy: just remove those four lines above, including where HEALTHCHECK_SCRIPT is set to null, and then build the container again.
With that done:
Start the container from the freshly-built image:
$ UP mosquitto
[+] Running 2/2
✔ Network iotstack_default Created 0.2s
✔ Container mosquitto Started 0.1s
Start a background listener:
$ mosquitto_sub -v -h 127.0.0.1 -t "#" -F "%I %t %p" &
[1] 950861
which almost immediately reports:
2023-10-09T22:41:38+1100 iotstack/mosquitto/healthcheck Mon Oct 9 22:40:48 AEDT 2023
Show the various health-check stages, interspersed with another message received by the background listener:
$ DPS ; sleep 25 ; DPS
NAMES CREATED STATUS SIZE
mosquitto 12 seconds ago Up 11 seconds (health: starting) 0B (virtual 19.1MB)
2023-10-09T22:42:06+1100 iotstack/mosquitto/healthcheck Mon Oct 9 22:42:06 AEDT 2023
NAMES CREATED STATUS SIZE
mosquitto 37 seconds ago Up 36 seconds (healthy) 0B (virtual 19.1MB)
Wait a bit as more messages roll in at 30-second intervals:
$ 2023-10-09T22:42:36+1100 iotstack/mosquitto/healthcheck Mon Oct 9 22:42:36 AEDT 2023
2023-10-09T22:43:06+1100 iotstack/mosquitto/healthcheck Mon Oct 9 22:43:06 AEDT 2023
2023-10-09T22:43:37+1100 iotstack/mosquitto/healthcheck Mon Oct 9 22:43:37 AEDT 2023
2023-10-09T22:44:07+1100 iotstack/mosquitto/healthcheck Mon Oct 9 22:44:07 AEDT 2023
Clean up:
$ kill %1
[1]+ Done mosquitto_sub -v -h 127.0.0.1 -t "#" -F "%I %t %p"
Two more PRs on the way.
By the way, I now realise that I completely misunderstood your original post.
When you wrote:
Tests are done without providing credentials.
I thought you were telling me that you had set up a password scheme but the health-check was failing because the process running inside the container wasn't using any of your credentials.
That's why I then set about proving (to myself) that credentials could be passed to the health-check script via environment variables.
Sorry for going off on the wrong track.
Still, it did reveal the need to change how the pwfile is set up, so some good came of it.
Thank you very much! Now it's working as expected.
$ mosquitto_sub -v -h localhost -p 1883 -t "iotstack/mosquitto/healthcheck"
iotstack/mosquitto/healthcheck Mon Oct 9 14:45:22 CEST 2023
iotstack/mosquitto/healthcheck Mon Oct 9 14:45:52 CEST 2023
iotstack/mosquitto/healthcheck Mon Oct 9 14:46:22 CEST 2023
iotstack/mosquitto/healthcheck Mon Oct 9 14:46:53 CEST 2023
iotstack/mosquitto/healthcheck Mon Oct 9 14:47:23 CEST 2023
iotstack/mosquitto/healthcheck Mon Oct 9 14:47:53 CEST 2023
iotstack/mosquitto/healthcheck Mon Oct 9 14:48:23 CEST 2023
I got an email containing this:
By the way: would it be possible to get the timestamp using the timezone of the container instead of UTC ?
I assume you figured it out and deleted the question.
My question to you is, which method did you use, because there are two:
You can edit the service definition:
environment:
  - TZ=${TZ:-Etc/UTC}
to be either (in my case):
environment:
  - TZ=${TZ:-Australia/Sydney}
or:
environment:
  - TZ=Australia/Sydney
You can leave the service definition alone:
environment:
  - TZ=${TZ:-Etc/UTC}
and add your timezone to the .env file:
$ cd ~/IOTstack
$ echo "TZ=$(cat /etc/timezone)" >> .env
$ docker-compose up -d
Method 1 works on a per-container basis. Method 2 works for all containers that define TZ=${TZ:-Etc/UTC}.
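Either way, all that changes is the TZ variable inside the container, and its effect is easy to see on any host:

```shell
# the same instant rendered under two different TZ settings
TZ=Etc/UTC date
TZ=Australia/Sydney date
```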
Not every container supports TZ. In general, if the person who controls the Dockerfile includes the tzdata package then the container has time-zone support; if it's omitted, you're SOL.
That's why the add-on Dockerfile for Mosquitto in the IOTstack template contains:
RUN apk update && apk add --no-cache rsync tzdata
It's not in the "official" image so we have to add it.
I assume you figured it out and deleted the question.
Yes.

environment:
  - TZ=${TZ:-Europe/Vienna}

I added this to each service. Will change it to your item 2, using the .env file.
Because Uptime-Kuma doesn't support regex for the "MQTT Success Message", it would be nice to have an environment parameter for the health-check payload, which is currently PUBLISH=$(date). What do you think? Thanks!
To save me some time (and so that I don't have to do a deep dive into Uptime-Kuma to understand it), can you please summarise what Uptime-Kuma can/can't do and what it actually needs to work?
Right now, just using $(date) serves two goals: the payload is different on every run, and the script only depends on commands that are available in the Alpine-based container.
That second one is not a trivial concern. A lot of things work differently in Alpine and that has tripped me up often enough to make me very wary. If your goal is to assume that the availability of something like:

environment:
  - HEALTHCHECK_PUBLISH=$(date)

would mean you could pass a full set of options to date running inside the container, then I'd suggest that you first try this:
Outside the container, run:
$ date --help
and observe that you get over 100 lines of help text describing all manner of options.
Repeat the command inside the Mosquitto container:
$ docker exec mosquitto date --help
and observe a mere 20-odd lines of help text and far fewer options.
I'd rather not create the kind of maintenance problem where people get to complain that "date with this set of parameters clearly works outside the container, yet when I pass the same parameters to Mosquitto, everything turns to custard!"
What I'd rather do is figure out some mechanism that satisfies the two goals I mentioned above and also works with Uptime-Kuma.
Also, right now, the "message" parameter (aka "the payload") of the mosquitto_pub is just whatever raw string comes back from the Alpine version of date. There's no reason why it can't become a JSON string. Something like:
PUBLISH="{\"date\":\"$(date)\",\"uptime\":\"$(uptime)\"}"
That would get you a payload like:
{"date":"Wed Oct 11 08:57:41 AEDT 2023","uptime":" 08:57:41 up 3 days, 11:17, 0 users, load average: 0.16, 0.17, 0.17"}
Would that be useful?
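For what it's worth, that construction is plain shell quoting, so it can be sanity-checked outside the container without a broker:

```shell
# build the candidate JSON payload from command substitutions
# (naive quoting: fine here because neither date nor uptime emits double quotes)
PUBLISH="{\"date\":\"$(date)\",\"uptime\":\"$(uptime)\"}"
echo "${PUBLISH}"
```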
Incidentally, the uptime command run inside any container gets the uptime of the host system:
$ uptime ; docker exec mosquitto uptime ; docker exec nodered uptime
09:00:03 up 3 days, 11:19, 1 user, load average: 0.72, 0.30, 0.22
09:00:03 up 3 days, 11:19, 0 users, load average: 0.72, 0.29, 0.21
09:00:03 up 3 days, 11:19, 0 users, load average: 0.72, 0.29, 0.21
I don't know whether that helps/hinders your quest.
Hi Phill, Uptime-Kuma is only able to compare a constant string; no expressions / regex are supported. Therefore I have to configure, for example:

environment:
  - HEALTHCHECK_PUBLISH="Mosquitto healthcheck"
So you're saying Uptime-Kuma can't deal with a string that varies, right?
So, instead of the date (which varies) you want a fixed string. Right?
If "yes", then that defeats the purpose of using $(date) in the existing health-check script, which is to ensure that the string is always different on each run.
The reason it needs to be different is because of the way the health-check script works. It publishes a retained message, and then subscribes to that topic for exactly one message. Because it's a retained message, it will persist until the next time the script runs and publishes a new retained message.
It also doesn't matter what else happens to just about anything in the meantime. Uptime-Kuma could stop and start. The Pi (or whatever) hosting Docker and the mosquitto container could reboot. The mosquitto container could go down and up. The container could be in a restart loop. All of that could happen multiple times but, each time mosquitto is ready for business (even if only for a few seconds), that retained message will always be sent to any subscriber. In short, far from giving you any assurance that mosquitto is working, a fixed payload would create a false positive.
Having the payload vary is the only way to be certain, on run n+1, that the retained message actually came from run n+1, rather than being a leftover from run n that is masking a problem (like the container being in a restart loop).
I'm not a great fan of retained messages so I did try writing the health-check script without it, by doing things in the opposite order: set up a background listener which would exit after the first message, then publish a non-retained message in the foreground, wait for the background process to finish, then retrieve what it received and do the compare. It just wouldn't work.
Bottom line: the answer to the question of "can we have a fixed string" is "no".
So let me turn the problem around.
Do I assume correctly that Uptime-Kuma simply subscribes to the topic and treats the simple arrival of a message within a particular period as evidence of health?
If "yes" then why not just publish your own message to Mosquitto?
Assume Uptime-Kuma is subscribing to the "/proof/of/concept" topic. If you just run:
$ mosquitto_pub -t "/proof/of/concept" -m "Mosquitto healthcheck"
then Uptime-Kuma will receive "Mosquitto healthcheck", right?
What reception proves is that the MQTT broker (the mosquitto container) is functioning properly. It has been able to receive the published message and distribute it to all registered subscribers. The container is, by definition, working (at least for the duration of the publish/subscribe cycle).
It's not a retained message so receiving the payload proves it was sent "recently".
To make that happen at 60-second intervals, just stitch it to a cron job:
* * * * * mosquitto_pub -t "/proof/of/concept" -m "Mosquitto healthcheck" 2>/dev/null
If you really want it more frequently (eg every 30 seconds) then write a short bash script. Something like this would do the job:
#!/usr/bin/env bash
while : ; do
   mosquitto_pub -t "/proof/of/concept" -m "Mosquitto healthcheck" 2>/dev/null
   sleep 30
done
Stick that in your ~/.local/bin with a name like run_uptime_kuma_for_mosquitto.sh and launch it from the crontab at reboot time:
@reboot ./.local/bin/run_uptime_kuma_for_mosquitto.sh
The 2>/dev/null will silence any errors produced while the mosquitto container is down. Publishing will resume as soon as the container is up and functioning.
Does that help?
Hi Phill, thanks for your detailed explanation.
So you're saying Uptime-Kuma can't deal with a string that varies, right?
Yes
So, instead of the date (which varies) you want a fixed string. Right?
Yes
But it seems it's not a good idea to use, and change, the current Mosquitto health check for Uptime-Kuma.
I will use the LWT (Last Will and Testament) topic of a Tasmota device to check healthiness, and therefore will not touch the current Mosquitto health check. Thanks!
The solution is very simple: configure the monitor in Uptime-Kuma and leave the "MQTT Success Message" input field empty. Then only the arrival of a message is checked, not its payload. Great! And simple, isn't it?
So that means you can use the health-check message then?
By the way, thanks for reporting this. Otherwise, I would never have realised anything was wrong. 🤦
Also, the multi-talented Mr @Slyke has just processed the pull request, so the changes to the Dockerfile and the entry-point script are now live on GitHub. The basic trick is to do a git status, then git restore «file» anything changed in the ~/IOTstack/.templates/mosquitto directory, after which a git pull will work and you'll be up-to-date again.
Yes, the health-check message works. In case of missing health-check messages Uptime-Kuma reports an error (tested by stopping the Mosquitto container).
I opened Mosquitto issue 2923. Even though my misunderstanding of your original post led me down that particular rabbit hole, root ownership feels wrong (at least in the Docker context) and I can't see any reason why the pwfile needs execute permission.
I'm hoping someone who knows a lot more about Mosquitto than I do will cast a knowledgeable eye over that issue and either set me straight or agree and propose a fix.
Incidentally, I also realised that part of the reason we (IOTstack users deploying Mosquitto) see that "not owned by root" warning is because our template starts with an empty pwfile owned by ID=1883. If you actually use mosquitto_passwd to create a password file from scratch, it gets root ownership and mode 600. Then, on the next container restart, the root ownership will be reset to 1883 and the "not owned by root" warnings will start up on subsequent runs of mosquitto_passwd.
The reason we see the "world readable permissions" warning is a side-effect of Git which, according to the reading I've been doing, only lets you specify whether a file has the execute bit set or not. There seems to be no way to set mode 600 in the template structure on GitHub and have it persist all the way through.
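That matches what a throw-away repo shows: Git's index only stores mode 100644 or 100755 for regular files, so any other on-disk mode (like the 600 we want for pwfile) is flattened when the file is staged:

```shell
# demonstrate that git records only the execute bit, not the full mode
DEMO="$(mktemp -d)"
cd "${DEMO}"
git init -q .
touch pwfile
chmod 600 pwfile
git add pwfile
git ls-files --stage pwfile    # staged mode is 100644, not 100600
```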
Hello, I can't get the mosquitto container health-check agent running (see container-health-check).
The container is up and running, and reports healthy:
mosquitto Up 7 minutes (healthy)
But how can I get the health check agent to be used by Uptime-Kuma?
mosquitto_sub -v -h localhost -p 1883 -t "iotstack/mosquitto/healthcheck" -F "%I %t %p"
produces no response. Tests are done without providing credentials. Any idea? Thanks!