caevv opened this issue 5 years ago
Hi there,
I am having the same issue, with mosquitto 1.6.3 on the broker and 1.6.3 or 1.5.8 on the bridge, after a client lost its connection:
1564044750: Client dev-someid has exceeded timeout, disconnecting.
The logs from the broker are very interesting, as they show that the messages are actually received but no PUBREC is sent back, which does happen under working conditions (and so the client's PUBREL never follows). After a restart of my mosquitto client the PUBREC/PUBREL exchange works again and everything comes back to normal.
Expected logs (from a working state):
1564051594: New connection from x.x.x.x on port 8883.
1564051594: New bridge connected from x.x.x.x as dev-someid (p2, c0, k180, u'bridge.host.com').
1564051594: Will message specified (1 bytes) (r1, q1).
1564051594: $SYS/broker/connection/dev-someid/state
1564051594: Sending CONNACK to dev-someid (1, 0)
1564051594: Received PUBLISH from dev-someid (d0, q1, r1, m1, '$SYS/broker/connection/dev-someid/state', ... (1 bytes))
1564051594: Sending PUBACK to dev-someid (m1, rc0)
1564051594: Sending PUBLISH to auto-DA5F9AAD-5070-C704-6D22-89ECCBE38A0B (d0, q0, r0, m0, '$SYS/broker/connection/dev-someid/state', ... (1 bytes))
1564051594: Sending PUBREC to dev-someid (m1870, rc0)
1564051594: Received UNSUBSCRIBE from dev-someid
1564051594: some/topic/dev-someid/#
1564051594: dev-someid some/topic/dev-someid/#
1564051594: Sending UNSUBACK to dev-someid
1564051594: Received PUBREL from dev-someid (Mid: 1870)
1564051594: Sending PUBCOMP to dev-someid (m1870)
1564051594: Sending PUBREC to dev-someid (m1871, rc0)
Observed logs (no PUBREC is sent):
1564048435: New connection from x.x.x.x on port 8883.
1564048435: New bridge connected from x.x.x.x as dev-someid (p2, c0, k180, u'bridge.host.com').
1564048435: Will message specified (1 bytes) (r1, q1).
1564048435: $SYS/broker/connection/dev-someid/state
1564048435: Sending CONNACK to dev-someid (1, 0)
1564048435: Received PUBLISH from dev-someid (d0, q1, r1, m1973, '$SYS/broker/connection/dev-someid/state', ... (1 bytes))
1564048435: Sending PUBACK to dev-someid (m1973, rc0)
1564048435: Sending PUBLISH to auto-DA5F9AAD-5070-C704-6D22-89ECCBE38A0B (d0, q0, r0, m0, '$SYS/broker/connection/dev-someid/state', ... (1 bytes))
1564048435: Received UNSUBSCRIBE from dev-someid
1564048435: some/topic/dev-someid/#
1564048435: dev-someid some/topic/dev-someid/#
1564048435: Sending UNSUBACK to dev-someid
1564048435: Received PUBLISH from dev-someid (d1, q2, r0, m1870, 'some/topic/dev-someid/my/topic', ... (79 bytes))
1564048435: Received PUBLISH from dev-someid (d1, q2, r0, m1871, 'some/topic/dev-someid/another/topic', ... (153 bytes))
1564048442: Received PUBLISH from dev-someid (d0, q2, r0, m1977, 'some/topic/dev-someid/not/the/same/topic', ... (74 bytes))
Thanks for the second report @minjatJ, I haven't yet reproduced this. Are you also running on Docker?
Thanks for the different reports both of you, I'm struggling to reproduce this. Do you have any more hints on what you're seeing?
Hi @ralight thanks for looking into this. I am running on Docker as well. Unfortunately the issue is very intermittent (Seems to be depending a lot on when the connection is dropped and comes back). I can't manage to reproduce it consistently. I am still trying to figure out what is the exact scenario. I'll post updates if I have any through the day.
I can reproduce it:
With an existing and working bridge, when the broker on "bim" is restarted, the connection from "denise" dies and does not recover without restarting the Mosquitto instance on "denise".
Logs from denise around the time of the restart of "bim":
1613997839: Client local.denise closed its connection.
1613997845: Connecting bridge bim (bim.fritz.box:1883)
1613997845: Client local.denise disconnected: Broken pipe.
1613997851: Connecting bridge bim (bim.fritz.box:1883)
1613997917: Connecting bridge bim (bim.fritz.box:1883)
1613997983: Connecting bridge bim (bim.fritz.box:1883)
1613998049: Connecting bridge bim (bim.fritz.box:1883)
1613998115: Connecting bridge bim (bim.fritz.box:1883)
Logs from bim after having it restarted:
1613997849: mosquitto version 1.6.9 starting
1613997849: Config loaded from /etc/mosquitto/mosquitto.conf.
1613997849: Opening ipv4 listen socket on port 1883.
1613997849: Opening ipv6 listen socket on port 1883.
1613997851: New connection from 192.168.33.51 on port 1883.
1613997854: New connection from 192.168.33.31 on port 1883.
1613997911: Socket error on client <unknown>, disconnecting.
1613997917: New connection from 192.168.33.51 on port 1883.
1613997977: Socket error on client <unknown>, disconnecting.
1613997983: New connection from 192.168.33.51 on port 1883.
1613998043: Socket error on client <unknown>, disconnecting.
1613998049: New connection from 192.168.33.51 on port 1883.
1613998109: Socket error on client <unknown>, disconnecting.
1613998115: New connection from 192.168.33.51 on port 1883.
@m0wlheld I've just tried that with a fresh remote Ubuntu 20.04 VM and can't reproduce it. Is there anything special in your configuration files I should know about?
Nothing special (I think), I'll share.
/etc/mosquitto/mosquitto.conf on Ubuntu host "bim.fritz.box":
pid_file /var/run/mosquitto.pid
persistence true
persistence_location /var/lib/mosquitto/
log_dest file /var/log/mosquitto/mosquitto.log
include_dir /etc/mosquitto/conf.d
/etc/mosquitto/conf.d/auth.conf on Ubuntu host "bim.fritz.box":
allow_anonymous false
password_file /etc/mosquitto/passwd
/mosquitto/config/mosquitto.conf for container instance on Raspbian host "denise.fritz.box":
listener 1883 0.0.0.0
persistence false
allow_anonymous false
password_file /mosquitto/config/mosquitto.passwd
connection bim
address bim.fritz.box:1883
remote_clientid denise
remote_password ***************
remote_username denise
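For completeness, here is roughly how I trigger it, assuming mosquitto runs as the normal systemd service on "bim" and inside a container on "denise" (the container name "mosquitto" below is just a placeholder for whatever your setup uses):

# On bim (Ubuntu host): restart the broker the bridge points at
sudo systemctl restart mosquitto

# On denise (Raspbian host): watch the bridge side of the connection
docker logs -f mosquitto

# As in the logs above, denise then keeps printing "Connecting bridge bim (...)"
# without ever re-establishing the bridge, until its own container is restarted:
docker restart mosquitto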
Thanks - I agree it doesn't look very special, but it's always nice to have exactly the same conditions to work with.
I observe the same problem here. The setup is a bridge connection between two brokers, A and B.
Procedure:
logs A:
1614955646: mosquitto version 2.0.7 starting
1614955646: Config loaded from /mosquitto/config/mosquitto.conf.
1614955646: Opening ipv4 listen socket on port 1883.
1614955646: mosquitto version 2.0.7 running
1614955677: New connection from 172.20.0.1:33862 on port 1883.
1614955677: New client connected from 172.20.0.1:33862 as A-bridge (p5, c1, k60, u'A').
logs B:
1614956548: mosquitto version 2.0.7 starting
1614956548: Config loaded from /mosquitto/config/mosquitto.conf.
1614956548: Opening ipv4 listen socket on port 1883.
1614956548: Connecting bridge A-bridge (<ip-A>:1883)
logs A:
1614956924: mosquitto version 2.0.7 starting
1614956924: Config loaded from /mosquitto/config/mosquitto.conf.
1614956924: Opening ipv4 listen socket on port 1883.
1614956924: mosquitto version 2.0.7 running
1614956936: New connection from 172.20.0.1:33936 on port 1883.
1614956996: Client <unknown> closed its connection.
1614957017: New connection from 172.20.0.1:33940 on port 1883.
1614957077: Client <unknown> closed its connection.
logs B:
1614956753: mosquitto version 2.0.7 starting
1614956753: Config loaded from /mosquitto/config/mosquitto.conf.
1614956753: Opening ipv4 listen socket on port 1883.
1614956753: Connecting bridge A-bridge (<ip-A>:1883)
1614956753: mosquitto version 2.0.7 running
1614956754: Client local.A-bridge disconnected: Broken pipe.
1614956774: Connecting bridge A-bridge (<ip-A>:1883)
1614956855: Connecting bridge A-bridge (<ip-A>:1883)
1614956936: Connecting bridge A-bridge (<ip-A>:1883)
A: mosquitto.conf
log_dest stdout
log_type error
log_type warning
log_type notice
log_type information
log_type subscribe
log_type unsubscribe
log_type websockets
password_file /run/secrets/passwd
listener 1883 0.0.0.0
allow_anonymous false
B: mosquitto.conf
persistence true
persistence_location /mosquitto/data/
log_dest stdout
log_type error
log_type warning
log_type notice
log_type information
log_type subscribe
log_type unsubscribe
password_file /mosquitto/config/passwd
listener 1883 0.0.0.0
allow_anonymous true
connection A-bridge
bridge_protocol_version mqttv50
address <ip-A>:1883
topic # both
remote_username A
remote_password *******
remote_clientid A-bridge
restart_timeout 20 600
notifications true
cleansession true
start_type automatic
try_private true
I see exactly the same problem as Toschoch with Mosquitto 2.0.7 installed on Windows 10. Does anybody know of a quick workaround?
Ubuntu 20.04, mosquitto version 2.0.8.
I've found a workaround using a docker-compose healthcheck. It takes a couple of steps.
docker-compose.yml:
version: "3.8"
services:
  autoheal:
    image: "willfarrell/autoheal:latest"
    tty: true
    container_name: "autoheal"
    restart: always
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
  mosquitto:
    image: "eclipse-mosquitto"
    ports:
      - "1883:1883/tcp"
      - "9001:9001/tcp"
    volumes:
      - "/host/mosquitto/config:/mosquitto/config"
      - "/host/mosquitto/data:/mosquitto/data"
      - "/host/mosquitto/log:/mosquitto/log"
    labels:
      - "autoheal=true"
    healthcheck:
      test: ["CMD", "/mosquitto/config/hc.sh"]
      interval: 5s
      timeout: 10s
      retries: 3
    restart: unless-stopped
/mosquitto/config/hc.sh
#!/bin/sh
# Forked subscriber: wait for a single probe message on the healthcheck topic.
function_to_fork() {
  # Single quotes keep the $$ in the password from being expanded by the shell.
  HC=$( mosquitto_sub -h 127.0.0.1 -p 1883 -t "test/topic_hc" -u hass -P 'MeGaPa$$wD' -C 1 )
  if test "$HC" != 'alive'; then
    echo "> $HC result"
    exit 1
  fi
}
function_to_fork &
# Publish the probe that the subscriber above is waiting for.
mosquitto_pub -h 10.0.0.1 -p 1883 -t 'test/topic_hc' -m 'alive' -u hass -P 'MeGaPa$$wD'
# Propagate the subscriber's exit status as the health check result ("wait -n" is
# not available in all /bin/sh implementations, so wait on the job explicitly).
wait $!
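As a usage note for the healthcheck above: you can check what Docker currently thinks of the container with docker inspect (the container name is a placeholder; with compose it will be something like <project>_mosquitto_1, see docker ps):

# Current health status: starting / healthy / unhealthy
docker inspect --format '{{.State.Health.Status}}' mosquitto

# Full health history, including the output of the last probe runs
docker inspect --format '{{json .State.Health}}' mosquitto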
I had this issue in 2.0.10, but in 2.0.11 it seems fine. (using docker)
I'm still having the issue on 2.0.14, where I've got three brokers. When restarting everything (docker compose), two brokers start up at nearly the same time, which means one isn't ready when the other tries to connect. The one that tried then never tries again. A workaround shouldn't be needed, right?
Here's how I've set it up:
Broker A has a bridge set up to broker B.
Broker B has a bridge set up to broker C.
If broker A restarts, it'll reconnect to broker B. If broker B restarts, it'll reconnect to broker C, but broker A will report "Client broker_b closed its connection." and never try again. The same goes for broker C restarting: broker B will never try again.
In every mosquitto.conf I have the lines:
keepalive_interval 60
start_type automatic
restart_timeout 10 180
Why does it not reconnect?
Just noticed something odd.
I actually have a fourth broker, which I don't really use. This broker is also connected as a bridge, and this one DOES reconnect no matter what. Brokers A, B, and C all run as docker containers on RPis (one on an RPi Zero W), and none of them attempt to reconnect. Broker D runs as a docker container on Debian.
Can this explain the issue?
Hi,
I have a similar problem.
I created a broker on a Raspberry Pi and connected it to my main MQTT broker.
The device connects to the Raspberry Pi broker and works without problems.
However, the broker disconnects from the main MQTT broker after some time and no longer reconnects.
When I restart it, it works again until it disconnects again.
Does anyone know why this is happening?
This is very strange behavior: communication between the MQTT servers works smoothly in both directions until the disconnect.
17:55:07:: 1649699707: New bridge connected from 172.17.0.1:40846 as bridge01 (p2, c1, k15, u'hassio').
17:55:43:: 1649699743: Client bridge01 has exceeded timeout, disconnecting.
Is it necessary to set an interval for the check or turn it off completely or something similar?
Thank you in advance for any answer, M.
@michaelfanta: the timeout should be caught by keepalive_interval 60, which you may need to reduce to 30 or something. However, restart_timeout 10 180 should make it reconnect as well.
I just tried that and it still disconnects from the main MQTT server. I'm attaching my config file below. Is there no need to set a protocol type or something similar? It's weird that it always disconnects after a while.
listener 1883 0.0.0.0
persistence true
persistence_location /mosquitto/data/
allow_anonymous false
password_file /mosquitto/data/passwd
log_dest file /mosquitto/log/mosquitto.log
connection bridge01
address xxx.xxx.xxx.xxx:1883
topic # both 0
cleansession false
remote_username test
remote_password test
remote_clientid bridge01
restart_timeout 10 180
keepalive_interval 30
notifications true
start_type automatic
try_private true
I'm testing it on a Raspberry Pi, with one Sonoff MINI (Tasmota) connected to the MQTT broker.
If I change the state of the switch, the connection seems to be stable. However, when the state stops changing, the MQTT bridge disconnects from the main MQTT broker.
Shouldn't the MQTT bridge send some regular data to the main MQTT broker to avoid the disconnection?
I also tried to set "start_type" to "lazy" but without effect.
EDIT: I also noticed that when "keepalive_interval" was set to 30, it disconnected after 30 seconds. So I set the interval to the maximum value of 65535 and it has not disconnected yet.
I also noticed that the date and time were not displayed correctly in the log file. The time showed up as: Sun Jan 100 4174038:2123955838:00 1900
So I set "privileged" to "true" in "docker-compose.yml" and the time and date are now displayed correctly. Do you think this could be related to the earlier disconnections?
Privileged containers are almost never a good idea, especially not when used to work around another bug. For system time in the container, the better option is to mount the host's time configuration read-only, e.g.:
volumes:
  - /etc/localtime:/etc/localtime:ro
However, the fact that it doesn't reconnect after (wrongly) losing the connection is the same bug I am experiencing.
Thank you so much, this way it works without problems :)
As for MQTT, the lack of traffic was probably the issue: now that sensor states are being sent regularly, it no longer disconnects.
But as you say, when a connection is lost it doesn't reconnect, which is weird.
I'm trying to solve something similar with the Tasmota firmware. It connects to MQTT, and when the broker is offline for some time, the Tasmota device no longer reconnects. Each time a connection attempt fails, the retry delay grows until it finally stops trying to connect, which is not right.
I'm still trying to find information on why "restart_timeout" isn't kicking in. I was hoping that setting "start_type" to "lazy" would be enough, but even that makes no difference for me.
I even tried setting "threshold" to 1 so that it reacts to every message, but it still never reconnects.
For now, I'm working around it with a Python-based script: when the MQTT connection to the broker is lost, mosquitto on the RPi is restarted so it reconnects, but that's not really a permanent solution.
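In case it helps anyone, here is a rough shell sketch of the same watchdog idea (not the actual Python script): it publishes a probe on the remote broker, expects the bridge to carry it back to the local broker, and restarts the local mosquitto container if it doesn't arrive. The topic, credentials, remote address and container name are placeholders, and it assumes a "topic # both 0" style bridge like the config above.

#!/bin/sh
# Placeholders - adjust to your setup
REMOTE=xxx.xxx.xxx.xxx        # main broker (same as "address" in the bridge config)
RUSER=test; RPASS=test        # remote credentials (remote_username/remote_password)
LUSER=test; LPASS=test        # local credentials (entry in the local password_file)
CONTAINER=mosquitto           # name of the local mosquitto container

while true; do
  # Subscribe locally first so the bridged probe is not missed
  mosquitto_sub -h 127.0.0.1 -p 1883 -u "$LUSER" -P "$LPASS" \
    -t watchdog/ping -C 1 -W 15 > /tmp/watchdog_msg &
  SUB=$!
  sleep 1
  # Publish the probe on the remote broker; the bridge should deliver it locally
  mosquitto_pub -h "$REMOTE" -p 1883 -u "$RUSER" -P "$RPASS" -t watchdog/ping -m ping
  wait "$SUB"
  if [ "$(cat /tmp/watchdog_msg 2>/dev/null)" != "ping" ]; then
    echo "bridge looks dead, restarting local mosquitto"
    docker restart "$CONTAINER"
  fi
  sleep 60
done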
I have the same issue with 2.0.11 on both bridge and broker (Docker). I notice that sometimes when the bridge starts up and tries to connect the broker, it doesn't receive the CONNACK (even though I can see the broker sending the CONNACK in its logs, another problem I'm looking into). After that the keep alive fires every 30 seconds and the ping requests work fine on both ends, but the bridge never publishes a message to the broker after that and never tries to reconnect. When it does connect ok everything works perfectly. It's almost like it's waiting on the CONNACK forever. How can I get the bridge to retry the connection when it doesn't receive a CONNACK?
Reading the mosquitto conf manual page here I see with the restart_timeout setting the default is to initially retry after 5 seconds and then increase up to maximum retry period of 30 seconds. So in theory I should see a reconnect attempt every 30 seconds (without configuring the restart_timeout) but I don't. I do see a keep alive (which is successful) and also happens to be set to 30 seconds. Is it possible the successful keep alive overrides the reconnect attempt and leaves the bridge in a permanent "happily disconnected" state?
"Happily disconnected" does seem appropriate in this case, as the broker never appears to retry connecting. I'm starting to suspect there are multiple problems here:
keep_alive
CONNACK
is worrying, and may also play a roleLet's say it does reconnect after five seconds, and it is indeed due to the keep_alive
; this would mean that the broker drops messages for five seconds every time it tries to keep the connection alive.
I experienced the problem when restarting brokers, where the restart order that triggers it depends on which broker initiates the connection, and then only on an RPi Zero W.
I've recently found this to be an issue with a friend's setup as well. He's also using an RPi Zero W, which doesn't reconnect.
I believe we may have a similar issue; it's intermittent (if we restart mosquitto it may or may not happen), but it does seem to happen roughly 50% of the time.
I've noticed that for us it appears as though this log statement is always present when it happens:
2022-08-17T21:40:34Z mosquitto[24606]: 1660772434: Expiring client brXXXXXXX due to timeout.
Confirming that 1.5.8 does not exhibit the issue (just tested 5 times and would have expected to see it once).
1.6.15 did not exhibit the issue either. Actually I'm not sure this is the same issue for us. I think this broke for us between 2.0.14 and 2.0.15 when this (guessing):
/* FIXME - we should expire clients here if they have exceeded their time */
was replaced by this:
session_expiry__add_from_persistence(context, chunk.F.session_expiry_time);
Sorry for the noise on this issue.
Regarding the earlier question of whether a successful keep alive overrides the reconnect attempt and leaves the bridge in a permanent "happily disconnected" state, and the guess that this broke between 2.0.14 and 2.0.15 when /* FIXME - we should expire clients here if they have exceeded their time */ was replaced by session_expiry__add_from_persistence(context, chunk.F.session_expiry_time):
I'd say it's likely that the problem originates from something like this. Could we somehow get this issue 'bumped'?
I'm not sure if this is related or not, but on 2.0.15 we're experiencing the same issue that @everactivetim described. We have a number of bridged Mosquitto brokers, and in many cases (maybe close to 50%) the message "Expiring client local.<bridge-name> due to timeout." appears when Mosquitto is starting up. There is no actual connection timeout at play here; the message appears within a second of service start. The bridge is then never actually created.
We're reverting to 2.0.14 to see if the issue is present there also; it may be a regression due to https://github.com/eclipse/mosquitto/commit/c99502a2567bf9147b3d811cb17e38ad51f08660 as @everactivetim suggested. The most relevant parts of the configuration are (probably)
# General
persistence true
# Bridge
cleansession false
start_type automatic
restart_timeout 10 120
This may be a different problem, just like the problem mentioned by @everactivetim , but they all appear related.
What device/architecture are you using? I'm able to reproduce the problem on a(ny) RPi Zero W, but not on any other device.
Yes, after further investigation I do think it's a separate issue, caused by a regression in 2.0.15. Reverting to 2.0.14 appears to fix the "Expiring client due to timeout" issue I was experiencing - I'll create a ticket for this. This is on a Raspberry Pi 3B+ / armv7l architecture.
Five years on and the problem is still here!
Should be fixed with 2.0.16, actually. This issue simply wasn't closed automatically.
This issue is NOT fixed. We're seeing lost bridge connections on 2.0.18.
Expiring client cloud-bridging-client due to timeout.
mosquitto.conf:
connection cloud
address my-cloud-address:8883
bridge_cafile /certs/ca.pem
bridge_protocol_version mqttv50
notifications_local_only true
local_clientid cloud-bridging-client
remote_clientid the-client
remote_password the-password
topic some/topic both 1 cloud/ ""
That's true! We use 2.0.18 & the issue is still present:
Expiring client local.*** due to timeout.
user root
persistence true
persistent_client_expiration 1d
persistence_location /mosquitto/data/
log_dest file /mosquitto/log/mosquitto.log
listener {LOCAL_PORT} {LOCAL_HOST}
allow_anonymous true
connection bridge-name
address {REMOTE_HOST}:{REMOTE_PORT}
bridge_protocol_version mqttv50
restart_timeout 1
remote_clientid other-name
I'm running into this problem again and again as well. Restarting the broker is a bad workaround, but apparently the only one available.
After upgrading to 1.6.3 from 1.5.x I found that when the bridge connection is lost the bridge won't recover when the connection is re-established.
Only solution has been restarting mosquitto.
Mosquitto running on docker.
mosquitto.conf
Bridge name:
bridge-01
Connection lost at 1562929130
Connection re-established at 1562929523