eclipse-mosquitto / mosquitto

Eclipse Mosquitto - An open source MQTT broker
https://mosquitto.org
Other
9.16k stars 2.41k forks

messages are no longer retained with 2.0.16+ (with a lot of retained topics?) #2887

Open BubuOT opened 1 year ago

BubuOT commented 1 year ago

Since upgrading to mosquitto 2.0.16 (and later 2.0.17) via the Docker image, retained messages received by the broker are no longer handled as retained and are not delivered to new subscribers.

Downgrading to 2.0.15 makes this work as expected again.

(Happy to add more details about our setup if required)

BubuOT commented 1 year ago

I tried to replicate the problem in a test setup, including another mosquitto instance as an MQTT bridge, but so far I have been unable to. Will keep you updated.

stigvi commented 1 year ago

Downgrading to 2.0.15 makes this work as expected again.

Yes, I did the same after a lot of trouble with missing retained messages on 2.0.17.

BubuOT commented 1 year ago

@stigvi Good to know we are at least not alone with this problem. Did you find a way to reproduce this problem somehow?

pat1 commented 1 year ago

Duplicate of #2806 and #2618

pat1 commented 1 year ago

related to #2785

BubuOT commented 1 year ago

@pat1

duplicated of https://github.com/eclipse/mosquitto/issues/2806 and https://github.com/eclipse/mosquitto/issues/2618

I'm not sure it is. We do not experience this problem with 2.0.15, which the other issue(s) explicitly mention as affected.

ralight commented 1 year ago

@BubuOT I've been unable to reproduce this so far. Are you able to share any more details about the setup and what sorts of topics you are using?

clintkev251 commented 1 year ago

I've also seen this, specifically with mosquitto alongside Home Assistant and Zigbee2MQTT. Previously, when Home Assistant was restarted, it would automatically read the device configurations that Zigbee2MQTT had set as retained messages. On 2.0.17, however, it never seems to receive these retained messages, so devices remain unavailable until Zigbee2MQTT is restarted to force it to republish all of the discovery topics, at which point Home Assistant receives them successfully.

I confirmed that after rolling back to 2.0.15, this began working as expected again. Let me know if there's any other information that would be helpful in reproducing it.

BubuOT commented 1 year ago

@ralight I have so far also been unable to reproduce this outside our production environment :-/

Here's our mosquitto config though:

# A full description of the configuration file is at
# /usr/share/doc/mosquitto/examples/mosquitto.conf.example

# >> AUTHENTICATION

# clients must authenticate
#allow_anonymous false
#password_file /mosquitto/config/passwd

# access control list
acl_file /mosquitto/config/acl

# >> PERSISTENCE

# save in-memory database for persistence to disk
autosave_interval 60

# write persistence data to the disk instead of storing only in memory
persistence true
persistence_location /mosquitto/data/

# >> LOGGING

# include timestamp
log_timestamp true

# Possible types are: debug, error, warning, notice, information, subscribe, unsubscribe, websockets, none, all.
#log_type all
#log_type debug
log_type error
log_type warning
#log_type notice
#log_type information
#log_type subscribe
#log_type unsubscribe

# If set to true, the log will include entries when clients connect and disconnect.
connection_messages true

# >> LISTENER

# plain mqtt protocol
# bind to 1883 public since security access is handled
# through docker container isolation
listener 1883
require_certificate true
use_identity_as_username true
use_username_as_clientid true
certfile /mosquitto/config/mqtt.othermo.de.crt
cafile /mosquitto/config/ca.crt
keyfile /mosquitto/config/mqtt.othermo.de.key
# use CRL to prevent revoked clients from connecting
crlfile /mosquitto/config/crl.pem

# encrypted listener on 8883
listener 8883
require_certificate true
use_identity_as_username true
use_username_as_clientid true
certfile /mosquitto/config/mqtt.othermo.de.crt
cafile /mosquitto/config/ca.crt
keyfile /mosquitto/config/mqtt.othermo.de.key
# use CRL to prevent revoked clients from connecting
crlfile /mosquitto/config/crl.pem

The bridge config (not sure if it's relevant) looks like this:

connection_messages false
connection upstream
address <DOMAIN>

bridge_protocol_version mqttv311
bridge_insecure false
bridge_cafile <filename>
bridge_certfile <filename>
bridge_keyfile <filename>

notifications true

notification_topic {DEVICEID}/status
<list of bridged topics; a mix of in and out, QoS 0 and 1>
remote_clientid {DEVICEID}

ralight commented 1 year ago

Ok, thank you. I will keep looking, and if you find a way to reproduce it please let me know.

simonclayton commented 1 year ago

We have this exact issue when upgrading mosquitto from 2.0.15 to 2.0.18 on Docker. I can replicate the issue with data "missing" on 2.0.16, 2.0.17 and 2.0.18, but switching back to the 2.0.15 Docker image just by editing the docker-compose.yml file and restarting immediately "fixes" everything without needing to republish any data.

In case it helps, we only start to see the issue with increasing volumes of data. We have 2 top-level topics with a number of nested topics, and at ~30k topics published on 2.0.15 or 2.0.18, connecting with MQTT Explorer shows ~1400 topics as the complete data set.

Testing with only one top-level topic with ~1100 subtopics, all of the data is sent to the client correctly. We only start to see issues when we add a 2nd top-level topic with ~28k subtopics.

For reference, our topic structure looks like this:

--- topic1 (~1100 subtopics in total)
 |       |- sub1
 |       |     |- sub-subtopics1..10
 |       |     |                |- data topics
 |       | 
 |       |- sub2
 |       |     |- sub-subtopics1..2
 |       |     |                |- data topics
 |       | 
 |- topic2 (~28k subtopics in total)
 |       |- sub1
 |       |     |- sub-subtopics1..10
 |       |     |                |- data topics
 |       | 
 |       |- sub2
 |       |     |- sub-subtopics1..2
 |       |     |                |- data topics
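To reproduce the high-volume case on a disposable test broker, a load generator along these lines may help. This is a minimal, stdlib-only Python sketch, not something posted in this thread: it hand-encodes MQTT 3.1.1 CONNECT and QoS 0 PUBLISH packets with the RETAIN flag set. It assumes an anonymous plain-TCP listener (host and port are parameters), and the topic names and the `build_topics`/`publish_retained` helpers are hypothetical.

```python
# Hypothetical load generator (a sketch, not from the thread): publish many
# retained QoS 0 messages to a disposable test broker.
# Assumptions: anonymous, plain-TCP MQTT 3.1.1 broker; topic names are made up.
import socket
import struct


def encode_remaining_length(n: int) -> bytes:
    # MQTT variable-length integer: 7 bits per byte, high bit = "more follows".
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n:
            byte |= 0x80
        out.append(byte)
        if not n:
            return bytes(out)


def encode_string(s: str) -> bytes:
    # UTF-8 string prefixed with its 2-byte big-endian length.
    data = s.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def connect_packet(client_id: str, keepalive: int = 60) -> bytes:
    # Protocol "MQTT", level 4 (3.1.1), connect flags 0x02 (clean session).
    body = encode_string("MQTT") + bytes([4, 0x02]) + struct.pack("!H", keepalive)
    body += encode_string(client_id)
    return bytes([0x10]) + encode_remaining_length(len(body)) + body


def retained_publish_packet(topic: str, payload: bytes) -> bytes:
    # 0x31 = PUBLISH, QoS 0, RETAIN=1; QoS 0 packets carry no packet identifier.
    body = encode_string(topic) + payload
    return bytes([0x31]) + encode_remaining_length(len(body)) + body


def build_topics(top_level: str, groups: int, leaves_per_group: int) -> list[str]:
    # Hypothetical layout loosely mirroring the tree above: top/subN/itemM/value.
    return [f"{top_level}/sub{g}/item{i}/value"
            for g in range(groups) for i in range(leaves_per_group)]


def recv_exact(sock: socket.socket, n: int) -> bytes:
    # Read exactly n bytes or fail if the broker closes the connection.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("broker closed the connection")
        buf += chunk
    return buf


def publish_retained(topics: list[str], host: str = "localhost", port: int = 1883,
                     client_id: str = "retain-loadgen") -> None:
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(connect_packet(client_id))
        connack = recv_exact(sock, 4)  # 0x20 0x02 <session present> <return code>
        if connack[0] != 0x20 or connack[3] != 0:
            raise ConnectionError(f"connection refused: {connack!r}")
        for n, topic in enumerate(topics):
            sock.sendall(retained_publish_packet(topic, f"value-{n}".encode()))
        sock.sendall(b"\xe0\x00")  # DISCONNECT
```

For example, `publish_retained(build_topics("topic1", 10, 110) + build_topics("topic2", 10, 2800))` would publish 1,100 + 28,000 retained messages, roughly the volumes described above. Hand-encoding the packets only keeps the sketch dependency-free; a client library such as paho-mqtt would be the usual choice.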

I have a test server that only has dockerized mosquitto on and no personal data that I can let a project dev have root access to if you want to see this happening.

BubuOT commented 1 year ago

FWIW, we also have a lot of retained topics (~10k) in production, where we saw this issue. This might explain why I've never been able to reproduce it in any test setup 🤔

ZetaWaves commented 1 year ago

I am having the same issue. All devices come up as "Unavailable" on Home Assistant after an HA reboot. Is this being worked on? This is a critical bug as my entire network is down with no fix.

clintkev251 commented 1 year ago

Well you can temporarily fix it very easily by reverting to 2.0.15 where retained topics are working as expected

ZetaWaves commented 1 year ago

Well you can temporarily fix it very easily by reverting to 2.0.15 where retained topics are working as expected

Not possible. I'm using the MQTT HA add-on; unfortunately there's no way to go back to an earlier version.

Soukyuu commented 1 year ago

I just restored to HA addon version 6.2.1, which still has mosquitto 2.0.15 - backups are a life saver.

ZetaWaves commented 1 year ago

I just restored to HA addon version 6.2.1, which still has mosquitto 2.0.15 - backups are a life saver.

I'm happy for you, but unfortunately I wasn't aware of when the problem started until I had to reboot HA. So all my backups are useless: I've made changes since then, and I can't go back, especially since I'm not sure when I upgraded the MQTT broker add-on. So yeah, this completely breaks using any MQTT device for me, and my network has been down for some time. I will likely drop MQTT at this point because the devs don't seem to be making this a priority fix.

Soukyuu commented 1 year ago

@ZetaWaves: HA does partial backups before every update by default, are you sure you have none? Or have you disabled that?

WhimsySpoon commented 1 year ago

I have the same issue with HA Core 2023.10.1, the HA Add-on v6.3.1 (2.0.17) and Zigbee2MQTT. Several devices became Unknown, even though they had a state in Z2M.

Restoring my add-on back to the version with 2.0.15 has resolved the issue. I have ~1400 topics.

wardwolfram commented 1 year ago

I just restored to HA addon version 6.2.1, which still has mosquitto 2.0.15 - backups are a life saver.

I issued a partial restore to MQTT broker 6.2.1. After the restore, I received the following message: [screenshot]

How did you restore to 6.2.1 successfully? Thanks much in advance.

tomlut commented 1 year ago

That's something you should ask Home Assistant, not Mosquitto.

wardwolfram commented 1 year ago

That's something you should ask Home Assistant, not Mosquitto.

Yes, I'm restoring Mosquitto from within HA.

wardwolfram commented 1 year ago

I just restored to HA addon version 6.2.1, which still has mosquitto 2.0.15 - backups are a life saver.

I issued a partial restore to MQTT broker 6.2.1. After the restore, I received the following message: [screenshot]

How did you restore to 6.2.1 successfully? Thanks much in advance.

Solved.

  1. Stopped the Zigbee2MQTT add-on service
  2. Stopped the Mosquitto Broker add-on service
  3. Uninstalled the Mosquitto Broker add-on
  4. HA partial restore of the Mosquitto Broker add-on v6.2.1
  5. Started both services again successfully.

I agree with Soukyuu... backups are indeed a life saver!

Alphaemef commented 1 year ago

Does anyone know if this has been fixed in the recent updates, or if a fix is coming? I am still sitting on 2.0.16 for the same reason.

tomlut commented 1 year ago

Yes, a bugfix for this is included. See:

https://mosquitto.org/blog/2023/09/version-2-0-18-released/

Alphaemef commented 1 year ago

Aha, so this was actually #2893. Perfect, thanks!

Alphaemef commented 1 year ago

Sigh, nope... when I log on with MQTT Explorer, none of the messages are retained. Back to 2.0.15 :/

ZetaWaves commented 1 year ago

It is not fixed in 2.0.18.

Still broken. Just tested.

ZetaWaves commented 1 year ago

Yes, a bugfix for this is included. See:

https://mosquitto.org/blog/2023/09/version-2-0-18-released/

It is not fixed in this release. Actually, it's worse in this release than before.

tomlut commented 1 year ago

Working for me.

Alphaemef commented 1 year ago

It may be an issue with the MQTT broker add-on for HA. I have around 30 clients, each with around 100 entities, and not a single message is retained despite several reboots etc. Retained messages do, however, work with version 2.0.15 (add-on version 6.2.1).

Does anyone know who maintains the add-on?

tomlut commented 1 year ago

Frenck. You can chat with him on Discord here: https://discord.gg/jBwTDzJ4sk

I'm using the HA addon but don't have anywhere near that many topics.

Alphaemef commented 1 year ago

From what I can gather, it's primarily hitting users with a lot of topics (I guess 5550 topics is high). It's so rarely reported, though, that I don't even know where to look. Will check out the Discord.

euggersh commented 1 year ago

If you find out anything interesting on Discord, a post here would be very much appreciated.

Alphaemef commented 1 year ago

Tagged Frenck. Will see if he has any comments or guidance. It might just be time to host the MQTT broker outside HA, which I was really hoping to avoid.

clintkev251 commented 1 year ago

I don't think it's anything specific to hosting the broker in HA, it's the same container regardless of where you're hosting it. I run mine completely independently and I'm still impacted by this

BubuOT commented 1 year ago

2.0.18 doesn't fix this, no.

Edit: Updated the title to reflect this, and added a hint that this might be related to having a lot of retained topics.

scottshanafelt commented 9 months ago

Just adding that I ran into this problem running 2.0.18, and downgrading to 2.0.15 resolved it. Basically the broker just won't retain messages. I even tried manually setting a retained message using MQTT Explorer, and even that failed to retain. All good on 2.0.15.
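Most commenters judge the problem by what MQTT Explorer shows after reconnecting. A scriptable alternative is a fresh subscriber that counts distinct topics arriving with the RETAIN flag set; under MQTT 3.1.1 a broker sets that flag on messages it sends because of a new subscription. The following is a minimal stdlib-only Python sketch, assuming an anonymous plain-TCP MQTT 3.1.1 listener; the client id `retain-checker`, the 5-second idle timeout and the `count_retained` helper are hypothetical choices, not part of Mosquitto.

```python
# Minimal stdlib-only checker (a sketch, not a Mosquitto tool): subscribe to a
# topic filter and count distinct topics delivered with the RETAIN flag set.
# Assumptions: anonymous, plain-TCP MQTT 3.1.1 broker; QoS 0 subscription.
import socket
import struct


def encode_remaining_length(n: int) -> bytes:
    # MQTT variable-length integer: 7 bits per byte, high bit = "more follows".
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n:
            byte |= 0x80
        out.append(byte)
        if not n:
            return bytes(out)


def encode_string(s: str) -> bytes:
    # UTF-8 string prefixed with its 2-byte big-endian length.
    data = s.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def connect_packet(client_id: str, keepalive: int = 60) -> bytes:
    # Protocol "MQTT", level 4 (3.1.1), connect flags 0x02 (clean session).
    body = encode_string("MQTT") + bytes([4, 0x02]) + struct.pack("!H", keepalive)
    body += encode_string(client_id)
    return bytes([0x10]) + encode_remaining_length(len(body)) + body


def subscribe_packet(topic_filter: str, packet_id: int = 1, qos: int = 0) -> bytes:
    # 0x82 = SUBSCRIBE with its mandatory reserved flag bits (0b0010).
    body = struct.pack("!H", packet_id) + encode_string(topic_filter) + bytes([qos])
    return bytes([0x82]) + encode_remaining_length(len(body)) + body


def recv_exact(sock: socket.socket, n: int) -> bytes:
    # Read exactly n bytes or fail if the peer closes the connection.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("broker closed the connection")
        buf += chunk
    return buf


def read_packet(sock: socket.socket) -> tuple[int, bytes]:
    # Returns (first fixed-header byte, packet body); decodes the remaining length.
    first = recv_exact(sock, 1)[0]
    length, multiplier = 0, 1
    while True:
        byte = recv_exact(sock, 1)[0]
        length += (byte & 0x7F) * multiplier
        if not byte & 0x80:
            break
        multiplier *= 128
    return first, recv_exact(sock, length)


def parse_publish(first: int, body: bytes) -> tuple[str, bool]:
    # QoS 0 PUBLISH body: 2-byte topic length, topic, payload (no packet id).
    topic_len = struct.unpack("!H", body[:2])[0]
    return body[2:2 + topic_len].decode("utf-8"), bool(first & 0x01)


def count_retained(topic_filter: str = "#", host: str = "localhost",
                   port: int = 1883, idle_timeout: float = 5.0) -> set[str]:
    retained: set[str] = set()
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(connect_packet("retain-checker"))
        first, body = read_packet(sock)
        if first != 0x20 or body[1] != 0:  # CONNACK return code 0 = accepted
            raise ConnectionError(f"connection refused: {body!r}")
        sock.sendall(subscribe_packet(topic_filter))
        sock.settimeout(idle_timeout)
        try:
            while True:
                first, body = read_packet(sock)
                if first >> 4 == 3:  # PUBLISH (SUBACK and others are ignored)
                    topic, retain = parse_publish(first, body)
                    if retain:
                        retained.add(topic)
        except socket.timeout:
            pass  # quiet for idle_timeout seconds: treat the backlog as delivered
        sock.sendall(b"\xe0\x00")  # DISCONNECT
    return retained
```

Running `len(count_retained())` against 2.0.15 and then 2.0.16+ with the same persisted data would turn "MQTT Explorer shows fewer topics" into a number that can be compared across versions. Note that `#` does not match `$SYS/...` topics, so the broker's own statistics are not counted.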

Alphaemef commented 9 months ago

Just adding that I ran into this problem running 2.0.18, and downgrading to 2.0.15 resolved it. Basically the broker just won't retain messages. I even tried manually setting a retained message using MQTT Explorer, and even that failed to retain. All good on 2.0.15.

Yeah, it's likely the same issue. I totally get that this may not be fixed immediately, but it would be nice if it could at least be acknowledged as an actual issue.

digiblur commented 8 months ago

I can also confirm this issue. Rolled back my container tag to 2.0.15 and all is well again with Zigbee2MQTT and HA.

MnM001 commented 8 months ago

Not worried about this in version 2.0.15?

[screenshot]

stigvi commented 8 months ago

Not worried about this in version 2.0.15?

Why do you ask? What alternatives are there?

MnM001 commented 8 months ago

No idea... a new 2.0.15 image with the vulnerabilities fixed?

I mean, if you thought it was bad when devices were not working, I think it would be far worse if someone managed to exploit these vulnerabilities. And since the Mosquitto maintainers are aware that every version after 2.0.15 has this issue, maybe they have a duty of care to build a new 2.0.15 release without the vulnerabilities?

Alk3m1st commented 6 months ago

Hey all, just to say I was having these issues with my setup too. I tried a lot of things, but in the end a Home Assistant Core and Operating System update fixed it for me. I was going to do an add-on restore but luckily didn't need to. I'm now running Core 2024.5.5 and OS 12.3. I thought I'd post here in case anyone can't restore or would rather not. Hopefully this resolves it for others too.

digiblur commented 6 months ago

This is impacting people that don't even run HAOS.

SVH-Powel commented 6 months ago

Can this be solved by increasing max_queued_messages?

WhimsySpoon commented 6 months ago

I'll be installing the updated HA Add-on with that change (https://github.com/home-assistant/addons/pull/3615) in a few hours to see.

WhimsySpoon commented 6 months ago

Update: installed, and HA Core restarted.

Result: so far, so good.

With the previous version, devices showed as unavailable; this time everything has come online as expected.

SVH-Powel commented 6 months ago

So far, so good here, too. I think I'll stay on 2.0.18. The default value for max_queued_messages is 1000 and I have set it to 8192 (0 means unlimited).

I have restarted my system several times and there is no problem with retained messages.

digiblur commented 6 months ago

I added max_queued_messages 8192 to my conf file, upgraded to the latest stable release and restarted HA, and the issue is still there. Rolling back to 2.0.15 makes the issue go away.