Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0

[Feature request]: Incorporate multi-threading #20648

Closed corporategoth closed 5 months ago

corporategoth commented 10 months ago

Is your feature request related to a problem? Please describe

High MQTT load (for example, when you have a LOT of HA devices, and HA configuration occurs) can cause actual zigbee traffic to be delayed/halted, and even time out (causing Z2M to crash). This can be because of either the MQTT traffic, OR the logging of such (if logging is set to debug).

To wit, if I have logging set to debug and HA enabled, Z2M will almost always crash on startup for me, because the work being done both to log things and to communicate with HA's autoconfiguration delays Zigbee comms enough to trigger a timeout and crash. I have to either turn off the HA integration before startup, or set logging to info, to get a successful startup of Z2M.

Describe the solution you'd like

Although JavaScript is normally single-threaded, this is a prime candidate for investigating multi-threading in JS, either through some existing mechanism (there are a few) or by splitting things into multiple processes.

There would have to be a pair of queues between the threads (one going in each direction). For example, when you get a status from the Zigbee network, you put that status onto a queue, and a separate thread is responsible for publishing it to MQTT. Or a command arrives from MQTT, goes onto a queue, and the Zigbee thread picks it up and sends it to the Zigbee device. This way, high load or concurrency limiting on either side will NOT affect the other.

The same is true for logging (though in this case only one queue is needed). Log messages are put on a queue, and a separate thread pulls from that queue to write the logs out (to disk, or wherever), so that any I/O from logging doesn't delay core processing.

Describe alternatives you've considered

For now, I'm working around this by lowering log levels and disabling HA integration on startup. Not ideal, especially if a restart happens when I'm not around to change these values.

Additional context

All as described.

corporategoth commented 10 months ago

Node's worker_threads module could probably achieve this: https://nodejs.org/api/worker_threads.html

It's easiest for the logging, but it should be able to handle all needs. Simply start up three worker threads: one for MQTT comms, one for Zigbee comms, one for logging. And invoke each for bridging.

So for logging, you might do `logger_worker.postMessage({ logData })`; to send to Zigbee, you might do `zigbee_driver.postMessage({ requestToSend })`; or to send something to MQTT, `mqtt.postMessage({ requestToSend })`.

I'm not a JS coder (C++ is my language), so I would just make a hash of this, but it IS possible and supported by node.
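Something along these lines, as a very rough sketch (file names and message shapes are made up for illustration; this is not how Z2M is structured today):

```ts
// main thread (hypothetical layout): hand log writes off to a dedicated worker
import {Worker} from 'worker_threads';

const loggerWorker = new Worker('./logger_worker.js');

// postMessage() is effectively the one-way queue described above: the main thread
// never blocks on log I/O, it just enqueues the message and moves on.
export function log(level: string, message: string): void {
    loggerWorker.postMessage({level, message, timestamp: Date.now()});
}
```

and on the worker side:

```ts
// logger_worker.js (hypothetical): drains the queue and does the slow I/O
import {parentPort} from 'worker_threads';
import {appendFileSync} from 'fs';

parentPort!.on('message', ({level, message, timestamp}) => {
    // Blocking here only stalls this worker, never the Zigbee or MQTT handling.
    appendFileSync('log.txt', `${new Date(timestamp).toISOString()}\t${level}\t${message}\n`);
});
```

The MQTT and Zigbee workers would follow the same pattern, just with the message handler doing the publish / serial-port work instead of file I/O.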

Koenkk commented 10 months ago

This would be a good addition indeed. But to confirm this is the issue, when setting the log level to error, are your issues solved?

corporategoth commented 10 months ago

@Koenkk My issue is 'solved' when I do EITHER of: (a) set my log level to INFO (not debug), OR (b) disable the Home Assistant integration during startup (and enable it once Z2M is active again).

This tells me that both logging AND the Home Assistant MQTT traffic are able to tie up the main loop enough to delay Zigbee device traffic (or more likely the promise handling) past the timeout limit, which causes a crash on startup.

This is likely because my 180 Zigbee devices end up as 12,938 entities in Home Assistant (in HA, there is one entity for each sensor reading or controllable element provided, and some of my devices have quite a bit of configurability). That means that on HA auto-config there is a LOT of MQTT traffic as HA auto-configures the devices (and gets the entity statuses), which ALSO generates a LOT of debug-level logging.

So ensuring that MQTT traffic, logging, and Zigbee device handling happen on different worker threads, which cannot interfere with each other, would make Z2M significantly more robust.

I would work on this myself, but I'm a C++ coder and have very little JS exposure; I would make a complete hash of it.

Koenkk commented 10 months ago

@corporategoth that's a workaround indeed. It's good that I'm aware of this now; I will check if any other users are also affected by it and implement this if I have the time for it.

mundschenk-at commented 10 months ago

I haven't had any startup crashes, but I do notice slowdowns sometimes (in motion-triggered light automations), which might be related to logging levels (I had it all on debug to be able to use logs if necessary).

Koenkk commented 10 months ago

@mundschenk-at could you try to confirm if this indeed is the cause of your issue?

mundschenk-at commented 10 months ago

Not sure how, as the slowdowns only happened occasionally. I've changed logging to error; if it doesn't happen over a week or two, it probably was the logging, but that's still not definitive proof. However, the sheer number of MQTT messages in a larger network (with HA discovery) might warrant multiple threads nonetheless.

corporategoth commented 10 months ago

@Koenkk The startup crashes are usually due to a Zigbee device timeout, which brings up https://github.com/Koenkk/zigbee-herdsman/issues/856 and https://github.com/Koenkk/zigbee-herdsman/issues/857 - ie. because of the main loop being overloaded (by HA traffic, or the logging thereof), and the timer being set at "request work from Zigbee" time instead of "sent work to Zigbee" time.

But either way, the isolation of MQTT workloads, Zigbee device workloads and logging can only help your app. It's something I'd have done in a C++ implementation a long time ago. These systems interact with each other, but are async in nature, so they don't need to interlock or starve each other.

sjorge commented 10 months ago

Hmm interesting, I notice an improvement in overall responsiveness with log set to info instead of debug when my adaptive lighting automation triggers; it causes a bunch of reporting when the group updates around 20-ish lights.

Usually the system is briefly sluggish to respond for a few seconds afterwards, but it's better with a lower log level.

It's generally not a super big issue for me personally though, as it only triggers when there are steep changes in sun-above-horizon %, so mostly on the short winter days (and I usually also rate limit it, which I removed this afternoon for testing).

corporategoth commented 10 months ago

Note that every MQTT publish is logged at INFO (not just debug):

info  2024-01-09 15:27:09: MQTT publish: topic 'zigbee2mqtt/Foyer Lights', payload '{"action":null,"activeEnergyReports":250,"activePowerReports":25,"autoTimerOff":0,"auxSwitchUniqueScenes":null,"bindingOffToOnSyncLevel":null,"brightness":254,"brightnessLevelForDoubleTapDown":null,"brightnessLevelForDoubleTapUp":null,"buttonDelay":"100ms","defaultLed1ColorWhenOff":255,"defaultLed1ColorWhenOn":255,"defaultLed1IntensityWhenOff":101,"defaultLed1IntensityWhenOn":101,"defaultLed2ColorWhenOff":255,"defaultLed2ColorWhenOn":255,"defaultLed2IntensityWhenOff":101,"defaultLed2IntensityWhenOn":101,"defaultLed3ColorWhenOff":255,"defaultLed3ColorWhenOn":255,"defaultLed3IntensityWhenOff":101,"defaultLed3IntensityWhenOn":101,"defaultLed4ColorWhenOff":255,"defaultLed4ColorWhenOn":255,"defaultLed4IntensityWhenOff":101,"defaultLed4IntensityWhenOn":101,"defaultLed5ColorWhenOff":255,"defaultLed5ColorWhenOn":255,"defaultLed5IntensityWhenOff":101,"defaultLed5IntensityWhenOn":101,"defaultLed6ColorWhenOff":255,"defaultLed6ColorWhenOn":255,"defaultLed6IntensityWhenOff":101,"defaultLed6IntensityWhenOn":101,"defaultLed7ColorWhenOff":255,"defaultLed7ColorWhenOn":255,"defaultLed7IntensityWhenOff":101,"defaultLed7IntensityWhenOn":101,"defaultLevelLocal":255,"defaultLevelRemote":255,"deviceBindNumber":null,"dimmingSpeedDownLocal":10,"dimmingSpeedDownRemote":0,"dimmingSpeedUpLocal":25,"dimmingSpeedUpRemote":0,"doubleTapClearNotifications":"Disabled","doubleTapDownToParam56":null,"doubleTapUpForFullBrightness":"Button Press Event + Set Load to 100%","doubleTapUpToParam55":null,"energy":1.77,"firmwareUpdateInProgressIndicator":"Enabled","higherOutputInNonNeutral":null,"individual_led_effect":null,"internalTemperature":null,"invertSwitch":"No","last_seen":"2024-01-09T15:27:09-05:00","ledBarScaling":null,"ledColorWhenOff":null,"ledColorWhenOn":null,"ledIntensityWhenOff":1,"ledIntensityWhenOn":33,"led_effect":null,"linkquality":132,"loadLevelIndicatorTimeout":"Stay On","localProtection":"Disabled","maximumLevel":null,"minimumLevel":null,"onOffLedMode":"All","outputMode":null,"overheat":null,"periodicPowerAndEnergyReports":300,"power":18.4,"powerType":null,"rampRateOffToOnLocal":5,"rampRateOffToOnRemote":5,"rampRateOnToOffLocal":10,"rampRateOnToOffRemote":10,"relayClick":"Disabled (Click Sound On)","remoteProtection":null,"smartBulbMode":null,"state":"ON","stateAfterPowerRestored":0,"switchType":null,"update":{"installed_version":16908815,"latest_version":16908815,"state":"idle"},"update_available":null}'

Which means there is still a lot of logging going on, even at info.

sjorge commented 10 months ago

I wonder if moving that one from info -> debug would help.

corporategoth commented 10 months ago

@sjorge It might help for cases of lots of logging, IF people don't WANT to log the MQTT publishes.

However, for those that DO want to log it, there is no reason the zigbee traffic (or indeed mqtt traffic) should be slowed down because of a higher log level.

Koenkk commented 10 months ago

I investigated the multi-threading a bit but it doesn't seem too easy to do. This will also bring herdsman debug logging and z2m logging out of sync, which makes debugging harder.

I think we should first investigate some ways to reduce the logging.

@corporategoth

sjorge commented 10 months ago

Perhaps we could introduce a 'subsystem' to the logging we can tune?

e.g. herdsman, converter, homeassistant, mqtt, ...

So we could then have these at different levels; Samba does it like this and it's pretty nice to keep most things at a minimal level except the particular things you are interested in at the moment.
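Roughly something like this, just to illustrate the idea (the subsystem names and the whole wrapper are hypothetical, not existing Z2M config):

```ts
// Hypothetical per-subsystem log levels, roughly how Samba's "log level" classes work.
type Level = 'error' | 'warning' | 'info' | 'debug';
const rank: Record<Level, number> = {error: 0, warning: 1, info: 2, debug: 3};

// e.g. keep most things quiet, but turn herdsman up to debug while investigating
const subsystemLevels: Record<string, Level> = {
    default: 'info',
    herdsman: 'debug',
    homeassistant: 'warning',
    mqtt: 'info',
};

function log(subsystem: string, level: Level, message: string): void {
    const threshold = subsystemLevels[subsystem] ?? subsystemLevels.default;
    if (rank[level] <= rank[threshold]) {
        console.log(`${level}\t${new Date().toISOString()} [${subsystem}] ${message}`);
    }
}

log('herdsman', 'debug', 'still visible');      // passes the per-subsystem threshold
log('homeassistant', 'info', 'filtered out');   // dropped, homeassistant is at 'warning'
```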

corporategoth commented 10 months ago

@Koenkk I actually don't care as much about the logging. I can always set the logging to WARN and have no real logs. It's annoying, but I don't look at the logs often anyway.

I'm actually much more concerned about a massive influx of MQTT traffic (ie. when HA does its auto-configure, and as such does a config request for everything, and gets a LOT of data flowing back the other way for each entity) causing delays in Zigbee device communications.

Even if it doesn't freeze out the zigbee device entirely (though it could), it could turn something that had no delay, into a few seconds delay on processing a response from the zigbee network (or sending a request to the zigbee network). Even with logging at WARN.

That's more concerning to me, as that can't just be turned off or throttled like logging can without consequence. Though I thought logging might be a quick win because, at least in the case of the z2m logging that is tunable from the UI (so ignoring herdsman logging that requires the DEBUG env var), the logging seems to be contained to a single file, so I thought it might be a simple case of re-implementing the logging API calls to use the threading messaging structure used by worker_threads.

But as I said, I don't know JS / node / TS well enough to do anything more than simple tweaks or hack around ineffectively ;)

Koenkk commented 10 months ago

and as such does a config request for everything, and gets a LOT of data flowing back the other way for each entity) causing delays in zigbee device communications.

But this only happens when HA starts, right?

mundschenk-at commented 10 months ago

Actually, mostly anything that changes the discovery structure (group modifications, scenes ...) will clear and re-publish the config messages (or at least large chunks of them).

corporategoth commented 10 months ago

Plus - if home assistant is restarted, and Z2M is not, it should trigger a re-configuration (it's not right now, which causes HA to not subscribe to some topics, but it used to, and is supposed to - I am not sure whether this was a Z2M, MQTT or HA change that broke that). That's a separate issue, but the point is, HA autoconfiguration doesn't just happen at Z2M startup.

mundschenk-at commented 10 months ago

So it's not a replacement for multi-threading, but I just removed a whole bunch of unnecessary discovery updates for scene changes with https://github.com/Koenkk/zigbee2mqtt/pull/20952.

corporategoth commented 10 months ago

@Koenkk Here is a sample log. As you can see, my 180 devices result in more than 38k configure messages being sent to Home Assistant. This is why I am worried about the MQTT traffic blocking Zigbee traffic. Logging I can turn down, but those 38k+ MQTT messages are still being published and choking out the single JS event loop thread.

This is a successful startup (I managed to get it to start up, with debug logging, without crashing). This log covers less than 4 minutes of operation, and it's already almost 35mb uncompressed.

log.txt.gz

Koenkk commented 10 months ago

I see, you have quite some Inovelli devices which expose a lot of entities. I guess your setup is quite extreme, nevertheless this should be fixed in z2m.

corporategoth commented 10 months ago

@Koenkk The extreme cases are where most of the action is at 🤣

Thanks - one main problem is the fact that, even though there are thousands of entities (though only 180 devices), ultimately, Home Assistant just subscribes (multiple times) to the same single topic - one for each entity, just pulling out the relevant field from that one topic.

It seems extremely wasteful for home assistant to have to subscribe multiple times to the same topic, or do a configuration back and forth on a per-entity basis, when realistically it should only have to do 1 back and forth per device if each entity's status is coming on a single topic (which it will subscribe to, if it gets updates for ANY entity on that device).

So if there is a way to cut down that config traffic, so that the auto-configure only happens on a per-device basis (not per entity), especially if HA knows that each entity on the device gets its traffic from the same topic update (which it kind of already does), that would go a long way.

I worked with HA to optimize performance issues on their side around this - where they ended up debouncing subscriptions to the device topic, and not performing duplicate subscribes (which made MQTT stuff actually work again, and not take 10 mins to start up!). But that was more about the subscriptions that happen as a result of auto-configuration - not the auto-configuration part. (https://github.com/home-assistant/core/pull/88117, https://github.com/home-assistant/core/pull/88826, https://github.com/home-assistant/core/pull/88862, https://github.com/home-assistant/core/pull/92172, https://github.com/home-assistant/core/pull/92201)

I'm not sure who is 'driving' the auto-configuration, or exactly how it works - so if there is a way to send fewer '/config' messages (ie. don't send 1 per entity, but 1 per device), and it's understood that "this is the configuration for ALL these entities, no need to configure each one individually", that would improve things drastically.

But I'm not sure if that then requires changes on BOTH the HA MQTT and Z2M side. Or violates some other mechanism in place.

Thanks though, for seeing it's a problem and recognizing it needs some kind of solution.

mundschenk-at commented 10 months ago

Unfortunately, that would mostly be a change on the HA side, completely revamping their auto-discovery protocol. I don't think it is realistic to change that so drastically at this point (because it touches lots of third-party solutions). But even if we designed an optional v2 of the discovery protocol, it would need to get architectural buy-in from HA first (historically, HA uses entities as its basic building blocks, the device registry is a later add-on). So at best, this is a long-term goal.

corporategoth commented 10 months ago

@mundschenk-at Which leads back to the current task of fixing Z2M so that it can handle most any load (on the MQTT side) without affecting Zigbee performance - similar to what HA did (fixing to be able to handle much greater MQTT load without stalling out HA). Which is what the original ticket is about anyway.

Still, something to put on the roadmap - as all this MQTT traffic seems superfluous in the first place :)

corporategoth commented 10 months ago

Slightly off topic, but ...

Looking at the log above ... why does Z2M subscribe to the config topics, getting our own messages back? I was looking at the config messages: there are 12,752 that are MQTT publishes (which is still a lot from 180 devices, but still), but ALSO 25,831 messages RECEIVED on the /config endpoints. Specifically, I see Z2M publishing on the /config endpoint, then receiving that message twice. For example:

info  2024-01-22 20:07:36: MQTT publish: topic 'homeassistant/light/0x6c5cb1fffe56cbd6/light/config', payload '{"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"}],"brightness":true,"brightness_scale":254,"command_topic":"zigbee2mqtt/Rec Room Lights/set","device":{"identifiers":["zigbee2mqtt_0x6c5cb1fffe56cbd6"],"manufacturer":"Inovelli","model":"Inovelli 2-in-1 switch + dimmer (VZM31-SN)","name":"Rec Room Lights","sw_version":"2.15","via_device":"zigbee2mqtt_bridge_0xe0798dfffea88b0a"},"name":null,"object_id":"rec_room_lights","origin":{"name":"Zigbee2MQTT","sw":"1.35.1","url":"https://www.zigbee2mqtt.io"},"schema":"json","state_topic":"zigbee2mqtt/Rec Room Lights","unique_id":"0x6c5cb1fffe56cbd6_light_zigbee2mqtt"}'
debug 2024-01-22 20:07:39: Received MQTT message on 'homeassistant/light/0x6c5cb1fffe56cbd6/light/config' with data '{"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"}],"brightness":true,"brightness_scale":254,"command_topic":"zigbee2mqtt/Rec Room Lights/set","device":{"identifiers":["zigbee2mqtt_0x6c5cb1fffe56cbd6"],"manufacturer":"Inovelli","model":"Inovelli 2-in-1 switch + dimmer (VZM31-SN)","name":"Rec Room Lights","sw_version":"2.15","via_device":"zigbee2mqtt_bridge_0xe0798dfffea88b0a"},"name":null,"object_id":"rec_room_lights","origin":{"name":"Zigbee2MQTT","sw":"1.35.1","url":"https://www.zigbee2mqtt.io"},"schema":"json","state_topic":"zigbee2mqtt/Rec Room Lights","unique_id":"0x6c5cb1fffe56cbd6_light_zigbee2mqtt"}'
debug 2024-01-22 20:07:41: Received MQTT message on 'homeassistant/light/0x6c5cb1fffe56cbd6/light/config' with data '{"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"}],"brightness":true,"brightness_scale":254,"command_topic":"zigbee2mqtt/Rec Room Lights/set","device":{"identifiers":["zigbee2mqtt_0x6c5cb1fffe56cbd6"],"manufacturer":"Inovelli","model":"Inovelli 2-in-1 switch + dimmer (VZM31-SN)","name":"Rec Room Lights","sw_version":"2.15","via_device":"zigbee2mqtt_bridge_0xe0798dfffea88b0a"},"name":null,"object_id":"rec_room_lights","origin":{"name":"Zigbee2MQTT","sw":"1.35.1","url":"https://www.zigbee2mqtt.io"},"schema":"json","state_topic":"zigbee2mqtt/Rec Room Lights","unique_id":"0x6c5cb1fffe56cbd6_light_zigbee2mqtt"}'

There should be no reason for Z2M to subscribe to the /config endpoints, just publish to them. We should probably have a separate ticket to NOT subscribe to messages Z2M itself publishes.

mundschenk-at commented 10 months ago

Z2M seems to subscribe to other completely unnecessary topics as well (maybe homeassistant/#? I haven't checked the code); I've seen messages from Valetudo being processed by Z2M during debugging. Restricting that to relevant topics (HA shutdown/restart I think) would certainly free up some processing capacity.

Edit: Yeah, it's homeassistant/#: https://github.com/Koenkk/zigbee2mqtt/blob/3c9620428ea00d759bd1001322b0ead55eca2285/lib/extension/homeassistant.ts#L152

corporategoth commented 10 months ago

Z2M seems to subscribe to other completely unnecessary topics as well (maybe homeassistant/#? I haven't checked the code); I've seen messages from Valetudo being processed by Z2M during debugging. Restricting that to relevant topics (HA shutdown/restart I think) would certainly free up some processing capacity.

Z2M needs to subscribe to the command topics too - but since Z2M is what tells HA what the command topics are for each entity (aka. device), it can choose what to subscribe to.

mundschenk-at commented 10 months ago

Sure, but the command topics are under its own prefix, not homeassistant.

Koenkk commented 10 months ago

@corporategoth

Why does Z2M subscribe to the config topics, getting our own messages back?

To delete any devices that are not in the installation anymore (it needs to know what it has discovered before)

corporategoth commented 10 months ago

@Koenkk Ugh - I can kind of understand it. But it triples the traffic (you send a message to mqtt, and get it back twice, presumably once if it existed before startup, and once again when we publish traffic). There must be a better way to do that (TBH, just letting the device go stale in HA almost seems preferable).

At the very least, we should subscribe before we publish the /configs, do the deletion check at that point, then unsubscribe before we publish all the current /config entries. At least then the traffic is only doubled (receive once if it existed prior, and send once), not tripled.

I am not sure if there is a 'list topics that have retained messages that match this wildcard' op code; I couldn't see one in the API spec when I looked at it briefly. But that would be so much better, since you don't care what the data is on those topics (ie. you don't want/need to subscribe), just whether they exist or not.

Also, the current behavior would put you in danger of deleting things created by anything else using MQTT discovery in HA, wouldn't it? If you're assuming "I didn't try to publish this, but a /config topic exists for it, so it must be a now-defunct entry", then entities provided by some other process would be deleted, unless you're checking metadata to see that it was originally provided by Z2M, and specifically THIS instance of Z2M. Which it might well be doing, not sure.

Device-based discovery, AND isolated MQTT/Zigbee threads in Z2M can't come fast enough to unwind this mess ;)

Koenkk commented 10 months ago

Also, the current behavior would put you in danger of deleting things created by anything else using MQTT discovery in HA, wouldn't it?

It checks the base topic of these messages so it knows if it comes from the current z2m instance.

mundschenk-at commented 10 months ago

Maybe we could at least use single-level wildcards (+) for a more targeted subscription. I think the Z2M base topic could be added as an intermediate level in the config topics, allowing for subscribing only our own topics. (However, I agree with @corporategoth that requiring the user to delete stale devices themselves would be OK as well, and may be even preferable.)

Koenkk commented 10 months ago

that requiring the user to delete stale devices themselves would be OK as well, and may be even preferable.)

Why do you think this is preferable?

mundschenk-at commented 10 months ago

Removing devices is an exceptional event, and I might have reason to handle that myself on the HA side (think entity customisations etc. - if the device is deleted, those are lost).

Also I'm not sure handling a rare exception merits the increase in MQTT traffic. Or do I miss a use case where devices are regularly added and removed?

corporategoth commented 10 months ago

Also, the current behavior would put you in danger of deleting things created by anything else using MQTT discovery in HA, wouldn't it?

It checks the base topic of these messages so it knows if it comes from the current z2m instance.

Isn't the base homeassistant/? Or are you checking something like .state_topic in the body? Even if you checked .origin.name, it would only tell you it's from Z2M, not a specific instance of Z2M (assuming you connected two Z2M instances to HA, which I doubt many are doing, but I could see it for a bisected network or network that is too big for one controller).

mundschenk-at commented 10 months ago

It's done some other way for historic reasons, but nowadays you could use .device.via_device.
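For example, something like this (the via_device value matches the discovery payloads logged earlier in this thread; the helper itself is hypothetical):

```ts
// Hypothetical check: does this retained discovery payload belong to this Z2M instance?
// The payloads above carry e.g. "via_device": "zigbee2mqtt_bridge_0xe0798dfffea88b0a".
function isOwnDiscoveryMessage(payload: string, bridgeIdentifier: string): boolean {
    try {
        const config = JSON.parse(payload);
        return config?.device?.via_device === bridgeIdentifier;
    } catch {
        return false; // not JSON, so definitely not one of our discovery messages
    }
}
```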

Koenkk commented 10 months ago

Also I'm not sure handling a rare exception merits the increase in MQTT traffic

We should definitely fix the MQTT traffic part, but imagine e.g. a case where you reset your whole z2m (remove all devices); this would mean you have to remove all devices from HA 1-by-1.

The check whether the message comes from the current instance is done here

mundschenk-at commented 10 months ago

I see the pain point, but it is still not something that happens regularly. If I am tinkering at this level, I'd probably clean out all the retained messages myself on the broker. (But yeah, that's probably not a reasonable expectation for a regular user.)

mundschenk-at commented 10 months ago

My proposal on how to narrow the subscription:

An HA discovery topic looks like <discovery_prefix>/<component>/[<node_id>/]<object_id>/config. Currently, Z2M does not use the node_id part, as per the HA documentation's

Best practice for entities with a unique_id is to set <object_id> to unique_id and omit the <node_id>.

However, the docs also contain this part:

The <node_id> level can be used by clients to only subscribe to their own (command) topics by using one wildcard topic like <discovery_prefix>/+/<node_id>/+/set.

I'm not sure what they mean by "command topics", as those would not start with the discovery_prefix, but this scheme can be used to subscribe to all our own discovery topics when node_id is set to the unique_id of the bridge device, e.g. homeassistant/light/zigbee2mqtt_bridge_0x12345679/0x97654321/config will be matched by a subscription to homeassistant/+/zigbee2mqtt_bridge_0x12345679/+/config.

Note: Not sure how to gracefully handle the upgrade process - i.e. the change from discovery messages without the node_id to those that include the bridge.
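A rough sketch of how that could look (topic layout as proposed above; the MQTT client usage is only illustrative, not how Z2M's extension is actually wired up):

```ts
import * as mqtt from 'mqtt';

// Hypothetical: discovery published under <prefix>/<component>/<bridge_node_id>/<object_id>/config
const bridgeNodeId = 'zigbee2mqtt_bridge_0x12345679';
const client = mqtt.connect('mqtt://localhost');

client.publish(
    `homeassistant/light/${bridgeNodeId}/0x97654321/config`,
    JSON.stringify({name: null /* ...rest of the discovery payload... */}),
    {retain: true},
);

// ...which lets Z2M subscribe to only its own discovery topics with single-level
// wildcards, instead of the current catch-all homeassistant/#:
client.subscribe(`homeassistant/+/${bridgeNodeId}/+/config`);
```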

corporategoth commented 10 months ago

@mundschenk-at note that the 'object ID' currently is the sensor name, not the Zigbee object; for example, one I pulled at random: homeassistant/number/0x70ac08fffe70c3f1/defaultLed6IntensityWhenOn/config

The component is 'number', the node ID (which IS always sent) is the zigbee address, the object ID is the individual entity being configured.

That said, @jbouwh works fast - https://github.com/home-assistant/core/pull/109030 (prototype device-based discovery that's backward compatible for HA).

Koenkk commented 10 months ago

I think once https://github.com/home-assistant/core/pull/109030 is merged, the multi-threading is not needed anymore. This is the only part in z2m that generates a huge amount of MQTT messages/logging.

mundschenk-at commented 10 months ago

@mundschenk-at note that the 'object ID' currently is the sensor name, not the zigbee object, for example one I pulled at random: homeassistant/number/0x70ac08fffe70c3f1/defaultLed6IntensityWhenOn/config

The component is 'number', the node ID (which IS always sent) is the zigbee address, the object ID is the individual entity being configured.

Ah, yeah I missed that because the device I looked at had something generic like light as the object_id. So even if we get far fewer config messages with the WIP HA PR, I think we still need to change the discovery topics to allow a targeted subscription (if the PR allows it, currently I think it does not).

corporategoth commented 10 months ago

@Koenkk quite possibly. It would still be nice, but once we're down to 180 (or 360) messages instead of 26k in this autoconfig case, you're right: it becomes more or less not required, more of a nice-to-have.

corporategoth commented 10 months ago

That said, we should still probably change the order to:

  1. Subscribe to the HA topic (specifically only topics ending in /config)
  2. Check to see if there is any topic present that we would not publish on (ie. deleted objects)
  3. Unsubscribe from the HA topic
  4. Publish our configs

ie. don't receive the messages we publish twice (once the cached copy, once the one we just published), and don't stay subscribed to the HA topic (we don't need to receive all the autoconfig messages for anything else doing discovery with HA).

Because while switching to using device-based config (when available) will help Z2M, if you stay subscribed to the HA topic, you can't control what other apps may publish to it (so Z2M could still get flooded out), and after you determine if you need to issue delete commands, it's just noise.

Also - it might even be an idea to do this subscribe/determine deletes thing before you even start up the zigbee interface. It would mean the zigbee interface startup is delayed a bit, but it also means that once you DO start it up, it can't timeout due to a large amount of traffic on the HA topic (if you unsub as suggested above).
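Something like this, very roughly (all names hypothetical, and I'm sure an actual JS coder would structure it differently):

```ts
import * as mqtt from 'mqtt';

// Hypothetical sketch of that ordering. expectedTopics = the /config topics we are about
// to publish; any other retained /config under our subscription belongs to a stale device.
function cleanupStaleDiscovery(client: mqtt.MqttClient, subscription: string,
                               expectedTopics: Set<string>, done: () => void): void {
    const stale: string[] = [];

    const onMessage = (topic: string, payload: Buffer): void => {
        // 2. a retained /config entry we would not publish again is a deleted device
        if (topic.endsWith('/config') && payload.length > 0 && !expectedTopics.has(topic)) {
            stale.push(topic);
        }
    };

    client.on('message', onMessage);
    client.subscribe(subscription);                      // 1. subscribe to the /config topics

    setTimeout(() => {                                   // crude settle time for retained messages
        client.unsubscribe(subscription);                // 3. unsubscribe before publishing
        client.removeListener('message', onMessage);

        for (const topic of stale) {
            client.publish(topic, '', {retain: true});   // empty retained payload deletes the entry
        }
        done();                                          // 4. caller publishes the current configs,
    }, 2000);                                            //    and only then starts the Zigbee side
}
```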

mundschenk-at commented 10 months ago

That's why I proposed switching to the node_id/object_id form with node_id being the bridge, so that the subscription does not include random HA discovery topics.

corporategoth commented 10 months ago

That's why I proposed switching to the node_id/object_id form with node_id being the bridge, so that the subscription does not include random HA discovery topics.

Fair. But no need to process data we don't need at all - so staying subscribed after the delete check doesn't make much sense. Nor does being subscribed when we publish because we're just going to get back what we published, to no useful purpose.

But yeah, being able to specifically subscribe to homeassistant/device/node_id/+/config and homeassistant/+/node_id/+/+/config would avoid noise from other apps.

mundschenk-at commented 10 months ago

Yeah, a combination of the approaches would probably be best.

Koenkk commented 9 months ago

Completely agree!

mundschenk-at commented 9 months ago

Looks like the discovery schema change is not happening after all, at least not in the near future: https://github.com/home-assistant/core/pull/109030#issuecomment-1938436945

Koenkk commented 8 months ago

@corporategoth I want to make some improvements already and want to start with not staying subscribed and not receiving the discovery messages twice. I don't know how you run z2m, but can you comment out this line and see if it makes a big difference? I can provide more specific instructions if needed.