corporategoth closed this issue 5 months ago
@Koenkk
So I noticed that with 1.36.0 I can keep logging at debug and still have things start up. I also switched to the ember
driver and updated my device's firmware accordingly. So that's a plus.
On the upside, with the line commented out, a grep for '/config' in the logs shows only 13290 lines, instead of the 37106 before I commented out the line you requested.
Additionally, all '/config' messages end about 8 seconds after the start of the app now:
debug 2024-03-06 19:35:33: Loaded state from file /data/state.json
...
info 2024-03-06 19:35:41: MQTT publish: topic 'homeassistant/light/1221051039810110150109113116116_33/light/config', payload '{"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"}],"brightness":true,"brightness_scale":254,"color_mode":true,"command_topic":"zigbee2mqtt/driveway_6of6/set","device":{"identifiers":["zigbee2mqtt_1221051039810110150109113116116_33"],"manufacturer":"Zigbee2MQTT","model":"Group","name":"driveway_6of6","sw_version":"Zigbee2MQTT 1.36.0","via_device":"zigbee2mqtt_bridge_0xe0798dfffea88b0a"},"max_mireds":500,"min_mireds":154,"name":null,"object_id":"driveway_6of6","origin":{"name":"Zigbee2MQTT","sw":"1.36.0","url":"https://www.zigbee2mqtt.io"},"schema":"json","state_topic":"zigbee2mqtt/driveway_6of6","supported_color_modes":["xy","color_temp"],"unique_id":"33_light_zigbee2mqtt"}'
As opposed to almost a full minute prior to commenting it out (with 1.36.0):
debug 2024-03-06 13:40:19: Loaded state from file /data/state.json
...
debug 2024-03-06 13:41:17: Received MQTT message on 'homeassistant/light/1221051039810110150109113116116_33/light/config' with data '{"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"}],"brightness":true,"brightness_scale":254,"color_mode":true,"command_topic":"zigbee2mqtt/driveway_6of6/set","device":{"identifiers":["zigbee2mqtt_1221051039810110150109113116116_33"],"manufacturer":"Zigbee2MQTT","model":"Group","name":"driveway_6of6","sw_version":"Zigbee2MQTT 1.36.0","via_device":"zigbee2mqtt_bridge_0xe0798dfffea88b0a"},"max_mireds":500,"min_mireds":154,"name":null,"object_id":"driveway_6of6","origin":{"name":"Zigbee2MQTT","sw":"1.36.0","url":"https://www.zigbee2mqtt.io"},"schema":"json","state_topic":"zigbee2mqtt/driveway_6of6","supported_color_modes":["xy","color_temp"],"unique_id":"33_light_zigbee2mqtt"}'
On the downside, something is going wrong. For example, on Z2M startup, I picked an entity ('Post 1 Light - Small'), it published these on startup:
info 2024-03-06 19:35:41: MQTT publish: topic 'homeassistant/light/0xb0ce1814001d4f3c/light/config', payload '{"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"}],"brightness":true,"brightness_scale":254,"color_mode":true,"command_topic":"zigbee2mqtt/Post 1 Light - Small/set","device":{"identifiers":["zigbee2mqtt_0xb0ce1814001d4f3c"],"manufacturer":"Sengled","model":"Element color plus E12 (E1F-N5E)","name":"Post 1 Light - Small","via_device":"zigbee2mqtt_bridge_0xe0798dfffea88b0a"},"max_mireds":500,"min_mireds":154,"name":null,"object_id":"post_1_light_-_small","origin":{"name":"Zigbee2MQTT","sw":"1.36.0","url":"https://www.zigbee2mqtt.io"},"schema":"json","state_topic":"zigbee2mqtt/Post 1 Light - Small","supported_color_modes":["xy","color_temp"],"unique_id":"0xb0ce1814001d4f3c_light_zigbee2mqtt"}'
info 2024-03-06 19:35:41: MQTT publish: topic 'homeassistant/sensor/0xb0ce1814001d4f3c/power/config', payload '{"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"}],"device":{"identifiers":["zigbee2mqtt_0xb0ce1814001d4f3c"],"manufacturer":"Sengled","model":"Element color plus E12 (E1F-N5E)","name":"Post 1 Light - Small","via_device":"zigbee2mqtt_bridge_0xe0798dfffea88b0a"},"device_class":"power","enabled_by_default":true,"entity_category":"diagnostic","object_id":"post_1_light_-_small_power","origin":{"name":"Zigbee2MQTT","sw":"1.36.0","url":"https://www.zigbee2mqtt.io"},"state_class":"measurement","state_topic":"zigbee2mqtt/Post 1 Light - Small","unique_id":"0xb0ce1814001d4f3c_power_zigbee2mqtt","unit_of_measurement":"W","value_template":"{{ value_json.power }}"}'
info 2024-03-06 19:35:41: MQTT publish: topic 'homeassistant/sensor/0xb0ce1814001d4f3c/energy/config', payload '{"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"}],"device":{"identifiers":["zigbee2mqtt_0xb0ce1814001d4f3c"],"manufacturer":"Sengled","model":"Element color plus E12 (E1F-N5E)","name":"Post 1 Light - Small","via_device":"zigbee2mqtt_bridge_0xe0798dfffea88b0a"},"device_class":"energy","enabled_by_default":true,"object_id":"post_1_light_-_small_energy","origin":{"name":"Zigbee2MQTT","sw":"1.36.0","url":"https://www.zigbee2mqtt.io"},"state_class":"total_increasing","state_topic":"zigbee2mqtt/Post 1 Light - Small","unique_id":"0xb0ce1814001d4f3c_energy_zigbee2mqtt","unit_of_measurement":"kWh","value_template":"{{ value_json.energy }}"}'
info 2024-03-06 19:35:41: MQTT publish: topic 'homeassistant/sensor/0xb0ce1814001d4f3c/linkquality/config', payload '{"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"}],"device":{"identifiers":["zigbee2mqtt_0xb0ce1814001d4f3c"],"manufacturer":"Sengled","model":"Element color plus E12 (E1F-N5E)","name":"Post 1 Light - Small","via_device":"zigbee2mqtt_bridge_0xe0798dfffea88b0a"},"enabled_by_default":false,"entity_category":"diagnostic","icon":"mdi:signal","name":"Linkquality","object_id":"post_1_light_-_small_linkquality","origin":{"name":"Zigbee2MQTT","sw":"1.36.0","url":"https://www.zigbee2mqtt.io"},"state_class":"measurement","state_topic":"zigbee2mqtt/Post 1 Light - Small","unique_id":"0xb0ce1814001d4f3c_linkquality_zigbee2mqtt","unit_of_measurement":"lqi","value_template":"{{ value_json.linkquality }}"}'
info 2024-03-06 19:35:41: MQTT publish: topic 'homeassistant/sensor/0xb0ce1814001d4f3c/last_seen/config', payload '{"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"}],"device":{"identifiers":["zigbee2mqtt_0xb0ce1814001d4f3c"],"manufacturer":"Sengled","model":"Element color plus E12 (E1F-N5E)","name":"Post 1 Light - Small","via_device":"zigbee2mqtt_bridge_0xe0798dfffea88b0a"},"device_class":"timestamp","enabled_by_default":false,"entity_category":"diagnostic","icon":"mdi:clock","name":"Last seen","object_id":"post_1_light_-_small_last_seen","origin":{"name":"Zigbee2MQTT","sw":"1.36.0","url":"https://www.zigbee2mqtt.io"},"state_topic":"zigbee2mqtt/Post 1 Light - Small","unique_id":"0xb0ce1814001d4f3c_last_seen_zigbee2mqtt","value_template":"{{ value_json.last_seen }}"}'
info 2024-03-06 19:35:41: MQTT publish: topic 'homeassistant/sensor/0xb0ce1814001d4f3c/update_state/config', payload '{"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"}],"device":{"identifiers":["zigbee2mqtt_0xb0ce1814001d4f3c"],"manufacturer":"Sengled","model":"Element color plus E12 (E1F-N5E)","name":"Post 1 Light - Small","via_device":"zigbee2mqtt_bridge_0xe0798dfffea88b0a"},"enabled_by_default":false,"entity_category":"diagnostic","icon":"mdi:update","name":"Update state","object_id":"post_1_light_-_small_update_state","origin":{"name":"Zigbee2MQTT","sw":"1.36.0","url":"https://www.zigbee2mqtt.io"},"state_topic":"zigbee2mqtt/Post 1 Light - Small","unique_id":"0xb0ce1814001d4f3c_update_state_zigbee2mqtt","value_template":"{{ value_json['update']['state'] }}"}'
info 2024-03-06 19:35:41: MQTT publish: topic 'homeassistant/binary_sensor/0xb0ce1814001d4f3c/update_available/config', payload '{"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"}],"device":{"identifiers":["zigbee2mqtt_0xb0ce1814001d4f3c"],"manufacturer":"Sengled","model":"Element color plus E12 (E1F-N5E)","name":"Post 1 Light - Small","via_device":"zigbee2mqtt_bridge_0xe0798dfffea88b0a"},"device_class":"update","enabled_by_default":false,"entity_category":"diagnostic","name":null,"object_id":"post_1_light_-_small_update_available","origin":{"name":"Zigbee2MQTT","sw":"1.36.0","url":"https://www.zigbee2mqtt.io"},"payload_off":false,"payload_on":true,"state_topic":"zigbee2mqtt/Post 1 Light - Small","unique_id":"0xb0ce1814001d4f3c_update_available_zigbee2mqtt","value_template":"{{ value_json['update']['state'] == \"available\" }}"}'
info 2024-03-06 19:35:41: MQTT publish: topic 'homeassistant/update/0xb0ce1814001d4f3c/update/config', payload '{"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"}],"command_topic":"zigbee2mqtt/bridge/request/device/ota_update/update","device":{"identifiers":["zigbee2mqtt_0xb0ce1814001d4f3c"],"manufacturer":"Sengled","model":"Element color plus E12 (E1F-N5E)","name":"Post 1 Light - Small","via_device":"zigbee2mqtt_bridge_0xe0798dfffea88b0a"},"device_class":"firmware","entity_category":"config","entity_picture":"https://github.com/Koenkk/zigbee2mqtt/raw/master/images/logo.png","json_attributes_template":"{\"in_progress\": {{ iif(value_json['update']['state'] == 'updating', 'true', 'false') }} }","json_attributes_topic":"zigbee2mqtt/Post 1 Light - Small","latest_version_template":"{{ value_json['update']['latest_version'] }}","latest_version_topic":"zigbee2mqtt/Post 1 Light - Small","name":null,"object_id":"post_1_light_-_small","origin":{"name":"Zigbee2MQTT","sw":"1.36.0","url":"https://www.zigbee2mqtt.io"},"payload_install":"{\"id\": \"0xb0ce1814001d4f3c\"}","state_topic":"zigbee2mqtt/Post 1 Light - Small","unique_id":"0xb0ce1814001d4f3c_update_zigbee2mqtt","value_template":"{{ value_json['update']['installed_version'] }}"}'
If I look at my MQTT server and run mosquitto_sub -u test -P '****' -v -t 'homeassistant/+/0xb0ce1814001d4f3c/#', they all return correctly (both when Z2M is down, showing retained messages, and when getting updated on Z2M startup).
With the line commented out, the device exists in Home Assistant but shows no entities - as if it wasn't configured at all. HA knows the device exists, but it completely ignored the published /config entry.
However, if I uncomment the line again (i.e. revert to default 1.36.0 behavior), I get:
One other thing (I can file a separate bug report if you like). I noticed with 1.36.0, Z2M seems to be ignoring commands from HA.
I'm using ember, so I don't know if that's part of it. But Z2M IS receiving commands from HA. Device command:
debug 2024-03-07 07:33:52: Received MQTT message on 'zigbee2mqtt/Post 1 Light - Small/set' with data '{"state":"OFF"}'
debug 2024-03-07 07:33:52: Publishing 'set' 'state' to 'Post 1 Light - Small'
Group command:
debug 2024-03-07 07:36:26: Received MQTT message on 'zigbee2mqtt/driveway_lights/set' with data '{"state":"OFF"}'
debug 2024-03-07 07:36:26: Publishing 'set' 'state' to 'driveway_lights'
Just not doing anything with it: the actual device state does not change, meaning any automation from HA is meaningless. It IS publishing states to HA, so I can see the state of things in HA, just not control them. This is using the stock homeassistant.js from 1.36.0.
EDIT: Submitted https://github.com/Koenkk/zigbee2mqtt/issues/21707 to track this
@corporategoth I cannot understand yet why commenting that line breaks HA. It will only prevent clear messages from being published (which shouldn't happen in your case, as you haven't removed any devices)
@corporategoth could you confirm this problem is still present with 1.37.1? @Nerivec made some great performance improvements lately.
I increased my log level to DEBUG (I'm on 1.37.1). For startup: 1) no crash; 2) doing all the config stuff takes about 50 seconds:
First /config message
[2024-05-14 15:01:37] debug: z2m: MQTT publish: topic 'homeassistant/binary_sensor/1221051039810110150109113116116_0xe0798dfffea88b0a/connection_state/config', payload '{"device":{"hw_version":"EmberZNet 7.4.1 [GA]","identifiers":["zigbee2mqtt_bridge_0xe0798dfffea88b0a"],"manufacturer":"Zigbee2MQTT","model":"Bridge","name":"Zigbee2MQTT Bridge","sw_version":"1.37.1"},"device_class":"connectivity","entity_category":"diagnostic","name":"Connection state","object_id":"zigbee2mqtt_bridge_connection_state","origin":{"name":"Zigbee2MQTT","sw":"1.37.1","url":"https://www.zigbee2mqtt.io"},"payload_off":"offline","payload_on":"online","state_topic":"zigbee2mqtt/bridge/state","unique_id":"bridge_0xe0798dfffea88b0a_connection_state_zigbee2mqtt","value_template":"{{ value_json.state }}"}'
Last /config message
[2024-05-14 15:02:25] debug: z2m: Received MQTT message on 'homeassistant/light/1221051039810110150109113116116_33/light/config' with data '{"availability":[{"topic":"zigbee2mqtt/bridge/state","value_template":"{{ value_json.state }}"}],"brightness":true,"brightness_scale":254,"command_topic":"zigbee2mqtt/driveway_6of6/set","device":{"identifiers":["zigbee2mqtt_1221051039810110150109113116116_33"],"manufacturer":"Zigbee2MQTT","model":"Group","name":"driveway_6of6","sw_version":"Zigbee2MQTT 1.37.1","via_device":"zigbee2mqtt_bridge_0xe0798dfffea88b0a"},"max_mireds":500,"min_mireds":154,"name":null,"object_id":"driveway_6of6","origin":{"name":"Zigbee2MQTT","sw":"1.37.1","url":"https://www.zigbee2mqtt.io"},"schema":"json","state_topic":"zigbee2mqtt/driveway_6of6","supported_color_modes":["xy","color_temp"],"unique_id":"33_light_zigbee2mqtt"}'
3) However ...
$ grep "/config" log.log | wc -l
42897
$ grep "/config" log.log | grep -c "MQTT publish"
14226
$ grep "/config" log.log | grep -c "Received MQTT message"
28671
I've not tried removing the line you asked about last time, but it seems Z2M is still receiving everything twice: once, presumably, for the retained state in the MQTT server, and again for every message it publishes being echoed back (though the log seems to indicate it sends all the publishes before any receives).
@corporategoth so the situation did improve? (the only issue is that it's a bit slow now?)
For the multi-threading issue? Yeah, it seems so. The only issue left is related to the entity-based discovery (i.e. 12k entities) instead of device-based (180 devices), which makes discovery take a long time.
I would still recommend subscribing to the /config topics before publishing, processing them all (to figure out what to delete), then unsubscribing from the /config topics before publishing to them, so Z2M doesn't end up getting all the /config messages twice. But it's not causing a crash on startup anymore, from what I can tell.
Because the config topics are retained, they will be replayed when you subscribe to them at startup. If they are received, there is no need to resend them; it is enough to just catch up. It is also recommended to subscribe to the HA MQTT birth message and use it to trigger any state updates. If you do send updates anyway, do not use retained state updates, as that will result in processing two updates.
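The catch-up idea above can be sketched as a pure comparison step. This is only an illustration of the suggested flow, not actual Z2M code; the function and parameter names are made up.

```javascript
// Illustrative sketch only; names are not Z2M internals.
// retained: topic -> payload replayed by the broker after subscribing,
// desired:  topic -> payload Z2M would generate for current devices.
function diffDiscovery(retained, desired) {
  const publish = [];
  const clear = [];
  for (const [topic, payload] of Object.entries(desired)) {
    if (retained[topic] !== payload) publish.push(topic); // new or changed config
  }
  for (const topic of Object.keys(retained)) {
    if (!(topic in desired)) clear.push(topic); // entity/device was removed
  }
  return { publish, clear };
}
```

With this approach, unchanged retained configs (the vast majority on restart) are neither re-published nor re-processed.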
@corporategoth I've merged https://github.com/Koenkk/zigbee2mqtt/pull/22701 to dev now, I expect this to hugely improve the situation, can you check it on your setup? Make sure to restart z2m at least twice, in case it works, you should see a lot of this +- 5 seconds after z2m startup:
Skipping discovery of 'sensor/0x00158d0001d8e1d2/action/config', already discovered
In case it works, could you provide the same stats as you did in https://github.com/Koenkk/zigbee2mqtt/issues/20648#issuecomment-2110965020?
https://github.com/home-assistant/core/pull/109030 has finally been merged (in addition to a slew of performance and stability updates for MQTT discovery/parsing) and is tentatively scheduled for the 2024.6.0 release of Home Assistant. I would wait until that version is released, but then I think we should try to switch to device-based discovery quickly.
@corporategoth could you check the performance with 1.38.0? It would be nice if you could provide the same info as in https://github.com/Koenkk/zigbee2mqtt/issues/20648#issuecomment-2110965020
Unfortunately, device-level discovery was reverted for 2024.6.0. The new draft PR is https://github.com/home-assistant/core/pull/118757. There is also a corresponding documentation PR under https://github.com/home-assistant/home-assistant.io/pull/33072.
@mundschenk-at Yeah, that team there seems to have blinders on.
They don't care about the effect requiring 12,000 messages to configure 180 devices (my exact situation) has on other projects. Two of their maintainers (@bdraco and @jbouwh) seem on board with the fact that 1) this is blindingly obvious, and 2) reducing the overall number of messages to the MQTT broker makes sense.
One of them (@emontnemery) won't accept it unless it improves HA performance specifically by more than some arbitrary target number. Heck, even 12% or 16% is not enough (the other maintainers made that argument), I don't know what he wants, 100%? 500%? Running the integration gives you back CPU cycles?
I tried making the argument to them that, even if the performance was WORSE, causing a blast of 12000 messages for a mere 180 devices is not being a good software citizen, and should be fixed. Apparently they don't care, because "it works for my tiny setup" and "that's OTHER software, I only care about HA." And they decided to make it "club members only" from that point on.
So I've given up hope of device-level discovery ever being merged. They have no interest in correcting bad design (which entity-level discovery is). Sure, they will then have to maintain device AND entity level discovery from then on, but if entity level discovery is deprecated, then it will slowly die / not need updates from then on. That's how software works in professional development. You fix the bad design, and eat the cost of doing the bad design in the first place.
end rant ... sorry, I've never had good interactions with the MQTT implementers in HA. I had to repeatedly, over months, report issues and even write a fix myself (which they rejected both because I'm not a "club" member, and because they wanted a piecemeal fix instead of a holistic one) when their code was processing every retained message in an n^2 manner (i.e. every entity on a device processed the retained messages n times, depending on how many entities the device has) - which caused their startup to consume 100% CPU, take 10 minutes, time out, and FAIL for most entities in a large setup. At first they tried to blame Z2M, and I had to PROVE it was their code, AND write a fix. And they fought me every step of the way.
really end rant ;)
Still hope we can do some tests
While I get your frustration, @corporategoth, I think this is overly harsh. HA is a really large and busy (as in the number of contributions) project, so I can understand that not everything can be merged immediately. I assume some discussions went on on Discord (or some other private venue) as well, so we only got to see the end results of those discussions in the public PR, which made them seem very sudden and not deliberative.
@jbouwh Can we help in some way to move the PR forward?
@mundschenk-at No, it's not overly harsh. The MQTT component only has 3 real contributors. @jbouwh did an excellent job of creating a device-based discovery MR, trying to solve a specific problem (which was NOT about performance, but about flooding other systems). It was done quite quickly, was complete, and was just going through review/tweaking. I was even giving constructive feedback on it, while it was solely related to device-based discovery and fixing the issue at hand.
It was only later they decided to tack on performance goals to this task, which are unrelated. This is proven, because much of the performance improvements that came out have nothing to do with device-based discovery - and they got into 2024.6.0. They could have, and should have created another ticket / MR to track performance issues in the HA MQTT implementation. But they hijacked the original MR instead, and then had it sit for months because "it doesn't give enough of a performance improvement" - despite that having nothing to do with why device-based discovery is a thing.
My previous MR to fix the n^2 retained message issue was rejected because it incorporated 3 fixes in one MR (all 3 of which were required to fix the issue properly - and they were eventually incorporated months later when @jbouwh got around to re-implementing my fixes). For the device discovery MR, however, all of a sudden any performance improvement was fair game to add, despite being completely unrelated - delaying the MR and the fix therein. AND now there were minimum performance metrics that had to be hit (apparently something more than 12%), despite device discovery not being about performance, but about overwhelming other products (the MQTT broker, Z2M, even HA itself can't keep up with high message volumes). All of this being the executive decision of 1 of the 3 maintainers. There is a reason I refuse to submit any fixes to the HA MQTT implementation ever again.
I actually would be less upset if they had prioritized performance issues over the device-based discovery, and had let the device-based discovery sit while they were doing that. And it was still pending. But the fact they hijacked the device-based discovery MR, then locked it down to maintainers only when I pointed out that it has nothing to do with performance and continued to discuss performance issues in the device-based discovery MR is what incensed me. Especially after the previous "each MR should do only ONE small thing" debacle.
@corporategoth could you check https://github.com/Koenkk/zigbee2mqtt/issues/20648#issuecomment-2145815650 ?
@Koenkk I saw - since I use Z2M on TrueNAS, I need to wait for TrueCharts to update to 1.38.0 to test new versions in a reasonable way. As soon as Z2M 1.38.0 is in TrueCharts, I will indeed test it :)
Weird - I see the truecharts repo is updated to have 1.38.0 (https://github.com/truecharts/charts/blob/master/charts/stable/zigbee2mqtt/Chart.yaml) - but refreshing the repo on my TrueNAS system doesn't show an upgrade available. I'll have to investigate that.
Ahh, I see. While the charts repo was updated, the catalog repo (which is used by truenas) was not - https://github.com/truecharts/catalog/blob/main/stable/zigbee2mqtt/12.1.5/Chart.yaml Ugh.
OK, I switched to using a manual docker thing now.
Results so far:
grep '/config' log.log | grep -c "Receiv"
11804
grep '/config' log.log | grep -c "publish"
2381
grep '/config' log.log | grep -c "Skipp"
11803
First message:
[2024-06-07 01:34:52] debug: z2m:mqtt: Received MQTT message on 'homeassistant/binary_sensor/1221051039810110150109113116116_0xe0798dfffea88b0a/connection_state/config' with data '{"device":{"hw_version":"EmberZNet 7.4.1 [GA]","identifiers":["zigbee2mqtt_bridge_0xe0798dfffea88b0a"],"manufacturer":"Zigbee2MQTT","model":"Bridge","name":"Zigbee2MQTT Bridge","sw_version":"1.38.0"},"device_class":"connectivity","entity_category":"diagnostic","name":"Connection state","object_id":"zigbee2mqtt_bridge_connection_state","origin":{"name":"Zigbee2MQTT","sw":"1.38.0","url":"https://www.zigbee2mqtt.io"},"payload_off":"offline","payload_on":"online","state_topic":"zigbee2mqtt/bridge/state","unique_id":"bridge_0xe0798dfffea88b0a_connection_state_zigbee2mqtt","value_template":"{{ value_json.state }}"}'
Last message:
[2024-06-07 01:34:58] debug: z2m: Skipping discovery of 'light/1221051039810110150109113116116_33/light/config', already discovered
So down to 6 seconds. Impressive :)
Not sure what's going on with the message numbers there. Does this imply that I created another 2381 NEW config topics, and skipped / left the 11803 current topics alone because they were already created in the MQTT server? Ouch.
I will try a second restart soon to see what the second run brings.
Yeah, confirmed on the second run:
I now have:
grep "/config" log.log | grep -ci "Recei"
14455
grep "/config" log.log | grep -ci "publish"
1
grep "/config" log.log | grep -ci "skip"
14183
It still finished in 5 seconds, though. Not publishing saves a lot of time :D
So apparently I have 14k config topics (eep!) for my 174 devices.
Great, so it seems we have some acceptable performance now? It was not only the publishing, btw; for every entity it was generating the full config of the device, which is now calculated once per device.
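A minimal sketch of the per-device caching described above. The field names mirror the discovery payloads earlier in this thread, but the function and cache names are assumptions for illustration, not the actual Z2M implementation.

```javascript
// Hypothetical cache: build the shared "device" block once per device
// instead of once per entity (the pre-1.38.0 behaviour).
const deviceBlockCache = new Map();

function getDeviceBlock(device) {
  let block = deviceBlockCache.get(device.ieeeAddr);
  if (!block) {
    block = {
      identifiers: [`zigbee2mqtt_${device.ieeeAddr}`],
      manufacturer: device.manufacturer,
      model: device.model,
      name: device.friendlyName,
    };
    deviceBlockCache.set(device.ieeeAddr, block);
  }
  return block;
}
```

At roughly 14k config topics for 174 devices (the numbers above), this turns ~80 device-block constructions per device into one.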
With 1.38.0, I would say so. I'm sure it will be even better with device-based discovery if/when HA implements it, as that will reduce the number of config messages it receives (to check for deleted devices) too. But at 5s, and with the other changes, crashing on startup is a thing of the past. So thanks! :)
Great, I guess we can finally close this issue; from the Z2M side we did everything we could.
FWIW, https://github.com/home-assistant/core/pull/118757 was merged (again) and I have some hope of seeing it in 2024.11.0.
Is your feature request related to a problem? Please describe
High MQTT load (for example, when you have a LOT of HA devices and HA configuration occurs) can cause actual Zigbee traffic to be delayed/halted, and even time out (causing Z2M to crash). This can be caused either by the MQTT traffic itself, OR by the logging of it (if logging is set to debug).
To wit, if I have logging set to debug, and HA enabled, Z2M will almost always crash on startup for me, because the amount of work being done both to log things and communicate with HA's autoconfiguration causes enough delays in Zigbee comms to cause a timeout and crash. I have to either turn off HA integration before startup, or set logging to info to get a successful startup of Z2M.
Describe the solution you'd like
Although JavaScript is normally single-threaded, this is a prime candidate for investigating multi-threading in JS, either through some existing mechanism (there are a few, such as worker threads) or by splitting things into multiple processes.
There would have to be a pair of queues between the threads (one going in each direction). For example, when you get a status from the Zigbee network, you put it onto a queue, and a separate thread is responsible for publishing it to MQTT. Or a command comes in from MQTT, goes onto a queue, and the Zigbee thread picks it up and sends it to the Zigbee device. This way, high load or concurrency limiting on either side will NOT affect the other.
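The queue pair above could be sketched roughly as follows. This is an in-process illustration only (all names are made up); in the actual proposal each loop would run on its own thread (e.g. via Node's worker_threads) or in a separate process, so backpressure on one side cannot stall the other.

```javascript
// Illustrative sketch: a tiny async channel decoupling producer from consumer.
class Channel {
  constructor() {
    this.items = [];   // buffered messages
    this.waiters = []; // pending pop() resolvers
  }
  push(item) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(item);
    else this.items.push(item);
  }
  pop() {
    if (this.items.length > 0) return Promise.resolve(this.items.shift());
    return new Promise((resolve) => this.waiters.push(resolve));
  }
}

const zigbeeToMqtt = new Channel(); // device states -> MQTT publisher
const mqttToZigbee = new Channel(); // MQTT commands -> Zigbee sender

// Publisher loop: drains its queue independently of the Zigbee side,
// so slow MQTT publishing never delays Zigbee comms.
async function publisherLoop(publish) {
  for (;;) {
    const msg = await zigbeeToMqtt.pop();
    publish(msg.topic, JSON.stringify(msg.payload));
  }
}
```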
The same is true for logging (though in this case only 1 queue is needed). Log messages are put on a queue, and a separate thread pulls from that queue to write the logs out (to disk, or wherever) - so that any I/O from logging doesn't delay core processing.
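The logging variant can be sketched the same way (function names here are invented for illustration): callers only append to an in-memory queue, and a deferred flush does the actual I/O on a later event-loop turn.

```javascript
// Hypothetical buffered logger: log() itself never performs I/O.
const pending = [];
let flushScheduled = false;

function log(level, message) {
  pending.push(`${level}: ${message}`);
  if (!flushScheduled) {
    flushScheduled = true;
    setImmediate(flush); // defer the slow part; never block this turn
  }
}

function flush() {
  flushScheduled = false;
  const batch = pending.splice(0, pending.length);
  // A real implementation would write to a file stream here.
  for (const line of batch) process.stdout.write(line + '\n');
}
```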
Describe alternatives you've considered
For now, I'm working around this by lowering log levels and disabling HA integration on startup. Not ideal, especially if a restart happens when I'm not around to change these values.
Additional context
All as described.