Koenkk / Z-Stack-firmware

Compilation instructions and hex files for Z-Stack firmwares
MIT License

Stability question regarding 20190425 #86

Closed nob0dy80 closed 5 years ago

nob0dy80 commented 5 years ago

@Koenkk I'm using the max_stable firmware, which has been working great for weeks now. All the other firmwares I tried led to a crash on a large network.

I see that in 20190425 you increased the number of direct devices again. I'm afraid to give it a try because I can already see it crashing again.

In the changelog I see

Are these settings only good for performance, or are they related to stability too?

Koenkk commented 5 years ago

Source routing is disabled to fix a bug (https://github.com/Koenkk/zigbee2mqtt/issues/1408). This should allow for a decreased XDATA -> more room for direct devices.

The best way to find out is just to try it (I'm interested in your feedback); you can always revert.

ccorderod commented 5 years ago

@Koenkk Hi. I do have a pretty big ZB network (60+ devices). I was running 1.3.1 and max_stable firmware. Upgraded to firmware 20190425 (not z2m; I am still on 1.3.1):

I am going to invest some more time on this tomorrow (in case you want me to test something); then I will revert to the previous firmware, as my HA setup is quite useless as of now.

thank you.

Koenkk commented 5 years ago

What did you see in the log when you lost your network and tried to control devices?

ccorderod commented 5 years ago

Errors from the coordinator stating there were no devices in the network (no routes)

Koenkk commented 5 years ago

Did you follow the troubleshooting from http://www.zigbee2mqtt.io/information/what_does_and_doesnt_require_repairing.html?

ccorderod commented 5 years ago

Oh yes, I did go through all your docs @Koenkk :-). I did not want to risk losing my network by upgrading the firmware. As I said, my setup is a bit (too?) complex. The ZB network is huge (and growing), plus 140 light bulbs and a few Hue motion sensors and dimmers (Philips Hue and IKEA Tradfri) managed by two Hue bridges. Z2M runs on a CC2531 coordinator on channel 11, and the Hue bridges are on channels 20 and 25.

Do you need me to try anything on the Z2M network before I move back to the previous firmware? Thanks. PS: I am suffering a bit with z2m, but you have done an impressive piece of work @Koenkk, and I thank you for that! And for your support.

Koenkk commented 5 years ago

@ccorderod what is your status now? Did everything work again when reverting to the max stability firmware?

ccorderod commented 5 years ago

@Koenkk yes, everything back to normal with max stability firmware. Still doing some tests with another cc2531 plug with latest 1.2 coordinator firmware in testbed setup.

tim-devel commented 5 years ago

Hi, my network had fallen over too with the new firmware, getting lots of 205 errors. Where can I find a copy of the max stability firmware?

chris-jennings commented 5 years ago

> Hi, my network had fallen over too with the new firmware, getting lots of 205 errors. Where can I find a copy of the max stability firmware?

Try here: CC2531_MAX_STABILITY_20190315.zip

bertran1 commented 5 years ago

Hi, I also have this issue with a large network.

No problem with CC2531 - 15-03 MAX Stability. I have tried firmware 24-04 and 23-05, but both show the same problems. After a while some devices are no longer available.

5/31/2019, 7:10:03 AM - error: Failed to reenable joining
5/31/2019, 7:10:44 AM - error: Failed to ping 'Mi power plug Server'
5/31/2019, 7:10:44 AM - error: Failed to ping 'Mi power plug TV Slaapkamer'
5/31/2019, 7:10:44 AM - error: Failed to ping 'Mi power plug TV Woonkamer'
5/31/2019, 7:11:44 AM - error: Failed to ping 'Mi power plug Server'
5/31/2019, 7:11:44 AM - error: Failed to ping 'Mi power plug TV Slaapkamer'
5/31/2019, 7:11:44 AM - error: Failed to ping 'Mi power plug TV Woonkamer'
5/31/2019, 7:12:37 AM - info: Switching log level to 'debug'
5/31/2019, 7:12:37 AM - info: MQTT publish: topic 'zigbee2mqtt/bridge/config', payload '{"version":"1.4.0","commit":"unknown","coordinator":20190523,"log_level":"debug","permit_join":true}'
5/31/2019, 7:12:37 AM - debug: Received MQTT message on 'zigbee2mqtt/bridge/config/log_level' with data 'debug'
5/31/2019, 7:12:37 AM - info: Switching log level to 'debug'
5/31/2019, 7:12:37 AM - info: MQTT publish: topic 'zigbee2mqtt/bridge/config', payload '{"version":"1.4.0","commit":"unknown","coordinator":20190523,"log_level":"debug","permit_join":true}'
5/31/2019, 7:12:42 AM - debug: Saving state to file /opt/zigbee2mqtt/data/state.json
5/31/2019, 7:12:43 AM - error: Failed to reenable joining
5/31/2019, 7:12:43 AM - debug: Ping 0x00158d0000f9XXXX (basic)
5/31/2019, 7:12:44 AM - debug: Ping 0x00158d00026XXXX (basic)
5/31/2019, 7:12:44 AM - debug: Ping 0x00158d00029XXXX (basic)
5/31/2019, 7:12:44 AM - error: Failed to ping 'Mi power plug Server'
5/31/2019, 7:12:44 AM - error: Failed to ping 'Mi power plug TV Slaapkamer'
5/31/2019, 7:12:44 AM - error: Failed to ping 'Mi power plug TV Woonkamer'
5/31/2019, 7:13:43 AM - debug: Ping 0x00158d0000f9XXXX (basic)
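For anyone triaging a similar log, here is a minimal Python sketch that counts failed pings per device. It is not part of zigbee2mqtt; the function name `count_ping_failures` and the `sample_ping_log` list are made up for illustration, and the pattern only assumes the `Failed to ping '<friendly name>'` wording seen in the log above.

```python
import re
from collections import Counter

# Matches the "Failed to ping '<friendly name>'" wording from the log above.
# The timestamp prefix is ignored, so journald-prefixed lines match as well.
PING_FAIL = re.compile(r"Failed to ping '([^']+)'")


def count_ping_failures(lines):
    """Return a Counter mapping device friendly name -> failed ping count."""
    counts = Counter()
    for line in lines:
        match = PING_FAIL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample: a few lines copied from the log above.
sample_ping_log = [
    "5/31/2019, 7:10:44 AM - error: Failed to ping 'Mi power plug Server'",
    "5/31/2019, 7:11:44 AM - error: Failed to ping 'Mi power plug Server'",
    "5/31/2019, 7:10:44 AM - error: Failed to ping 'Mi power plug TV Woonkamer'",
    "5/31/2019, 7:12:43 AM - debug: Ping 0x00158d0000f9XXXX (basic)",
]

for name, failures in count_ping_failures(sample_ping_log).most_common():
    print(f"{failures:3d}  {name}")
```

A device that fails every ping interval is more likely to have lost its route than one that fails only occasionally, so the per-device totals help when reporting back.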

Koenkk commented 5 years ago

@bertran1 can you post the log when trying to control a device? (I'm interested in the error code)

bertran1 commented 5 years ago

@Koenkk

Sure!

log:

Jun 04 11:50:43 homeassistant npm[655]: zigbee2mqtt:error 6/4/2019, 11:50:43 AM Zigbee publish to device 'Mi power plug TV Woonkamer', genOnOff - on - {} - {"manufSpec":0,"disDefaultRsp":0} - null failed with error Error: Timed out after 30000 ms
Jun 04 11:50:43 homeassistant npm[655]: zigbee2mqtt:info 6/4/2019, 11:50:43 AM MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"zigbee_publish_error","message":"Error: Timed out after 30000 ms","meta":{"entity":{"ID":"0x00158d000XXXXX","type":"device","friendlyName":"Mi power plug TV Woonkamer"},"message":"ON"}}'
Jun 04 11:50:43 homeassistant npm[655]: zigbee2mqtt:info 6/4/2019, 11:50:43 AM Zigbee publish to device 'Mi power plug TV Woonkamer', genOnOff - on - {} - {"manufSpec":0,"disDefaultRsp":0} - null
Jun 04 11:50:43 homeassistant npm[655]: zigbee2mqtt:info 6/4/2019, 11:50:43 AM MQTT publish: topic 'zigbee2mqtt/Mi power plug TV Woonkamer', payload '{"state":"ON","power":59.36,"voltage":234.6,"consumption":58.74,"temperature":29,"linkquality":233}'
Jun 04 11:50:43 homeassistant npm[655]: zigbee2mqtt:debug 6/4/2019, 11:50:43 AM Received zigbee message of type 'attReport' with data '{"cid":"genOnOff","data":{"61440":117440719,"onOff":1}}' of device 'lumi.plug' (0x00158d000XXXXX) of endpoint 1
Jun 04 11:50:43 homeassistant npm[655]: zigbee2mqtt:info 6/4/2019, 11:50:43 AM MQTT publish: topic 'zigbee2mqtt/Mi power plug TV Woonkamer', payload '{"state":"ON","power":59.36,"voltage":234.6,"consumption":58.74,"temperature":29,"linkquality":233}'
Jun 04 11:50:43 homeassistant npm[655]: zigbee2mqtt:debug 6/4/2019, 11:50:43 AM Received zigbee message of type 'devChange' with data '{"cid":"genOnOff","data":{"61440":117440719}}' of device 'lumi.plug' (0x00158d000XXXXX) of endpoint 1
Jun 04 11:50:43 homeassistant npm[655]: zigbee2mqtt:debug 6/4/2019, 11:50:43 AM Received zigbee message of type 'readRsp' with data '{"cid":"genBasic","data":{"zclVersion":1}}' of device 'lumi.plug' (0x00158d000XXXXX) of endpoint 1
Jun 04 11:50:43 homeassistant npm[655]: zigbee2mqtt:debug 6/4/2019, 11:50:43 AM Successfully pinged 'Mi power plug TV Woonkamer'
Jun 04 11:50:48 homeassistant npm[655]: zigbee2mqtt:debug 6/4/2019, 11:50:48 AM Received zigbee message of type 'attReport' with data '{"cid":"genAnalogInput","data":{"presentValue":100.06400299072266}}' of device 'lumi.plug' (0x00158d000XXXXX) of endpoint 2

Koenkk commented 5 years ago

@bertran1 does it keep working with: https://github.com/Koenkk/Z-Stack-firmware/blob/dev/coordinator/Z-Stack_Home_1.2/bin/CC2531_20190523.zip ?

bertran1 commented 5 years ago

@Koenkk No, I already tried that firmware version.

Jun 03 21:42:41 homeassistant npm[4911]: zigbee2mqtt:info 6/3/2019, 9:42:41 PM Coordinator firmware version: '20190523'

Koenkk commented 5 years ago

@bertran1 how large is your network?

bertran1 commented 5 years ago

@Koenkk 6/4/2019, 12:25:20 PM - info: Currently 29 devices are joined

I never had any problems with firmware 23-05 on smaller networks.

Koenkk commented 5 years ago

@bertran1 can you try with the 20190608 firmware? https://github.com/Koenkk/Z-Stack-firmware/blob/dev/coordinator/Z-Stack_Home_1.2/bin/CC2531_20190608.zip

ccorderod commented 5 years ago

@Koenkk I gave it a go as my network is big (50+) and even max stability is not able to keep the mesh stable more than a few minutes (please note many of the devices are routers on my network). 20190608 firmware can't even build the mesh :(.

Koenkk commented 5 years ago

@ccorderod how do you identify that a 'mesh' is built? Note that the networkmap is not a reliable way to do this (however, this should be improved in the dev branch).

bertran1 commented 5 years ago

@Koenkk Thanks for your change, but the same problems suddenly came back. I can send you the complete log. Some devices are not working.

Some log:

Jun 08 19:08:49 homeassistant npm[649]: zigbee2mqtt:debug 6/8/2019, 7:08:49 PM Received MQTT message on 'zigbee2mqtt/Hue Ambiance Plafondlamp Toilet/set' with data '{"state": "ON"}'
Jun 08 19:08:49 homeassistant npm[649]: zigbee2mqtt:info 6/8/2019, 7:08:49 PM Zigbee publish to device 'Hue Ambiance Plafondlamp Toilet', genOnOff - on - {} - {"manufSpec":0,"disDefaultRsp":0} - null
Jun 08 19:08:49 homeassistant npm[649]: zigbee2mqtt:info 6/8/2019, 7:08:49 PM MQTT publish: topic 'zigbee2mqtt/Hue Ambiance Plafondlamp Toilet', payload '{"state":"ON","linkquality":65,"brightness":55,"color_temp":416,"color_mode":2,"color":{"x":0.486,"y":0.415}}'

Log when trying to switch it OFF:

Jun 08 19:10:12 homeassistant npm[649]: zigbee2mqtt:debug 6/8/2019, 7:10:12 PM Received MQTT message on 'zigbee2mqtt/Hue Ambiance Plafondlamp Toilet/set' with data '{"state": "OFF"}'
Jun 08 19:10:13 homeassistant npm[649]: zigbee2mqtt:info 6/8/2019, 7:10:13 PM Zigbee publish to device 'Hue Ambiance Plafondlamp Toilet', genOnOff - off - {} - {"manufSpec":0,"disDefaultRsp":0} - null
Jun 08 19:10:13 homeassistant npm[649]: zigbee2mqtt:error 6/8/2019, 7:10:13 PM Zigbee publish to device 'Hue Ambiance Plafondlamp Toilet', genOnOff - off - {} - {"manufSpec":0,"disDefaultRsp":0} - null failed with error Error: AF data request fails, status code: 205. No network route. Please confirm that the device has (re)joined the network.
Jun 08 19:10:13 homeassistant npm[649]: zigbee2mqtt:info 6/8/2019, 7:10:13 PM MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"zigbee_publish_error","message":"Error: AF data request fails, status code: 205. No network route. Please confirm that the device has (re)joined the network.","meta":{"entity":{"ID":"0x0017880104xxxxx","type":"device","friendlyName":"Hue Ambiance Plafondlamp Toilet"},"message":"{\"state\": \"OFF\"}"}}'
Jun 08 19:10:15 homeassistant npm[649]: zigbee2mqtt:debug 6/8/2019, 7:10:15 PM Received MQTT message on 'zigbee2mqtt/Hue Ambiance Plafondlamp Toilet/set' with data '{"state": "OFF"}'
Jun 08 19:10:16 homeassistant npm[649]: zigbee2mqtt:info 6/8/2019, 7:10:16 PM Zigbee publish to device 'Hue Ambiance Plafondlamp Toilet', genOnOff - off - {} - {"manufSpec":0,"disDefaultRsp":0} - null
Jun 08 19:10:16 homeassistant npm[649]: zigbee2mqtt:error 6/8/2019, 7:10:16 PM Zigbee publish to device 'Hue Ambiance Plafondlamp Toilet', genOnOff - off - {} - {"manufSpec":0,"disDefaultRsp":0} - null failed with error Error: AF data request fails, status code: 205. No network route. Please confirm that the device has (re)joined the network.
Jun 08 19:10:16 homeassistant npm[649]: zigbee2mqtt:info 6/8/2019, 7:10:16 PM MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"zigbee_publish_error","message":"Error: AF data request fails, status code: 205. No network route. Please confirm that the device has (re)joined the network.","meta":{"entity":{"ID":"0x0017880104XXXXX","type":"device","friendlyName":"Hue Ambiance Plafondlamp Toilet"},"message":"{\"state\": \"OFF\"}"}}'
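Since the error code is what matters for diagnosis here, a small Python sketch like the following can tally the `AF data request fails, status code: N` errors in a log. It is an illustrative helper, not something zigbee2mqtt provides: the name `tally_status_codes` and the abbreviated `SAMPLE_STATUS_LOG` are assumptions. It only counts the `failed with error` lines, so the mirrored `zigbee2mqtt/bridge/log` publish of the same error is not counted twice.

```python
import re
from collections import Counter

# Only the "failed with error" line is matched; the bridge/log MQTT mirror of
# the same error lacks that prefix and is therefore skipped.
STATUS = re.compile(
    r"failed with error Error: AF data request fails, "
    r"status code: (\d+)\. ([^.]+)\."
)


def tally_status_codes(lines):
    """Return a Counter keyed by (status code, short description)."""
    tally = Counter()
    for line in lines:
        match = STATUS.search(line)
        if match:
            tally[(int(match.group(1)), match.group(2))] += 1
    return tally


# Hypothetical sample: abbreviated versions of lines posted in this thread.
SAMPLE_STATUS_LOG = """\
6/8/2019, 7:10:13 PM - error: Zigbee publish to device 'Hue Ambiance Plafondlamp Toilet', genOnOff - off - {} - null failed with error Error: AF data request fails, status code: 205. No network route. Please confirm that the device has (re)joined the network.
6/8/2019, 7:10:13 PM - info: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"message":"Error: AF data request fails, status code: 205. No network route."}'
6/11/2019, 11:01:12 PM - error: Zigbee publish to device 'Hue Ambiance Plafondlamp Toilet', genLevelCtrl - moveToLevelWithOnOff - null failed with error Error: AF data request fails, status code: 183. APS no ack.
6/13/2019, 6:46:57 PM - error: Zigbee publish to device '0x00158d0002a7a8d8', genOnOff - off - {} - null failed with error Error: AF data request fails, status code: 233. MAC no ack.
"""

status_tally = tally_status_codes(SAMPLE_STATUS_LOG.splitlines())
for (code, text), count in sorted(status_tally.items()):
    print(f"{code}: {text} x{count}")
```

The three codes in the sample (205 No network route, 183 APS no ack, 233 MAC no ack) are the ones that appear in logs posted in this thread; the descriptions are taken from the log text itself.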

ccorderod commented 5 years ago

@Koenkk I have a couple of XBees (routers) on the mesh. One is attached to a computer running the XCTU tool. This is what I use to monitor mesh status (I confirm the networkmap on the master build is not reliable). I have to say that even with the mesh being rebuilt from time to time, I am not aware of losing any message from meters or end devices. I am running the 20190223 coordinator firmware, as I found after trial and error that it was more stable than the max stability firmware.

Koenkk commented 5 years ago

I've now also made a source routing firmware available:

@all/@nob0dy80 can you try this firmware? https://github.com/Koenkk/Z-Stack-firmware/tree/dev/coordinator/Z-Stack_Home_1.2/bin/source_routing (CC2531_SOURCE_ROUTING_20190610.zip).

bertran1 commented 5 years ago

@Koenkk After one night I am still having the same problems. None of the Zigbee devices are working.

6/11/2019, 6:48:49 AM - debug: Ping 0x00158d0000f9 (basic)
6/11/2019, 6:48:49 AM - debug: Ping 0x00158d000268 (basic)
6/11/2019, 6:48:49 AM - debug: Ping 0x00158d000291 (basic)
6/11/2019, 6:48:52 AM - error: Failed to ping 'Mi power plug Server'
6/11/2019, 6:48:55 AM - error: Failed to ping 'Mi power plug TV Slaapkamer'
6/11/2019, 6:48:58 AM - error: Failed to ping 'Mi power plug TV Woonkamer'
6/11/2019, 6:49:00 AM - debug: Received MQTT message on 'zigbee2mqtt/Mi power plug TV Slaapkamer/set' with data 'ON'
6/11/2019, 6:49:00 AM - info: Zigbee publish to device 'Mi power plug TV Slaapkamer', genOnOff - on - {} - {"manufSpec":0,"disDefaultRsp":0} - null
6/11/2019, 6:49:03 AM - error: Zigbee publish to device 'Mi power plug TV Slaapkamer', genOnOff - on - {} - {"manufSpec":0,"disDefaultRsp":0} - null failed with error Error: request timeout
6/11/2019, 6:49:03 AM - info: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"zigbee_publish_error","message":"Error: request timeout","meta":{"entity":{"ID":"0x00158d000268","type":"device","friendlyName":"Mi power plug TV Slaapkamer"},"message":"ON"}}'
6/11/2019, 6:49:32 AM - error: Failed to reenable joining

Jun 11 08:05:52 homeassistant npm[1636]: zigbee2mqtt:error 6/11/2019, 8:05:52 AM Failed to ping 'Mi power plug Server'
Jun 11 08:05:55 homeassistant npm[1636]: zigbee2mqtt:error 6/11/2019, 8:05:55 AM Failed to ping 'Mi power plug TV Slaapkamer'
Jun 11 08:05:56 homeassistant npm[1636]: zigbee2mqtt:debug 6/11/2019, 8:05:56 AM Received MQTT message on 'zigbee2mqtt/TRADFRI control outlet Slaapkamer/set' with data 'OFF'
Jun 11 08:05:58 homeassistant npm[1636]: zigbee2mqtt:error 6/11/2019, 8:05:58 AM Failed to ping 'Mi power plug TV Woonkamer'
Jun 11 08:06:00 homeassistant npm[1636]: zigbee2mqtt:debug 6/11/2019, 8:06:00 AM Received MQTT message on 'zigbee2mqtt/Hue Color lamp Hal/set' with data '{"state": "ON"}'
Jun 11 08:06:00 homeassistant npm[1636]: zigbee2mqtt:info 6/11/2019, 8:06:00 AM Zigbee publish to device 'Hue Color lamp Hal', genOnOff - on - {} - {"manufSpec":0,"disDefaultRsp":0} - null
Jun 11 08:06:01 homeassistant npm[1636]: zigbee2mqtt:debug 6/11/2019, 8:06:01 AM Received MQTT message on 'zigbee2mqtt/Hue Ambiance Plafondlamp Hal/set' with data '{"state": "ON"}'
Jun 11 08:06:01 homeassistant npm[1636]: zigbee2mqtt:info 6/11/2019, 8:06:01 AM Zigbee publish to device 'Hue Ambiance Plafondlamp Hal', genOnOff - on - {} - {"manufSpec":0,"disDefaultRsp":0} - null
Jun 11 08:06:01 homeassistant npm[1636]: zigbee2mqtt:error 6/11/2019, 8:06:01 AM Zigbee publish to device 'TRADFRI control outlet Woonkamer', genOnOff - on - {} - {"manufSpec":0,"disDefaultRsp":0} - null failed with error Error: request timeout

Koenkk commented 5 years ago

@bertran1 is this with CC2531_SOURCE_ROUTING_20190610.zip?

bertran1 commented 5 years ago

@Koenkk yes

Koenkk commented 5 years ago

@bertran1 and with max stability firmware you have none of these problems?

bertran1 commented 5 years ago

@koenkk yes, no problems with that firmware 🙂

Koenkk commented 5 years ago

Can you try with: https://github.com/Koenkk/Z-Stack-firmware/blob/dev/coordinator/Z-Stack_Home_1.2/bin/source_routing/CC2531_SOURCE_ROUTING_20190611.zip

bertran1 commented 5 years ago

@Koenkk it is not working very well, but I think it works better than the previous firmware. I will check tomorrow whether everything is still working.

Log:

Jun 11 23:01:12 homeassistant npm[683]: zigbee2mqtt:error 6/11/2019, 11:01:12 PM Zigbee publish to device 'Hue Ambiance Plafondlamp Toilet', genLevelCtrl - moveToLevelWithOnOff - {"level":55,"transtime":0} - {"manufSpec":0,"disDefaultRsp":0} - null failed with error Error: AF data request fails, status code: 183. APS no ack.

ON:
Jun 11 23:01:12 homeassistant npm[683]: zigbee2mqtt:info 6/11/2019, 11:01:12 PM MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"zigbee_publish_error","message":"Error: AF data request fails, status code: 183. APS no ack.","meta":{"entity":{"ID":"0x0017880104abc4b8","type":"device","friendlyName":"Hue Ambiance Plafondlamp Toilet"},"message":"{\"state\": \"ON\", \"brightness\": 55, \"color_temp\": 416}"}}'
Jun 11 23:01:13 homeassistant npm[683]: zigbee2mqtt:info 6/11/2019, 11:01:13 PM Zigbee publish to device 'Hue Ambiance Plafondlamp Toilet', lightingColorCtrl - moveToColorTemp - {"colortemp":416,"transtime":0} - {"manufSpec":0,"disDefaultRsp":0} - null

No ACK:
Jun 11 23:01:19 homeassistant npm[683]: zigbee2mqtt:info 6/11/2019, 11:01:19 PM Successfully reenabled joining
Jun 11 23:01:31 homeassistant npm[683]: zigbee2mqtt:error 6/11/2019, 11:01:31 PM Zigbee publish to device 'Hue Ambiance Plafondlamp Toilet', lightingColorCtrl - moveToColorTemp - {"colortemp":416,"transtime":0} - {"manufSpec":0,"disDefaultRsp":0} - null failed with error Error: AF data request fails, status code: 183. APS no ack.

bertran1 commented 5 years ago

@Koenkk I think this firmware works better, but I am still losing some devices. I will run the stable version again so that I know for sure whether this issue is absent there.

What is the difference between the stable and this firmware?

Works:
6/12/2019, 7:38:14 PM - info: MQTT publish: topic 'zigbee2mqtt/MiJia human body movement sensor Overloop', payload '{"occupancy":false,"linkquality":228}'
6/12/2019, 7:38:40 PM - info: Successfully reenabled joining
6/12/2019, 7:38:45 PM - debug: Received MQTT message on 'zigbee2mqtt/Hue Ambiance Plafondlamp Overloop/set' with data '{"state": "OFF"}'
6/12/2019, 7:38:45 PM - info: Zigbee publish to device 'Hue Ambiance Plafondlamp Overloop', genOnOff - off - {} - {"manufSpec":0,"disDefaultRsp":0} - null
6/12/2019, 7:38:45 PM - info: MQTT publish: topic 'zigbee2mqtt/Hue Ambiance Plafondlamp Overloop', payload '{"state":"OFF","linkquality":231,"brightness":15,"color_temp":444,"color_mode":2,"color":{"x":0.33,"y":0.343}}'

Issue:
6/12/2019, 7:47:57 PM - debug: Received MQTT message on 'zigbee2mqtt/Hue Ambiance Plafondlamp Overloop/set' with data '{"state": "OFF"}'
6/12/2019, 7:47:57 PM - info: Zigbee publish to device 'Hue Ambiance Plafondlamp Overloop', genOnOff - off - {} - {"manufSpec":0,"disDefaultRsp":0} - null
6/12/2019, 7:47:57 PM - error: Zigbee publish to device 'Hue Ambiance Plafondlamp Overloop', genOnOff - off - {} - {"manufSpec":0,"disDefaultRsp":0} - null failed with error Error: AF data request fails, status code: 205. No network route. Please confirm that the device has (re)joined the network.
6/12/2019, 7:47:57 PM - info: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"zigbee_publish_error","message":"Error: AF data request fails, status code: 205. No network route. Please confirm that the device has (re)joined the network.","meta":{"entity":{"ID":"0x001788010433****","type":"device","friendlyName":"Hue Ambiance Plafondlamp Overloop"},"message":"{\"state\": \"OFF\"}"}}'

Koenkk commented 5 years ago

@bertran1 this firmware has source routing enabled which should reduce 205 errors.

ccorderod commented 5 years ago

@Koenkk @bertran1 I have been running sourcerouting_20190610 for a couple of days. The mesh was stable. I lost a few messages from Aqara door sensors during this timeframe, but I am not sure z2m or the firmware has anything to do with it (I will have to check during the weekend whether they are too far from a router). Debug was not enabled, so I can't tell much more. Last night I updated the coordinator to 20190611. This morning I found the full network down. I noticed the mesh was trying to rebuild, but all the routers were logging 205 errors (the log was not in debug mode, so only errors were logged). I went back to 20190610 a few minutes ago. My two cents. Have a great day.

BabyDino commented 5 years ago

I've been pulling my hair out until I stumbled on this issue. I thought I was having hardware failures. My issues started when I added new devices to my network but at the same time I added an rfxcom as well to a Raspberry Pi. I was thinking about some USB related (interference) issues, and was focusing on the Pi instead of the firmware. I was running 20190223 at that time I believe.

Anyway, I flashed CC2531_SOURCE_ROUTING_20190611 last night but I just found out my network was down again. It lasted for about 12 hours. I don't get the 205 error anymore but a more general error:

6/13/2019, 12:22:55 PM - error: Zigbee publish to device '0x00158d0001cc61a5', genOnOff - on - {} - {"manufSpec":0,"disDefaultRsp":0} - null failed with error Error: request timeout
6/13/2019, 12:22:55 PM - info: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"zigbee_publish_error","message":"Error: request timeout","meta":{"entity":{"ID":"0x00158d0001cc61a5","type":"device","friendlyName":"0x00158d0001cc61a5"},"message":"{\"state\": \"ON\"}"}}'

I've now switched to the max stability firmware. For now my goal is to run for at least 48 hours without issues until I will resume testing on new builds with debug enabled. I will order a second coordinator as a backup at the same time.

Koenkk commented 5 years ago

@BabyDino please let us know your results after 48 hours.

BabyDino commented 5 years ago

Currently running CC2531_MAX_STABILITY_20190315. Part of the network is now failing. My network has grown to 33 devices.

6/13/2019, 6:51:25 PM - info: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"zigbee_publish_error","message":"Error: AF data request fails, status code: 205. No network route. Please confirm that the device has (re)joined the network.","meta":{"entity":{"ID":"0x00158d0002d4c1ac","type":"device","friendlyName":"0x00158d0002d4c1ac"},"message":"{\"state\": \"ON\"}"}}'
6/13/2019, 6:51:25 PM - info: Zigbee publish to device '0x00158d0002d4c1ac', genOnOff - off - {} - {"manufSpec":0,"disDefaultRsp":0} - null
6/13/2019, 6:51:31 PM - error: Zigbee publish to device '0x00158d0002d4c1ac', genOnOff - off - {} - {"manufSpec":0,"disDefaultRsp":0} - null failed with error Error: AF data request fails, status code: 205. No network route. Please confirm that the device has (re)joined the network.

As soon as I restart the network heals quickly and everything works perfectly. But out of nowhere it looks like the coordinator just stops.

Koenkk commented 5 years ago

@BabyDino it completely stops? Because that error should fix itself.

BabyDino commented 5 years ago

@Koenkk on one occasion I physically had to remove the device because zigbee-shepherd couldn't connect to it anymore. I'm not sure if this was an isolated incident.

Anyway in my case, if the network fails, nothing works anymore and it doesn't come back with one exception: with the max stability version some devices were still working, but most of the routers (in my case Innr bulbs) didn't respond anymore (205 error).

My coordinator is close to 3 bulbs, so connectivity is not an issue (and it never was). It seems that adding more and more devices destabilizes the network.

I'm currently running sourcerouting_20190610 as well since this afternoon. I really hope it will survive the night (my girlfriend starts complaining about my little project, she doesn't understand technology ;)). I will report back tomorrow.

As said, to rule out hardware errors I ordered a couple of extra CC2531s.

bertran1 commented 5 years ago

@Koenkk After more than 48 hours I can confirm that I don't have any issues with the stable version.

Koenkk commented 5 years ago

@bertran1 which firmware is this? CC2531_MAX_STABILITY_20190315?

bertran1 commented 5 years ago

@Koenkk Yes! No problems with that firmware.

6/12/2019, 8:34:36 PM - info: Coordinator firmware version: '20190315'
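Because several firmware builds are being swapped in and out in this thread, it can help to confirm from the log which build the coordinator actually reported. A minimal Python sketch, assuming only the `Coordinator firmware version: '<YYYYMMDD>'` wording shown above; the names `coordinator_versions` and `sample_version_log` are hypothetical.

```python
import re

# Matches the startup line shown above, in both the plain and the
# journald-prefixed log formats.
VERSION_LINE = re.compile(r"Coordinator firmware version: '(\d{8})'")


def coordinator_versions(lines):
    """Return every reported coordinator firmware build, in log order."""
    versions = []
    for line in lines:
        match = VERSION_LINE.search(line)
        if match:
            versions.append(match.group(1))
    return versions


# Hypothetical sample: two version lines posted in this thread plus noise.
sample_version_log = [
    "Jun 03 21:42:41 homeassistant npm[4911]: zigbee2mqtt:info 6/3/2019, 9:42:41 PM Coordinator firmware version: '20190523'",
    "6/12/2019, 7:38:40 PM - info: Successfully reenabled joining",
    "6/12/2019, 8:34:36 PM - info: Coordinator firmware version: '20190315'",
]

print(coordinator_versions(sample_version_log))
```

The last entry in the returned list is the build in use for the most recent zigbee2mqtt start in that log.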

Koenkk commented 5 years ago

Great, then this will probably be the recommended firmware (https://github.com/Koenkk/zigbee2mqtt/issues/1536#issuecomment-502373661)

BabyDino commented 5 years ago

Small update: I've been running sourcerouting_20190610 since my last post without issues so far. The CC2531_MAX_STABILITY_20190315 did not work for me with 33 devices.

Koenkk commented 5 years ago

@BabyDino What issues are you having with CC2531_MAX_STABILITY_20190315?

BabyDino commented 5 years ago

@Koenkk please refer to my previous post: https://github.com/Koenkk/Z-Stack-firmware/issues/86#issuecomment-501654673

Basically the coordinator loses connection. I'll check if I have logs with that firmware. I'll let you know.

ccorderod commented 5 years ago

@Koenkk I have been on (https://github.com/Koenkk/Z-Stack-firmware/blob/e98e0e8c2b5826ad4d29a1046bfeaf78c08b4c53/coordinator/max_stability/CC2531/CC2531_MAX_STABILITY_20190315.zip) for the last couple of days. My mesh crashes every now and then, and it is rebuilt within minutes without me doing anything. I suppose that if messages come in from end devices during the rebuild phase, I am going to lose them. As far as I can tell, this is the most "stable" firmware I have been able to test in the past weeks, but I don't feel comfortable with it, as it makes my HA platform unstable (door/window/gas/motion sensors are all on Zigbee). I did move some of my IKEA bulbs from the Hue bridge onto the z2m network just to make sure there was enough Zigbee coverage to support all those devices. Any advice? Thanks.

BabyDino commented 5 years ago

> @BabyDino What issues are you having with CC2531_MAX_STABILITY_20190315?

The logs don't tell me much. Out of nowhere I receive this:

6/13/2019, 5:56:10 PM - error: Zigbee publish to device '0x00158d0002d4c1ac', genOnOff - on - {} - {"manufSpec":0,"disDefaultRsp":0} - null failed with error Error: AF data request fails, status code: 205. No network route.
6/13/2019, 5:56:10 PM - info: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"zigbee_publish_error","message":"Error: AF data request fails, status code: 205. No network route. Please confirm that the device

and

6/13/2019, 6:46:57 PM - error: Zigbee publish to device '0x00158d0002a7a8d8', genOnOff - off - {} - {"manufSpec":0,"disDefaultRsp":0} - null failed with error Error: AF data request fails, status code: 233. MAC no ack.
6/13/2019, 6:46:57 PM - info: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"zigbee_publish_error","message":"Error: AF data request fails, status code: 233. MAC no ack.","meta":{"entity":{"ID":"0x00158d000
6/13/2019, 6:46:57 PM - info: Zigbee publish to device '0x00158d0001cbbfed', genOnOff - off - {} - {"manufSpec":0,"disDefaultRsp":0} - null

No issues with sourcerouting_20190610 so far though!

Koenkk commented 5 years ago

@BabyDino based on your network size I cannot explain why this happens with CC2531_MAX_STABILITY_20190315 and not with sourcerouting_20190610. Both firmwares have big enough routing tables to cover your whole network (40 vs 50).

BabyDino commented 5 years ago

@Koenkk

  • Is this something you occasionally see?

I only had one run with the 0315, when it became unstable I immediately flashed 0610.

  • If it happens, did you successfully control that device in the same zigbee2mqtt session? (new session = zigbee2mqtt reset).

Can't say, see above.

  • Have you run sourcerouting_20190610 long enough to be sure it doesn't happen with that firmware?

It has been running for 4 days without issues now. But I haven't restarted yet. The network is stable at this time.

If you want me to debug some stuff let me know. What I can say is that 0610 is the best one so far for me.

Koenkk commented 5 years ago

Can you do another try with 0315?