Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0

Joining new devices fails after a few days of uptime #3177

Closed sjorge closed 3 years ago

sjorge commented 4 years ago

Bug Report

What happened

I tried to join a new device, it failed.

The network/coordinator USB stick has been up for about 1-1.5 days now, and joining new devices fails. Physically unplugging and replugging the USB stick (with z2m stopped, of course) allows new devices to join again.

I did a capture of one of those failed join requests: https://pkg.blackdot.be/extras/zigbee/failed_join.pcapng

Not sure where it fails afterwards, it looks like z2m doesn't respond to the update device at all?

What did you expect to happen

The device to be able to join.

How to reproduce it (minimal and precise)

I can always reproduce this about 1-3 days after unplugging and replugging the USB device (rebooting the server does not help).

Debug Info

zigbee2mqtt version: 1.12.0-dev (commit #d52d520) CC253X firmware version: {"type":"zStack12","meta":{"transportrev":2,"product":0,"majorrel":2,"minorrel":6,"maintrel":3,"revision":20190608}}

sjorge commented 4 years ago

Last time I did not have issues was around 1.6 I think.

sjorge commented 4 years ago

Soft resetting seems to work as well! However, no more messages come through after that until I restart z2m.

klorydryk commented 4 years ago

@sjorge I can't join new devices either. How can I soft reset?

sjorge commented 4 years ago

@klorydryk https://www.zigbee2mqtt.io/information/mqtt_topics_and_message_structure.html#zigbee2mqttbridgeconfigreset

Or physically removing the USB and plugging it back in. (while z2m is stopped)

After the reset via MQTT, I had to restart z2m anyway. But that might be because I am running on illumos and not Linux. At least it saved a physical trip down to the pantry.
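For anyone who wants to script the MQTT reset, here is a minimal sketch. It only builds (and optionally runs) a `mosquitto_pub` command for the `zigbee2mqtt/bridge/config/reset` topic. The broker host, the default `zigbee2mqtt` base topic and the empty payload are assumptions, and `build_reset_command` / `soft_reset` are hypothetical helper names, not part of z2m.

```python
import shlex
import subprocess


def build_reset_command(host="localhost", base_topic="zigbee2mqtt"):
    """Return the argv that publishes an empty message to the reset topic."""
    topic = f"{base_topic}/bridge/config/reset"
    # -n sends a zero-length payload; that the payload can be empty is an assumption.
    return ["mosquitto_pub", "-h", host, "-t", topic, "-n"]


def soft_reset(host="localhost", base_topic="zigbee2mqtt", dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    cmd = build_reset_command(host, base_topic)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

As noted above, z2m may still need a restart after the reset.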

klorydryk commented 4 years ago

It works after a soft reset (physically ;) ) thanks.

Koenkk commented 4 years ago

I've investigated and it seems to go wrong at the Transport key part.

Log of successful join: image

Log of failed join (from OP), no transport key is done: image

One thing I noticed is that the device joins via a router of the network, not the coordinator itself (Extended source is an ember stack device): image

When it joins via the coordinator, you will see that the extended source is a Texas Instruments device, like this: image

sjorge commented 4 years ago
{"id":2,"type":"Router","ieeeAddr":"0x000d6ffffe8e8d4f","nwkAddr":47816,"manufId":4476,"manufName":"IKEA of Sweden","powerSource":"Mains (single phase)","modelId":"TRADFRI signal repeater","epList":[1,242],"endpoints":{"1":{"profId":260,"epId":1,"devId":8,"inClusterList":[0,3,9,2821,4096,64636],"outClusterList":[25,32,4096],"clusters":{"genBasic":{"attributes":{"modelId":"TRADFRI signal repeater","manufacturerName":"IKEA of Sweden","powerSource":1,"zclVersion":3,"appVersion":33,"stackVersion":98,"hwVersion":1,"dateCode":"20190318","swBuildId":"2.2.005"}}},"binds":[{"cluster":0,"type":"endpoint","deviceIeeeAddress":"0x00124b001938a7e5","endpointID":1}]},"242":{"profId":41440,"epId":242,"devId":97,"inClusterList":[33],"outClusterList":[33],"clusters":{},"binds":[]}},"appVersion":32,"stackVersion":98,"hwVersion":1,"dateCode":"20190318","swBuildId":"2.2.005","zclVersion":3,"interviewCompleted":true,"meta":{"reporting":1,"configured":2},"lastSeen":1584920505606}

It's an ikea signal repeater, also the one that occasionally stops responding to pings.

I tried again with it unplugged, same result but it used:

{"id":3,"type":"Router","ieeeAddr":"0x000d6ffffe197fe9","nwkAddr":36186,"manufId":4476,"manufName":"IKEA of Sweden","powerSource":"Mains (single phase)","modelId":"TRADFRI bulb E27 CWS opal 600lm","epList":[1],"endpoints":{"1":{"profId":49246,"epId":1,"devId":512,"inClusterList":[0,3,4,5,6,8,768,2821,4096],"outClusterList":[5,25,32,4096],"clusters":{"genBasic":{"attributes":{"modelId":"TRADFRI bulb E27 CWS opal 600lm","manufacturerName":"IKEA of Sweden","powerSource":1,"zclVersion":1,"appVersion":17,"stackVersion":87,"hwVersion":1,"dateCode":"20180410","swBuildId":"1.3.009"}},"genLevelCtrl":{"attributes":{"currentLevel":254}},"genOnOff":{"attributes":{"onOff":1}},"lightingColorCtrl":{"attributes":{"currentX":29969,"currentY":26869,"colorMode":1}}},"binds":[{"cluster":6,"type":"endpoint","deviceIeeeAddress":"0x00124b001938a7e5","endpointID":1},{"cluster":8,"type":"endpoint","deviceIeeeAddress":"0x00124b001938a7e5","endpointID":1},{"cluster":768,"type":"endpoint","deviceIeeeAddress":"0x00124b001938a7e5","endpointID":1}]}},"appVersion":17,"stackVersion":87,"hwVersion":1,"dateCode":"20180410","swBuildId":"1.3.009","zclVersion":1,"interviewCompleted":true,"meta":{"reporting":1},"lastSeen":1584920594485}
sjorge commented 4 years ago

When I unscrew that one, the coordinator is out of range.

Koenkk commented 4 years ago

What if you pair the device in range of the coordinator?

sjorge commented 4 years ago

I can try that tomorrow, but I won't be able to sniff the traffic at that location. I can try on Friday: I have an E27 Müller Licht bulb coming in and the sockets in that room are all E27, which beats trying to mess around with the living room lamp.

sjorge commented 4 years ago

@Koenkk this might be of interest: I was looking something up in database.db and noticed this:

{"id":1,"type":"Coordinator","ieeeAddr":"0x00124b001938a7e5","nwkAddr":0,"manufId":0,"epList":[1,2,3,4,5,6,8,11,12,110],"endpoints":{"1":{"profId":260,"epId":1,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"2":{"profId":257,"epId":2,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"3":{"profId":261,"epId":3,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"4":{"profId":263,"epId":4,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"5":{"profId":264,"epId":5,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"6":{"profId":265,"epId":6,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"8":{"profId":260,"epId":8,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"11":{"profId":260,"epId":11,"devId":1024,"inClusterList":[],"outClusterList":[1280],"clusters":{},"binds":[]},"12":{"profId":49246,"epId":12,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"110":{"profId":260,"epId":110,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]}},"interviewCompleted":false,"meta":{}}
{"id":9,"type":"Coordinator","ieeeAddr":"0x00124b001938a7e5","nwkAddr":0,"manufId":0,"epList":[1,2,3,4,5,6,8,11,12,110],"endpoints":{"1":{"profId":260,"epId":1,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"2":{"profId":257,"epId":2,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"3":{"profId":261,"epId":3,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"4":{"profId":263,"epId":4,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"5":{"profId":264,"epId":5,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"6":{"profId":265,"epId":6,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"8":{"profId":260,"epId":8,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"11":{"profId":260,"epId":11,"devId":1024,"inClusterList":[],"outClusterList":[1280],"clusters":{},"binds":[]},"12":{"profId":49246,"epId":12,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"110":{"profId":260,"epId":110,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]}},"interviewCompleted":false,"meta":{},"lastSeen":null}

I have 2 entries for the coordinator?! one has interviewCompleted: false... which is odd as it doesn't get interviewed right?
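A quick way to check for this kind of duplicate is sketched below. It assumes database.db holds one JSON object per line, as in the two entries above; `find_duplicate_devices` is a hypothetical helper name, not something z2m ships.

```python
import json
from collections import defaultdict


def find_duplicate_devices(lines):
    """Group entries by ieeeAddr and return the addresses that occur more than once."""
    ids_by_addr = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        addr = entry.get("ieeeAddr")
        if addr is not None:
            ids_by_addr[addr].append(entry.get("id"))
    # Keep only addresses that map to more than one database id.
    return {addr: ids for addr, ids in ids_by_addr.items() if len(ids) > 1}


if __name__ == "__main__":
    sample = [
        '{"id":1,"type":"Coordinator","ieeeAddr":"0x00124b001938a7e5"}',
        '{"id":2,"type":"Router","ieeeAddr":"0x000d6ffffe8e8d4f"}',
        '{"id":9,"type":"Coordinator","ieeeAddr":"0x00124b001938a7e5"}',
    ]
    print(find_duplicate_devices(sample))
```

To run it against a real file, pass an open file handle (for example `open("database.db")`, path assumed) instead of the sample list.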

Koenkk commented 4 years ago

That's indeed interesting. Is your database.db old? (Could it come from an old herdsman bug?)

sjorge commented 4 years ago

Around 1.10 I had to rebuild because I couldn't join any devices, so I wiped database.db and changed my network key and PAN. So it's from that time. Also, I only have one C2xxx device... so I'm not sure why it would even recognize it as a different one.

sjorge commented 4 years ago

https://pkg.blackdot.be/extras/zigbee/

01_repeater_online_room

Should be the same as the previous one; I captured this with everything online... just to make sure the issue was back.

02_repeater_online_coordinator

This time I moved the bulb next to the coordinator 2 rooms away

03_repeater_offline_room

This time I unplugged the IKEA Trådfri Repeater, bulb was back in the room (where I also run wireshark)

Note: wireshark might not see all traffic, as it is not in the room with the coordinator, it's in the room where the bulb should go

04_repeater_offline_coordinator

Same as 03, but next to the coordinator

Note: wireshark might not see all traffic, as it is not in the room with the coordinator, it's in the room where the bulb should go

05_all_trådfri_offline_room

This time I took offline all IKEA Trådfri routers (repeater + bulbs), the bulb I tried to pair was back in the room it should end up in.

Note: wireshark might not see all traffic, as it is not in the room with the coordinator, it's in the room where the bulb should go

06_hue_trådfri_offline_room

Same as 05, but I also took offline the hue innr bulb

Note: wireshark might not see all traffic, as it is not in the room with the coordinator, it's in the room where the bulb should go

06_hue_trådfri_offline_coordinator

Yeah I messed up the file naming, but whatever.... same as above but the bulb was near the coordinator.

Note: wireshark might not see all traffic, as it is not in the room with the coordinator, it's in the room where the bulb should go

@Koenkk a whole set of captures; pairing never succeeded. The last attempt was near the coordinator, ~1cm away, with all routers offline. Wireshark didn't capture much for that one, as it was running on my desktop, which is in the room the bulb should end up in.

sjorge commented 4 years ago
zigbee2mqtt:debug 2020-03-25 14:34:22: Received MQTT message on 'zigbee2mqtt/bridge/config/permit_join' with data 'true'
zigbee2mqtt:info  2020-03-25 14:34:22: Zigbee: allowing new devices to join.
zigbee2mqtt:info  2020-03-25 14:34:22: MQTT publish: topic 'zigbee2mqtt/bridge/config', payload '{"version":"1.12.0-dev","commit":"d52d520","coordinator":{"type":"zStack12","meta":{"transportrev":2,"product":0,"majorrel":2,"minorrel":6,"maintrel":3,"revision":20190608}},"log_level":"debug","permit_join":true}'
zigbee2mqtt:info  2020-03-25 14:34:29: Device '0x00158d00038801e9' joined
zigbee2mqtt:info  2020-03-25 14:34:29: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"device_connected","message":{"friendly_name":"0x00158d00038801e9"}}'
zigbee2mqtt:info  2020-03-25 14:34:29: Starting interview of '0x00158d00038801e9'
zigbee2mqtt:info  2020-03-25 14:34:29: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"pairing","message":"interview_started","meta":{"friendly_name":"0x00158d00038801e9"}}'
zigbee2mqtt:debug 2020-03-25 14:34:29: Device '0x00158d00038801e9' announced itself
zigbee2mqtt:info  2020-03-25 14:34:29: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"device_announced","message":"announce","meta":{"friendly_name":"0x00158d00038801e9"}}'
zigbee2mqtt:debug 2020-03-25 14:34:40: Received Zigbee message from '0x00158d00038801e9', type 'readResponse', cluster 'genBasic', data '{"modelId":"ZBT-ExtendedColor","manufacturerName":"MLI"}' from endpoint 1 with groupID 0
zigbee2mqtt:debug 2020-03-25 14:34:40: Received Zigbee message from '0x00158d00038801e9', type 'readResponse', cluster 'genBasic', data '{"powerSource":1,"zclVersion":2}' from endpoint 1 with groupID 0
zigbee2mqtt:debug 2020-03-25 14:34:40: Received Zigbee message from '0x00158d00038801e9', type 'readResponse', cluster 'genBasic', data '{"appVersion":1,"stackVersion":1}' from endpoint 1 with groupID 0
zigbee2mqtt:debug 2020-03-25 14:34:40: Received Zigbee message from '0x00158d00038801e9', type 'readResponse', cluster 'genBasic', data '{"hwVersion":1,"dateCode":"20180404-42"}' from endpoint 1 with groupID 0
zigbee2mqtt:debug 2020-03-25 14:34:40: Received Zigbee message from '0x00158d00038801e9', type 'readResponse', cluster 'genBasic', data '{"swBuildId":"2.0"}' from endpoint 1 with groupID 0
zigbee2mqtt:info  2020-03-25 14:34:40: Successfully interviewed '0x00158d00038801e9', device has successfully been paired
zigbee2mqtt:info  2020-03-25 14:34:40: Device '0x00158d00038801e9' is supported, identified as: M_ller Licht Tint LED bulb GU10/E14/E27 350/470/806 lumen, dimmable, color, opal white (404000/404005/404012)
zigbee2mqtt:info  2020-03-25 14:34:40: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"pairing","message":"interview_successful","meta":{"friendly_name":"0x00158d00038801e9","model":"404000/404005/404012","vendor":"M_ller Licht","description":"Tint LED bulb GU10/E14/E27 350/470/806 lumen, dimmable, color, opal white","supported":true}}'

After a ZNP soft reset using zigbee2mqtt/bridge/config/reset, pairing works fine from inside the room. See capture 08_after_reset_pair_ok in case it is needed.
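To compare a working join like the one above against a failing one, it can help to pull just the pairing milestones out of the log. This is a rough sketch that assumes the line format shown in the log excerpt; the regular expressions and `pairing_events` are illustrative, not part of z2m.

```python
import re

# Matches e.g. "zigbee2mqtt:info  2020-03-25 14:34:29: <message>"
LOG_LINE = re.compile(
    r"^zigbee2mqtt:(?P<level>\w+)\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?P<message>.*)$"
)
JOINED = re.compile(r"^Device '(?P<addr>0x[0-9a-f]{16})' joined$")
INTERVIEWED = re.compile(r"^Successfully interviewed '(?P<addr>0x[0-9a-f]{16})'")


def pairing_events(lines):
    """Return (timestamp, kind, ieee_address) tuples for join and interview events."""
    events = []
    for line in lines:
        parsed = LOG_LINE.match(line.strip())
        if not parsed:
            continue  # not a z2m log line (e.g. a stack trace)
        message = parsed.group("message")
        for kind, pattern in (("joined", JOINED), ("interviewed", INTERVIEWED)):
            hit = pattern.match(message)
            if hit:
                events.append((parsed.group("timestamp"), kind, hit.group("addr")))
    return events
```

A device that shows up as "joined" but never as "interviewed" would be worth a closer look in the sniff.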

Koenkk commented 4 years ago

Thanks for the sniff, I don't know why the transport key is not occurring, I've asked TI for support (https://e2e.ti.com/support/wireless-connectivity/zigbee-and-thread/f/158/t/892149)

sjorge commented 4 years ago
digraph G {
node[shape=record];
  "0x00124b001938a7e5" [style="bold, filled", fillcolor="#e04e5d", fontcolor="#ffffff", label="{Coordinator|0x00124b001938a7e5 (0)|2020-03-27T14:51:51+00:00}"];
  "0x00124b001938a7e5" -> "0x000d6ffffe8e8d4f" [penwidth=0.5, weight=0, color="#994444", label="110"]
  "0x00124b001938a7e5" -> "0x000d6ffffe197fe9" [penwidth=0.5, weight=0, color="#994444", label="14"]
  "0x00124b001938a7e5" -> "0x14b457fffe2bd760" [penwidth=0.5, weight=0, color="#994444", label="32"]
  "0x000d6ffffe8e8d4f" [style="rounded, filled", fillcolor="#4ea3e0", fontcolor="#ffffff", label="{masterbedroom/repeater|0x000d6ffffe8e8d4f (47816)|IKEA TRADFRI signal repeater (E1746)|2020-03-27T14:51:43+00:00}"];
  "0x000d6ffffe8e8d4f" -> "0x00124b001938a7e5" [penwidth=0.5, weight=0, color="#994444", label="58"]
  "0x000d6ffffe8e8d4f" -> "0x000d6ffffe197fe9" [penwidth=0.5, weight=0, color="#994444", label="58"]
  "0x000d6ffffe8e8d4f" -> "0x14b457fffe2bd760" [penwidth=0.5, weight=0, color="#994444", label="138"]
  "0x000d6ffffe197fe9" [style="rounded, filled", fillcolor="#4ea3e0", fontcolor="#ffffff", label="{bedroom/night_light/bulb|0x000d6ffffe197fe9 (1554)|IKEA TRADFRI LED bulb E14/E26/E27 600 lumen, dimmable, color, opal white (LED1624G9)|2020-03-27T14:51:11+00:00}"];
  "0x000d6ffffe197fe9" -> "0x00124b001938a7e5" [penwidth=0.5, weight=0, color="#994444", label="1"]
  "0x000d6ffffe197fe9" -> "0x000d6ffffe8e8d4f" [penwidth=0.5, weight=0, color="#994444", label="65"]
  "0x000d6ffffe197fe9" -> "0x14b457fffe2bd760" [penwidth=0.5, weight=0, color="#994444", label="129"]
  "0x14b457fffecc1315" [style="rounded, dashed, filled", fillcolor="#fff8ce", fontcolor="#000000", label="{bedroom/night_light/remote|0x14b457fffecc1315 (44663)|IKEA TRADFRI ON/OFF switch (E1743)|2020-03-26T23:53:01+00:00}"];
  "0x14b457fffecc1315" -> "0x14b457fffe2bd760" [penwidth=1, weight=0, color="#994444", label="173"]
  "0x14b457fffeca351b" [style="rounded, dashed, filled", fillcolor="#fff8ce", fontcolor="#000000", label="{bedroom/desk_lamp/remote|0x14b457fffeca351b (24931)|IKEA TRADFRI ON/OFF switch (E1743)|2020-03-27T06:24:54+00:00}"];
  "0x14b457fffeca351b" -> "0x14b457fffe2bd760" [penwidth=1, weight=0, color="#994444", label="171"]
  "0x00158d0004148f89" [style="rounded, dashed, filled", fillcolor="#fff8ce", fontcolor="#000000", label="{bedroom/motion|0x00158d0004148f89 (4263)|Xiaomi Aqara human body movement and illuminance sensor (RTCGQ11LM)|2020-03-27T14:49:47+00:00}"];
  "0x00158d0004148f89" -> "0x00124b001938a7e5" [penwidth=1, weight=0, color="#994444", label="1"]
  "0x00158d00033ddfaa" [style="rounded, dashed, filled", fillcolor="#fff8ce", fontcolor="#000000", label="{bedroom/sensor|0x00158d00033ddfaa (28407)|Xiaomi Aqara temperature, humidity and pressure sensor (WSDCGQ11LM)|2020-03-27T14:23:58+00:00}"];
  "0x00158d00033ddfaa" -> "0x000d6ffffe8e8d4f" [penwidth=1, weight=0, color="#994444", label="211"]
  "0x00158d0003f115c5" [style="rounded, dashed, filled", fillcolor="#fff8ce", fontcolor="#000000", label="{serverroom/sensor|0x00158d0003f115c5 (23324)|Xiaomi Aqara temperature, humidity and pressure sensor (WSDCGQ11LM)|2020-03-27T14:21:24+00:00}"];
  "0x00158d0003f115c5" -> "0x00124b001938a7e5" [penwidth=1, weight=0, color="#994444", label="157"]
  "0x00158d0001ffaffc" [style="rounded, dashed, filled", fillcolor="#fff8ce", fontcolor="#000000", label="{bedroom/radiator|0x00158d0001ffaffc (49490)|Eurotronic Spirit Zigbee wireless heater thermostat (SPZB0001)|2020-03-27T14:46:36+00:00}"];
  "0x00158d0001ffaffc" -> "0x14b457fffe2bd760" [penwidth=1, weight=0, color="#994444", label="188"]
  "0x04cf8cdf3c771820" [style="rounded, dashed, filled", fillcolor="#fff8ce", fontcolor="#000000", label="{bedroom/light_sensor|0x04cf8cdf3c771820 (24216)|Xiaomi MiJia light intensity sensor (GZCGQ01LM)|2020-03-27T14:51:38+00:00}"];
  "0x04cf8cdf3c771820" -> "0x14b457fffe2bd760" [penwidth=1, weight=0, color="#994444", label="162"]
  "0x14b457fffe2bd760" [style="rounded, filled", fillcolor="#4ea3e0", fontcolor="#ffffff", label="{bedroom/desk_lamp/bulb|0x14b457fffe2bd760 (18409)|Innr E14 bulb RGBW (RB 250 C)|2020-03-27T14:51:17+00:00}"];
  "0x14b457fffe2bd760" -> "0x00124b001938a7e5" [penwidth=0.5, weight=0, color="#994444", label="1"]
  "0x14b457fffe2bd760" -> "0x000d6ffffe8e8d4f" [penwidth=0.5, weight=0, color="#994444", label="114"]
  "0x14b457fffe2bd760" -> "0x000d6ffffe197fe9" [penwidth=0.5, weight=0, color="#994444", label="98"]
}

I assume you're going to ask for this next :)
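The edges in that graph can also be read programmatically. The sketch below assumes each edge label is the link quality reported for that link (my reading of the network map, not confirmed here) and that edges look exactly like the lines above; `parse_edges` and `weakest_links` are hypothetical names, and the threshold of 20 is arbitrary.

```python
import re

# Matches: "0xSRC" -> "0xDST" [..., label="NNN"]
EDGE = re.compile(
    r'"(?P<src>0x[0-9a-f]+)"\s*->\s*"(?P<dst>0x[0-9a-f]+)"\s*'
    r'\[.*label="(?P<label>\d+)"\]'
)


def parse_edges(dot_text):
    """Return (source, destination, label) for every edge line in the DOT text."""
    edges = []
    for line in dot_text.splitlines():
        match = EDGE.search(line)
        if match:
            edges.append((match.group("src"), match.group("dst"), int(match.group("label"))))
    return edges


def weakest_links(dot_text, threshold=20):
    """Return the edges whose label is below the (arbitrary) threshold."""
    return [edge for edge in parse_edges(dot_text) if edge[2] < threshold]
```

Node definition lines have no `->`, so they are skipped.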

Koenkk commented 4 years ago

jup, thanks! replied in the thread.

sjorge commented 4 years ago

@Koenkk I was reading the things the TI person posted... didn't we change something about polling frequency a while ago for the Eurotronic TRVs? Or am I misremembering something?

Koenkk commented 4 years ago

No, we didn't change anything related to that (some retry mechanism was added in herdsman, but this for sure doesn't cause this issue, as herdsman is not even aware of this).

What device are you trying to join?

sjorge commented 4 years ago

Any device will do, I have had this problem with:

Koenkk commented 4 years ago
sjorge commented 4 years ago

The pcaps from earlier were with a Müller Licht Tint E27... which I believe is a Zigbee router?

Also 05_all_trådfri_offline_room would only have a hue bulb online as router at the time and a TRV as it is very annoying to remove.

Edit: I think there would have been 3 ZED and 1 ZR active in that pcap. I don't have a spare stick to make a 3 network mesh though, and I can't really keep all my stuff offline for the 3-6 days before the issue pops up.

Edit 2: If the issue pops up again, I think I can do a ZC-only setup with a ZR to pair... well, the other ZR would be unplugged and I can move all nearby ZEDs far away.

Koenkk commented 4 years ago

Which device did you try to pair in: https://pkg.blackdot.be/extras/zigbee/failed_join.pcapng ?

image

sjorge commented 4 years ago

That was the Innr E14 I wanted to use to replace the Hue E14 because genOnOff reporting was no longer supported on the Hue after the update.

All the numbered ones were with the Hue already replaced by the Innr, where I tried to pair a new Tint E27 (https://www.zigbee2mqtt.io/devices/404000_404005_404012.html) to look into what it supports reporting-wise, as mentioned in https://github.com/Koenkk/zigbee2mqtt/issues/3177#issuecomment-602861258

Koenkk commented 4 years ago

And just to be sure, this Innr is marked as a Router in database.db right?

sjorge commented 4 years ago
{
  "id": 16,
  "type": "Router",
  "ieeeAddr": "0x14b457fffe2bd760",
  "nwkAddr": 18409,
  "manufId": 4454,
  "manufName": "innr",
  "powerSource": "Mains (single phase)",
  "modelId": "RB 250 C",
  "epList": [
    1,
    242
  ],
  "endpoints": {
    "1": {
      "profId": 260,
      "epId": 1,
      "devId": 269,
      "inClusterList": [
        0,
        3,
        4,
        5,
        6,
        8,
        768,
        2821,
        4096
      ],
      "outClusterList": [
        25
      ],
      "clusters": {
        "genBasic": {
          "attributes": {
            "modelId": "RB 250 C",
            "manufacturerName": "innr",
            "powerSource": 1,
            "zclVersion": 3,
            "appVersion": 16,
            "stackVersion": 98,
            "hwVersion": 1,
            "dateCode": "20190326-87",
            "swBuildId": "2.1"
          }
        },
        "lightingColorCtrl": {
          "attributes": {
            "currentHue": 0,
            "currentSaturation": 254,
            "colorMode": 1,
            "currentX": 45914,
            "currentY": 19615,
            "colorTemperature": 153
          }
        },
        "genLevelCtrl": {
          "attributes": {
            "currentLevel": 254
          }
        },
        "genOnOff": {
          "attributes": {
            "onOff": 1
          }
        }
      },
      "binds": [
        {
          "cluster": 6,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b001938a7e5",
          "endpointID": 1
        },
        {
          "cluster": 8,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b001938a7e5",
          "endpointID": 1
        },
        {
          "cluster": 768,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b001938a7e5",
          "endpointID": 1
        }
      ]
    },
    "242": {
      "profId": 41440,
      "epId": 242,
      "devId": 97,
      "inClusterList": [

      ],
      "outClusterList": [
        33
      ],
      "clusters": {

      },
      "binds": [

      ]
    }
  },
  "appVersion": 16,
  "stackVersion": 98,
  "hwVersion": 1,
  "dateCode": "20190326-87",
  "swBuildId": "2.1",
  "zclVersion": 3,
  "interviewCompleted": true,
  "meta": {
    "reporting": 1
  },
  "lastSeen": 1585152024465
}

Yep

Koenkk commented 4 years ago

Thanks, could you also provide a sniff with all the traffic when pairing close to the coordinator fails? (when the association response comes from the coordinator)

sjorge commented 4 years ago

I'll have to get creative, but I can try. But since I reset the stick today to do the group thing for the other issue, currently pairing is working fine. So we have to wait a few days before it 'breaks' again.

FaBjE commented 4 years ago

I experience the same issue/symptoms. I'm following this issue with great interest.

What I noticed:

Version info: {"version":"1.12.0","commit":"840b9d9","coordinator":{"type":"zStack12","meta":{"transportrev":2,"product":0,"majorrel":2,"minorrel":6,"maintrel":3,"revision":20190619}}}

sjorge commented 4 years ago
* Forcing a reset of the stick using the mqtt topic either freezes everything or doesn't help

Yeah, it freezes z2m for me too, but after the reset... I can just restart z2m and it starts working again, including joining... so it does save me a physical trip to the server room.

groenmarsmannetje commented 4 years ago

It also feels like devices are dropped from the network more often than before. Maybe it is a problem for these devices to reconnect after a while. This happened to both wired and battery devices, but in my case most of them are battery devices. For instance, all my smoke detectors are dropped. They normally only wake up and connect once per day. I never had issues with these devices; they were always able to connect, even through other routers/routes, without any issue.

sjorge commented 4 years ago

It also feels like devices are dropped from the network more often than before. Maybe it is a problem for these devices to reconnect after a while. This happened to both wired and battery devices, but in my case most of them are battery devices. For instance, all my smoke detectors are dropped. They normally only wake up and connect once per day. I never had issues with these devices; they were always able to connect, even through other routers/routes, without any issue.

Oh yeah, definitely... my Aqara motion sensor drops nearly every 3 days.

sjorge commented 4 years ago

Thanks, could you also provide a sniff with all the traffic when pairing close to the coordinator fails? (when the association response comes from the coordinator)

OK, I finally managed to get my laptop close enough to sniff... it took me a few tries... But I accidentally bumped my USB stick, so it was working fine because it got unplugged :s

My only possible explanation for this behavior change would be that with the ssIasZone fixes we are now sending a default reply that somehow the sensor is not happy with, which we did not send in the past. But I saw nothing of that sort while sniffing traffic. That was for a different issue.

sjorge commented 4 years ago

I'll try to do a capture tomorrow, can't today but joining broke again! I also noticed that currently all trådfri bulbs/repeaters are no longer reporting either!

I tried unplugging and replugging a bulb, but it fails to set up reporting!

zigbee2mqtt:error 2020-04-08 19:23:36: Failed to setup reporting for '0x90fd9ffffee77fcf' - Error: Bind 0x90fd9ffffee77fcf/1 genOnOff from '0x00124b001938a7e5/1' failed (Error: AREQ - ZDO - bindRsp after 10000ms)
    at Endpoint.<anonymous> (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/dist/controller/model/endpoint.js:244:23)
    at Generator.throw (<anonymous>)
    at rejected (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/dist/controller/model/endpoint.js:6:65)

I wonder if it is related.

sjorge commented 4 years ago

@Koenkk Good news! I had the issue again:

extra captures probably not needed:

sjorge commented 4 years ago

This was on the dev branch from yesterday eveningish

Koenkk commented 4 years ago

Can you try with this firmware: CC2531ZNP-Prod_Source_Routing_20200410.zip

It contains the fixes mentioned in https://e2e.ti.com/support/wireless-connectivity/zigbee-and-thread/f/158/t/894198#pi320995=2

sjorge commented 4 years ago

Do the fixes only apply to the source routing firmware? I'm using the default one currently.

I don't mind switching though, but the last time I tried I had issues when re-pairing my devices (after flashing I had to re-pair for some reason, although I thought the docs said that was not needed).

Koenkk commented 4 years ago

Reflashing shouldn't be needed indeed. But here is the default firmware: CC2531ZNP-Prod_20200410_default.zip

sjorge commented 4 years ago

I'll flash the default one later today, should give the best results to see if those changes help or not.

sjorge commented 4 years ago

Done, I flashed the default one... will keep you posted.

{
  "type": "zStack12",
  "meta": {
    "transportrev": 2,
    "product": 0,
    "majorrel": 2,
    "minorrel": 6,
    "maintrel": 3,
    "revision": 20200410
  }
}

I did figure out why I had to reflash last time! The instructions at https://www.zigbee2mqtt.io/getting_started/flashing_the_cc2531.html are incorrect! I was using Windows, as that was easiest.

If you do as the instructions say and copy the settings from the screenshot, you uncheck "Retain IEEE address when reprogramming the chip", which then seems to require re-pairing all devices. This time I left it checked (the default) and it worked: all devices are still here and working, aside from the Trådfri repeater... I had to unplug it and then it was fine.

Also noticed a new Coordinator entry in my database.db after flashing the newer firmware!

{"id":1,"type":"Coordinator","ieeeAddr":"0x00124b001938a7e5","nwkAddr":0,"manufId":0,"epList":[1,2,3,4,5,6,8,11,12,110],"endpoints":{"1":{"profId":260,"epId":1,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"2":{"profId":257,"epId":2,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"3":{"profId":261,"epId":3,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"4":{"profId":263,"epId":4,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"5":{"profId":264,"epId":5,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"6":{"profId":265,"epId":6,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"8":{"profId":260,"epId":8,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"11":{"profId":260,"epId":11,"devId":1024,"inClusterList":[],"outClusterList":[1280],"clusters":{},"binds":[]},"12":{"profId":49246,"epId":12,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"110":{"profId":260,"epId":110,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]}},"interviewCompleted":false,"meta":{},"lastSeen":null}
{"id":9,"type":"Coordinator","ieeeAddr":"0x00124b001938a7e5","nwkAddr":0,"manufId":0,"epList":[1,2,3,4,5,6,8,11,12,13,47,110],"endpoints":{"1":{"profId":260,"epId":1,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"2":{"profId":257,"epId":2,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"3":{"profId":261,"epId":3,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"4":{"profId":263,"epId":4,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"5":{"profId":264,"epId":5,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"6":{"profId":265,"epId":6,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"8":{"profId":260,"epId":8,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"11":{"profId":260,"epId":11,"devId":1024,"inClusterList":[],"outClusterList":[1280],"clusters":{},"binds":[]},"12":{"profId":49246,"epId":12,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"13":{"profId":260,"epId":13,"devId":5,"inClusterList":[25],"outClusterList":[],"clusters":{},"binds":[]},"47":{"profId":260,"epId":47,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]},"110":{"profId":260,"epId":110,"devId":5,"inClusterList":[],"outClusterList":[],"clusters":{},"binds":[]}},"interviewCompleted":false,"meta":{},"lastSeen":null}

Two new endpoints showed up, 13 and 47?
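To make the difference between the two entries explicit, their `epList` arrays can be compared as sets. This is a small sketch using only the `epList` field from the entries above; `endpoint_diff` is a hypothetical helper name.

```python
import json


def endpoint_diff(old_entry, new_entry):
    """Return (added, removed) endpoint IDs between two database.db entries."""
    old_eps = set(old_entry.get("epList", []))
    new_eps = set(new_entry.get("epList", []))
    return sorted(new_eps - old_eps), sorted(old_eps - new_eps)


# Only the epList fields of the two coordinator entries are reproduced here.
old = json.loads('{"id":1,"epList":[1,2,3,4,5,6,8,11,12,110]}')
new = json.loads('{"id":9,"epList":[1,2,3,4,5,6,8,11,12,13,47,110]}')
added, removed = endpoint_diff(old, new)
print("added:", added, "removed:", removed)
```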

Koenkk commented 4 years ago

New endpoints are expected, but duplicate entry not. Could you reproduce the issue and provide the herdsman debug logging of this?

sjorge commented 4 years ago

Not sure how to reproduce, aside from flashing another firmware that has different endpoints, maybe?

sjorge commented 4 years ago

I could not reproduce:

I had manually cleaned up the dup entry already before.

FaBjE commented 4 years ago

You are right about the Retain IEEE, on a new stick it should be unchecked as it may not be initialized. On updating an existing stick it must be checked.

On a long stretch, maybe the duplicate entries of the coordinator address had something to do with it? I imagine something like, adding a new entry because of the new address, but later updating all the coordinator entries with the new address. But that is just a wild guess.

sjorge commented 4 years ago

Hmm yeah it might be from the last time I tried the source firmware and had retain IEEE unchecked. That was before the changes where database.db would update the coordinator's IEEE... so it could be that yeah.

sjorge commented 4 years ago

@Koenkk is it me... or does the new firmware respond slower?

OK, yeah, there is some real funkiness going on. Some commands take like 1-2 sec to come through; others are instant.

Edit: I will try to catch some debug level logging, it's really bad when using groups.

Koenkk commented 4 years ago

Not sure, I don't expect this from the diff with your previous firmware. Please check if reverting to the old firmware solves the issue.

sjorge commented 4 years ago

I'll stay on this one until next weekend and try joining a device again before reflashing; that should indicate whether the changes fix the cannot-join issue.

I was sniffing a bit yesterday evening and it seems to be slow when a lot of route discoveries are done, usually when an attReport overlaps with a message sent to a group. So the source routing firmware might also help.

Edit: old problem is back with trådfri repeater dropping from the network -_-