Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0

zigbee map - devices work but don't show as connected #1001

Closed h4nc closed 5 years ago

h4nc commented 5 years ago

My zigbee map (zigbee component) already worked perfectly. Devices always showed the connection.

I recently decided to make my network more secure and add a network key. As you know this means re-pairing all the devices. So I also updated the firmware of my coordinator and of my routers.

I'm using the edge addon right now, and 5 of my 14 devices don't show a connection in the map, although all of them work. It's not an issue of the map component, because the connection info isn't present in the generated graph code either.

This is a device that doesn't show its connection: "0xID1" [style="rounded, dashed", label="{Zigbee Button|EndDevice|Xiaomi Aqara wireless switch (WXKG11LM)|online}"];

This is one that shows the linkquality (50): "0xID2" [style="rounded, dashed", label="{Zigbee Fenster SZ|EndDevice|Xiaomi Aqara door & window contact sensor (MCCGQ11LM)|online}"]; "0xID2" -> "0xID_COORD" [label="50"]

So the second statement (the edge) is missing. Do I have to delete or reset something in the database or state.json to make this work again?
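A quick way to list the affected devices is to scan the graphviz output for nodes that never appear as the source of an edge. The following is only a minimal sketch: it assumes the output keeps the node and edge layout shown in the two examples above, and the function name `unlinked_devices` is made up for illustration (it is not part of zigbee2mqtt).

```python
import re

# Node statement: "0x..." [style=..., label="{name|type|model|status}"];
NODE_RE = re.compile(r'"(0x[^"]+)"\s*\[[^\]]*label="\{([^|]*)\|([^|]*)\|')
# Edge statement: "0xSRC" -> "0xDST" [label="<linkquality>"]
EDGE_RE = re.compile(r'"(0x[^"]+)"\s*->\s*"(0x[^"]+)"')


def unlinked_devices(dot_text):
    """Return (id, friendly_name) for each non-coordinator node with no outgoing edge."""
    nodes = {m.group(1): (m.group(2), m.group(3)) for m in NODE_RE.finditer(dot_text)}
    sources = {m.group(1) for m in EDGE_RE.finditer(dot_text)}
    return [
        (node_id, name)
        for node_id, (name, dev_type) in nodes.items()
        if dev_type != "Coordinator" and node_id not in sources
    ]


# The two example devices from this post (IDs are the placeholders used above).
sample = '''
"0xID1" [style="rounded, dashed", label="{Zigbee Button|EndDevice|Xiaomi Aqara wireless switch (WXKG11LM)|online}"];
"0xID2" [style="rounded, dashed", label="{Zigbee Fenster SZ|EndDevice|Xiaomi Aqara door & window contact sensor (MCCGQ11LM)|online}"];
"0xID2" -> "0xID_COORD" [label="50"]
'''

print(unlinked_devices(sample))
```

For the sample this reports only the Zigbee Button, since it is the one node without a `->` statement.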

The devices are all at the same place as before. My routers are connected to the coord.

Koenkk commented 5 years ago

The connection is probably not shown because during the network scan the device was sleeping. If you wakeup the device while scanning, is it shown?

h4nc commented 5 years ago

I will test this. The thing is, I have an automation that scans the network every 15 minutes (for zigbee map). Those devices have stayed disconnected since I opened this issue. I woke them up in the meantime.

matejzero commented 5 years ago

I'm seeing the same thing. Half of devices show up (routers and end devices), the other half (sensors - end devices) are hanging in the air, although they do work.

I tried waking up the device during network scan, but it still doesn't show as connected.

h4nc commented 5 years ago

I mentioned it before, but I want to point out that it worked for me before, so it must be one of those things.

Those are the things that changed.

matejzero commented 5 years ago

So it might not be the same thing as my case...

In my case, I still use the same firmware on the router and the configuration wasn't changed. There was also no re-pairing of devices, but I did add a few devices in that time.

The only thing that changed was the zigbee2mqtt upgrade from 1.0.1 (I think this was the last working version, but I'm not 100% sure) to 1.1.1.

h4nc commented 5 years ago

Forgot to add this to the list, I also updated the zigbee2mqtt addon. So maybe that’s the reason?

Did it also work for you before updating?

Edit: I missed the last line in your post. It also worked for you. I think we have the same issue.

matejzero commented 5 years ago

It did work before, but I'm not sure which version I had...

When I get back home, I'll check if I can load an older docker image with older zigbee2mqtt just to check if network map works there.

clockbrain commented 5 years ago

Network map asks the coordinator and each router to report their neighbor table, which already keeps the LQI of each adjacent device. @Koenkk I don't think waking up end devices matters: routers can report their neighbor table, including child end devices, without the end device being awake.

Is anyone having this missing links issue able to do a network capture? Here is a fragment of a Wireshark capture from when I request a network map. Look for 'Link Quality Response' packets with some number of entries (Table Count) in the table.

Note that my setup is standard - v1.1.1 with firmware 20180815, channel 11 and default extended PAN. No Xiaomi routers though.

Frame 1571: 176 bytes on wire (1408 bits), 176 bytes captured (1408 bits) on interface 0
Internet Protocol Version 4, Src: 192.168.1.3, Dst: 192.168.1.3
User Datagram Protocol, Src Port: 17754, Dst Port: 17754
ZigBee Encapsulation Protocol, Channel: 11, Length: 116
IEEE 802.15.4 Data, Dst: 0x680d, Src: 0xfdc4
ZigBee Network Layer Data, Dst: 0x0000, Src: 0xfdc4
ZigBee Application Support Layer Data, Dst Endpt: 0, Src Endpt: 0
ZigBee Device Profile, Link Quality Response, Status: Success
    Sequence Number: 241
    Status: Success (0)
    Table Size: 6
    Index: 0
    Table Count: 3
    Neighbor Table
        Table Entry
        Table Entry
        Table Entry
            Extended Pan: dd:dd:dd:dd:dd:dd:dd:dd (dd:dd:dd:dd:dd:dd:dd:dd)
            Extended Address: PhilipsL_01:03:56:4b:e9 (00:17:88:01:03:56:4b:e9)
            Addr: 0x8441
            .... ..01 = Type: Router (1)
            .... 01.. = Idle Rx: True (1)
            .011 .... = Relationship: None (3)
            .... ..10 = Permit Joining: Unknown (2)
            Depth: 15
            LQI: 254

matejzero commented 5 years ago

I can try when I get home...

KaiboshOz commented 5 years ago

I've had the issue of unlinked nodes in the network map for some time now. All items are listed, but the link between node and coordinator isn't there for maybe 30-40% of nodes. Mine started around the same time that the network map started showing friendly names and the border on the nodes became rounded (months ago); the links were displayed before that. I did a search of issues and saw a post from @Koenkk saying that the network map was a little buggy but wasn't a high priority. I figured that at least the nodes are shown, so I could live with it (my network is fairly small). No change for me since the 1.1.1 update (hassio): I'm still seeing the majority of them linked, but some of them show as isolated and with no link quality on the graph.

h4nc commented 5 years ago

To me it looks like the "raw" map includes the linkquality values of all devices. Could someone confirm that? The graphviz output doesn't (5 of 14 are not linked).

clockbrain commented 5 years ago

@h4nc how many links are you getting for the "raw" map? If most of your devices are routers you would expect to see close to n*(n-1) links. See https://en.wikipedia.org/wiki/Complete_graph bearing in mind that zigbee router links are bi-directional.
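As a rough illustration of that upper bound, here is a small sketch of the complete-graph arithmetic. It assumes every router hears every other router, which a real network will rarely achieve, and it ignores end devices; the function names are made up for this example.

```python
def complete_directed_links(n_routers):
    """Links in a full mesh when each direction is reported separately: n*(n-1)."""
    return n_routers * (n_routers - 1)


def complete_undirected_links(n_routers):
    """Links in a full mesh when both directions are merged into one: n*(n-1)/2."""
    return n_routers * (n_routers - 1) // 2


for n in (2, 4, 10):
    print(n, complete_directed_links(n), complete_undirected_links(n))
```

A raw map with far fewer links than this bound is not necessarily broken; it may simply mean that most nodes are end devices or that routers are out of range of each other.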

h4nc commented 5 years ago

@clockbrain I get 14 links and currently I have 14 devices.

clockbrain commented 5 years ago

@h4nc that suggests to me that the problem is with the lqi scan in zigbee-shepherd or the devices themselves rather than the graphviz code in zigbee2mqtt. Can you see anything in a debug log?

h4nc commented 5 years ago

I can't see anything suspicious in the debug log either.

clockbrain commented 5 years ago

Turn on full zigbee-shepherd debugging as per https://koenkk.github.io/zigbee2mqtt/how_tos/how_to_debug.html

You should see entries like this (slightly redacted to hide serialport messages):

  zigbee2mqtt:debug 2019-2-3 16:05:52 Received MQTT message on 'zigbee2mqtt/bridge/networkmap' with data 'raw'
  zigbee2mqtt:info 2019-2-3 16:05:52 Starting network scan...
2019-02-03T06:05:52.956Z zigbee-shepherd:request REQ --> ZDO:mgmtLqiReq
Sun, 03 Feb 2019 06:05:52 GMT cc-znp:SREQ --> ZDO:mgmtLqiReq, { dstaddr: 0, scanchannels: undefined, scanduration: undefined, startindex: 0 }
Sun, 03 Feb 2019 06:05:52 GMT cc-znp { sof: 254,
  len: 1,
  type: 'SRSP',
  subsys: 'ZDO',
  cmd: 'mgmtLqiReq',
  payload: { status: 0 },
  fcs: 85,
  csum: 85 }
Sun, 03 Feb 2019 06:05:52 GMT cc-znp:SRSP <-- ZDO:mgmtLqiReq, { status: 0 }
Sun, 03 Feb 2019 06:05:52 GMT cc-znp { sof: 254,
  len: 72,
  type: 'AREQ',
  subsys: 'ZDO',
  cmd: 'mgmtLqiRsp',
  payload:
   { srcaddr: 0,
     status: 0,
     neighbortableentries: 9,
     startindex: 0,
     neighborlqilistcount: 3,
     neighborlqilist: [ [Object], [Object], [Object] ] },
  fcs: 93,
  csum: 93 }
Sun, 03 Feb 2019 06:05:52 GMT cc-znp:AREQ <-- ZDO:mgmtLqiRsp, { srcaddr: 0, status: 0, neighbortableentries: 9, startindex: 0, neighborlqilistcount: 3, neighborlqilist: [ { extPandId: '0xdddddddddddddddd', extAddr: '0x00137a0000040a80', nwkAddr: 48202, deviceType: 1, rxOnWhenIdle: 1, relationship: 1, permitJoin: 2, depth: 1, lqi: 10 }, { extPandId: '0xdddddddddddddddd', extAddr: '0x00158d0002c3afb8', nwkAddr: 30413, deviceType: 2, rxOnWhenIdle: 0, relationship: 1, permitJoin: 2, depth: 1, lqi: 27 }, { extPandId: '0xdddddddddddddddd', extAddr: '0x00158d0002c3afdf', nwkAddr: 27767, deviceType: 2, rxOnWhenIdle: 0, relationship: 1, permitJoin: 2, depth: 1, lqi: 58 } ] }
2019-02-03T06:05:52.997Z zigbee-shepherd:msgHdlr IND <-- ZDO:mgmtLqiRsp

When zigbee2mqtt receives the MQTT raw request, it starts a network scan, which consists of multiple lqi requests (ZDO:mgmtLqiReq) and lqi responses (ZDO:mgmtLqiRsp). There should be one of these request/response pairs for each router. Within the response, check that neighborlqilistcount is non-zero and that neighborlqilist has some data.
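Checking those responses by eye gets tedious in a long log, so here is a minimal sketch that pulls the counters out of each `cc-znp:AREQ <-- ZDO:mgmtLqiRsp` line. It assumes the log lines keep the exact field order shown in the excerpt above; the function name `summarize_lqi_responses` and the `partial` flag are made up for illustration. The flag only marks responses where startindex plus neighborlqilistcount is smaller than neighbortableentries, which may mean the rest of the table would have to be requested with a higher startindex.

```python
import re

# Matches the one-line AREQ summary printed by cc-znp for each LQI response.
RSP_RE = re.compile(
    r"AREQ <-- ZDO:mgmtLqiRsp, \{ srcaddr: (\d+), status: (\d+), "
    r"neighbortableentries: (\d+), startindex: (\d+), neighborlqilistcount: (\d+)"
)


def summarize_lqi_responses(log_text):
    """Return one dict per mgmtLqiRsp line found in the debug log text."""
    rows = []
    for match in RSP_RE.finditer(log_text):
        src, status, total, start, count = map(int, match.groups())
        rows.append({
            "srcaddr": src,
            "status": status,
            "table_entries": total,
            "start_index": start,
            "returned": count,
            # True when this response covers only part of the neighbor table.
            "partial": start + count < total,
        })
    return rows


# Shortened copy of the response line above; the neighbor list itself is abbreviated here.
sample = (
    "Sun, 03 Feb 2019 06:05:52 GMT cc-znp:AREQ <-- ZDO:mgmtLqiRsp, "
    "{ srcaddr: 0, status: 0, neighbortableentries: 9, startindex: 0, "
    "neighborlqilistcount: 3, neighborlqilist: [ ... ] }"
)

for row in summarize_lqi_responses(sample):
    print(row)
```

To run it on a real capture, read the file produced by the debug run and pass its contents to `summarize_lqi_responses`; an empty result means no LQI responses were logged at all.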

h4nc commented 5 years ago

I turned on debug mode as shown in the link, but I still didn't see those long cc-znp messages.

clockbrain commented 5 years ago

DEBUG=* npm start 2>&1 | tee debuglog.txt

h4nc commented 5 years ago

When I search for neighborlqilistcount in my debug log (after publishing raw to the right topic), I don't get a single hit.

clockbrain commented 5 years ago

Ok, can you try increasing the timeout at this line https://github.com/Koenkk/zigbee-shepherd/blob/bff587393e59c351ba15927b2233dbe247a510b7/lib/shepherd.js#L469 to something like 5000 and see what that does.

Just edit the file in node_modules/zigbee-shepherd/lib/shepherd.js , make the change to timeout, and run.

I suspect the current 1 second timeout is too short, meaning it only has time to find direct coordinator links and not those from routers. This could be especially significant if you have a busy network or are running the Node-RED admin panel.

h4nc commented 5 years ago

Is this possible within the hassio addon?

clockbrain commented 5 years ago

Sorry, I don't know how to do that.

h4nc commented 5 years ago

@matejzero maybe you have more experience and could try the idea of @clockbrain

@koenkk what do you think about that? (Changing from 1 sec to 5 sec.)

matejzero commented 5 years ago

I can try and have a look. It's a bit more difficult with hassio and a docker container, since I can't restart the service or change the code. I will probably need to create a new hassio docker image and load that.

Not sure when I'll have time, probably not today, but I might have some time during the weekend.

Koenkk commented 5 years ago

@h4nc if it improves things, why not?

h4nc commented 5 years ago

@Koenkk I don't know how to change this in hassio. Could you change this value in the dev branch? I could then reinstall edge to check whether it works. If it doesn't, you can change it back. That's probably not the way it usually goes, but maybe that's OK for you.

Koenkk commented 5 years ago

@h4nc get into the docker container: https://koenkk.github.io/zigbee2mqtt/how_tos/how_to_support_new_devices_on_hassio.html

Then edit the file with vi node_modules/zigbee-shepherd/lib/shepherd.js.

h4nc commented 5 years ago

@Koenkk unfortunately I fail at the first step. I'm not able to ssh into my hassio at port 22222. I don't know how to create that keyfile, and I'm also not sure where to put it. Would I have to remove the SD card and put the file there? I would really like to try that out, but I don't want to spend too much time on it and generate more issues (that I'd have to solve to get where I want to be).

Did someone else already try the suggestion from above?

Also I noticed that now only 3 devices don't show a connection (I think it was 4 or 5 before).

Koenkk commented 5 years ago

@h4nc increased timeout to 5000, should be available on hassio in an hour.

h4nc commented 5 years ago

Thanks and sry to bother you.

I’m curious, why does it take one hour to get the changes to the addon?

matejzero commented 5 years ago

Is the 5000ms timeout in edge version?

Koenkk commented 5 years ago

@h4nc the docker image needs to be regenerated

@matejzero yes

matejzero commented 5 years ago

I tried with the new container, but I still don't get all the links...

I'm looking at the network map and I see my routers are reporting as offline and that might be the reason for my sensors not having connections.

digraph G {
node[shape=record];
  "0x00124b0018ed3e4a" [style="bold", label="{0x00124b0018ed3e4a|Coordinator|No model information available|online}"];
  "0x00158d0001e83cdf" [style="rounded, dashed", label="{xiaomi_switch_1|EndDevice|Xiaomi Aqara wireless switch (WXKG11LM)|online}"];
  "0x00158d0001e83cdf" -> "0x00124b0018ed3e4a" [label="21"]
  "0x00158d0002b5d4cb" [style="rounded, dashed", label="{stairway_p1_motion|EndDevice|Xiaomi Aqara human body movement and illuminance sensor (RTCGQ11LM)|online}"];
  "0x00158d0002b6cd40" [style="rounded, dashed", label="{bathroom_multi|EndDevice|Xiaomi Aqara temperature, humidity and pressure sensor (WSDCGQ11LM)|online}"];
  "0x00158d0002b6cd40" -> "0x00124b0018ed3e4a" [label="31"]
  "0x00158d0002b87db7" [style="rounded, dashed", label="{piaroom_multi|EndDevice|Xiaomi Aqara temperature, humidity and pressure sensor (WSDCGQ11LM)|online}"];
  "0x00158d0002b87db7" -> "0x00124b0018ed3e4a" [label="84"]
  "0x000b57fffed9be2c" [style="rounded", label="{ikea_light|Router|IKEA TRADFRI LED bulb E27 1000 lumen, dimmable, opal white (LED1623G12)|offline}"];
  "0x000b57fffed9be2c" -> "0x00124b0018ed3e4a" [style="dashed", label="0"]
  "0x00158d0002c98978" [style="rounded, dashed", label="{bedroom_contact|EndDevice|Xiaomi Aqara door & window contact sensor (MCCGQ11LM)|online}"];
  "0x00158d0002c98978" -> "0x00124b0018ed3e4a" [label="68"]
  "0x00158d0002ca1ba1" [style="rounded, dashed", label="{bathroom_contact|EndDevice|Xiaomi Aqara door & window contact sensor (MCCGQ11LM)|online}"];
  "0x000d6ffffe7bd253" [style="rounded", label="{living_room_extender|Router|IKEA TRADFRI control outlet (E1603)|offline}"];
  "0x000d6ffffe7bd253" -> "0x00124b0018ed3e4a" [label="69"]
  "0x90fd9ffffe29f4dc" [style="rounded", label="{stairway_1m_extender|Router|IKEA TRADFRI control outlet (E1603)|offline}"];
  "0x90fd9ffffe29f4dc" -> "0x00124b0018ed3e4a" [label="46"]
  "0x00158d00023ad460" [style="rounded, dashed", label="{kitchen_water_leak|EndDevice|Xiaomi Aqara water leak sensor (SJCGQ11LM)|online}"];
  "0x00158d00023ad460" -> "0x00124b0018ed3e4a" [label="69"]
  "0x00158d00023a9182" [style="rounded, dashed", label="{bathroom_water_leak|EndDevice|Xiaomi Aqara water leak sensor (SJCGQ11LM)|online}"];
  "0x00158d0002b857ec" [style="rounded, dashed", label="{toilet_motion|EndDevice|Xiaomi Aqara human body movement and illuminance sensor (RTCGQ11LM)|online}"];
  "0x00158d0002b60c46" [style="rounded, dashed", label="{kitchen_motion|EndDevice|Xiaomi Aqara human body movement and illuminance sensor (RTCGQ11LM)|online}"];
  "0x00158d000233fba0" [style="rounded, dashed", label="{livingroom_multi|EndDevice|Xiaomi Aqara temperature, humidity and pressure sensor (WSDCGQ11LM)|online}"];
  "0x00158d000233fba0" -> "0x00124b0018ed3e4a" [label="110"]
}

As you can see, both IKEA TRADFRI power plugs are offline, although they are working (I can switch them on and off in HA). Is my problem still relevant here, or should I open a new ticket?
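To pull the routers that are reported as offline out of a longer map, a small script can read the status field of each node label. This is only a sketch under the assumption that every label has the four-field `{name|type|model|status}` layout seen in the output above; `offline_routers` is a hypothetical helper, not a zigbee2mqtt function.

```python
import re

# Node statement with a four-field label: {friendly name|device type|model|status}
LABEL_RE = re.compile(
    r'"(0x[0-9a-fA-F]+)"\s*\[[^\]]*label="\{([^|]*)\|([^|]*)\|([^|]*)\|([^}]*)\}"'
)


def offline_routers(dot_text):
    """Return the friendly names of all Router nodes whose status is 'offline'."""
    return [
        name
        for _node_id, name, dev_type, _model, status in LABEL_RE.findall(dot_text)
        if dev_type == "Router" and status == "offline"
    ]


# A few statements copied from the map above.
sample = '''
digraph G {
  "0x000d6ffffe7bd253" [style="rounded", label="{living_room_extender|Router|IKEA TRADFRI control outlet (E1603)|offline}"];
  "0x000d6ffffe7bd253" -> "0x00124b0018ed3e4a" [label="69"]
  "0x90fd9ffffe29f4dc" [style="rounded", label="{stairway_1m_extender|Router|IKEA TRADFRI control outlet (E1603)|offline}"];
  "0x00158d0001e83cdf" [style="rounded, dashed", label="{xiaomi_switch_1|EndDevice|Xiaomi Aqara wireless switch (WXKG11LM)|online}"];
}
'''

print(offline_routers(sample))
```

Edge statements are skipped because their attribute list carries a plain number as the label rather than a `{...}` record.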

h4nc commented 5 years ago

I also tried it in the meantime (and let it "sit" for a while, thinking maybe it needs some time).

The three that are not shown as connected are still not connected.

matejzero commented 5 years ago

@h4nc: do you only have end devices or also routers? Are routers reported as online in network map in your case?

h4nc commented 5 years ago

@matejzero I have one cc2531 as coordinator and two cc2530 routers. In total 14 devices (so currently 11 end devices).

The routers show as online.

matejzero commented 5 years ago

Ok, so I guess we don't have the same problems...

h4nc commented 5 years ago

Your routers show as offline?

Edit: the strange thing I noticed is that the routers show as online in my map, but their state in Home Assistant goes offline sometimes. To me it seems like the state in HA isn't reliable (because end devices that are only reachable via a router work even if the router is shown as offline).

matejzero commented 5 years ago

Yes, all my routers show as offline. I will try and add a Tradfri light bulb to see if that will stay connected.

h4nc commented 5 years ago

@Koenkk so the 5000ms did not help. Do you have any other suggestions? Waking them up while doing the network scan did not help either. Should I try to re-pair those devices, or does that make no sense?

Koenkk commented 5 years ago

@h4nc it is very hard to get this feature working 100%, as it depends on many things. One of them, which is impossible to solve, is that the routers must support this feature (by responding to the scanning requests).

Bottom line: if your Zigbee network is working properly, I wouldn't care too much about the network map.

h4nc commented 5 years ago

I see. What bothers me is that it already worked perfectly: all devices showed up as connected. After changing the network key and re-pairing, I'm facing this issue. Sure, it isn't that important, but it already worked. It messes a little with my OCD ;-)

Edit: @koenkk could the change of the pan_id be the reason for that issue? As I said the devices work, so they must have the right pan_id, right?

clockbrain commented 5 years ago

@h4nc shame that timeout change didn't work. I don't fully understand that part of the code but thought it worth a try.

The only way to narrow this down is to try eliminating all differences between your environment and one that works. It would be best if you had a second cc2531 and could set up a second network and switch one of your routers over to it. This could even be on a desktop/laptop. Otherwise you could set up a second SD card with a clean z2m only (not hassio+z2m), copy over your config, database etc. and swap it with your production hassio SD card for debugging. With the same stick this should work with your existing network and would make it easier to insert debug code into z2m/zigbee-shepherd to narrow it down. This approach can't run in parallel with your production hassio though, so it depends on how important uptime on your production network is to you.

Problem could be any of these or even an obscure combination of them:

h4nc commented 5 years ago

I solved this. It was pretty simple, just follow these steps if you have the same issue.

1) Look for the devices that don't show the connection.
2) Enable permit_join.
3) Bring every affected device to the coordinator and re-pair it (press the button for 4 seconds or press it 4 times, depending on the device you have).
4) A warning "Message without device" appears.
5) Wait (in the meantime, put your devices back in their places).
6) Update Zigbee Map.
7) It should work again.

rgruebel commented 5 years ago

I have the same problem after flashing the latest dev firmware

dh-harald commented 5 years ago

I have the same problem. My experience is as follows: I have 4 routers: 2 Xiaomi wall switches, an IKEA repeater, and a Gledopto LED. Physically, the two wall switches are quite far from the controller (that's why I bought the repeater). With some tricks (moving the repeater closer, moving the controller farther away) I think I managed to pair the wall switch to the repeater, because the link quality is about 70-75. The network map reflects the changes (sort of, not perfectly). Then I got a message in zigbee2mqtt, zigbee2mqtt:error 3/3/2019, 10:20:22 PM Failed to ping <any id in the network>, and it looks like the whole network with paired routers is falling apart and the controller creates a "star" topology. No routers are connected to each other :( I tried to disable the availability checking with availability_timeout: 0, but it's still pinging the routers. I'm using the latest dev docker image because of the IKEA repeater (with coordinator firmware version: '20190109').

dh-harald commented 5 years ago

@Koenkk are you sure that when the controller pings a device, it can do so through another router? It looks like my network is healthy (I can control the switches, and the link quality is quite good), but zigbee2mqtt keeps complaining with "Failed to ping" for the end devices/routers.

Koenkk commented 5 years ago

@dh-harald I'm not sure, it also depends on the router I guess. However zigbee2mqtt shouldn't ping end devices, where did you see this?

Note: even with availability_timeout: 0 Xiaomi and CC2530/CC2531 routers will be pinged (in order to keep them awake).

dh-harald commented 5 years ago

@Koenkk yes, you're right, it's pinging only routers (QBKG11LM & QBKG04LM). But it's still an issue that the controller creates a star topology (with itself in the middle of everything), which is not working for me. The controller somehow thinks the router is offline and tries to rebuild the whole network. Meanwhile the network appears to be working, because I can control all "offline" devices through MQTT/HA, and the devices aren't complaining about a disconnected network: I get blue lights all the time if I turn the lights on manually. Can you add a configuration option for a specific device that disables the ping (for example, the QBKG04LM isn't a real router anyway)?

Koenkk commented 5 years ago

@dh-harald does it work for the QBKG11LM? In that case we can prevent the QBKG04LM from getting pinged.