dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License

Unreachable gateway since 2.5.42 #896

Closed: ooii closed this issue 5 years ago

ooii commented 5 years ago

Hi,

Yesterday evening, on @manup's recommendation, I upgraded deCONZ to 2.5.42 and the firmware to 26280500 to fix an issue with long-running instances. Since then, neither the Phoscon app nor the homebridge-hue plugin can reach the gateway. The deCONZ web app works fine, I see events in dc_eventlog, and my lights and sensors work fine. I tried opening the gateway for a new discovery by Phoscon and homebridge-hue, and also updated the latter, with no success. Any idea what is going wrong? Thanks.

ebaauw commented 5 years ago

Is deCONZ still using the same port? What does ph discover report? What error does homebridge-hue report?

Opening the gateway allows new API clients to create a key - it has nothing to do with connecting to / finding the gateway.
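To make that distinction concrete: with the Hue-compatible deCONZ REST API, a client asks for a key with POST /api, and that request only succeeds while the gateway is unlocked; locating the gateway is a separate step. Below is a minimal Python sketch, assuming that documented endpoint. The helper names are hypothetical, the host is the one reported later in this thread, and port 80 matches the --http-port=80 flag used later on.

```python
import json
import urllib.request


def build_key_request(host: str, port: int, devicetype: str) -> urllib.request.Request:
    """Build (without sending) a POST /api request that asks the gateway for a new API key."""
    body = json.dumps({"devicetype": devicetype}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/api",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_key_response(text: str) -> str:
    """Extract the key from a Hue-style reply: a list of {"success": ...} or {"error": ...} items."""
    for item in json.loads(text):
        if "success" in item:
            return item["success"]["username"]
        if "error" in item:
            # For example, the gateway refusing because it was not unlocked.
            raise PermissionError(item["error"].get("description", "unknown error"))
    raise ValueError("response contained neither success nor error")
```

To actually send it, pass build_key_request("10.0.1.11", 80, "my-client") to urllib.request.urlopen and feed the decoded body to parse_key_response. Note that urlopen raises urllib.error.HTTPError for a non-2xx status, so a refusal may surface as an exception before the body is parsed.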

ooii commented 5 years ago

Is deCONZ still using the same port?

Yes

What does ph discover report?

ph discover: error: SyntaxError: domain: invalid key
at newSyntaxError (/home/myself/.npm-global/lib/node_modules/homebridge-hue-utils/node_modules/homebridge-lib/lib/OptionParser.js:24:48)
at OptionParser.parse (/home/myself/.npm-global/lib/node_modules/homebridge-hue-utils/node_modules/homebridge-lib/lib/OptionParser.js:176:17)
at new UpnpMonitor (/home/myself/.npm-global/lib/node_modules/homebridge-hue-utils/node_modules/homebridge-lib/lib/UpnpMonitor.js:51:18)
at Promise (/home/myself/.npm-global/lib/node_modules/homebridge-hue-utils/lib/HueDiscovery.js:58:27)
at new Promise (<anonymous>)
at HueDiscovery._upnp (/home/myself/.npm-global/lib/node_modules/homebridge-hue-utils/lib/HueDiscovery.js:57:12)
at HueDiscovery.discover (/home/myself/.npm-global/lib/node_modules/homebridge-hue-utils/lib/HueDiscovery.js:34:12)
at Main.discover (/home/myself/.npm-global/lib/node_modules/homebridge-hue-utils/cli/ph.js:471:40)
at Main.main (/home/myself/.npm-global/lib/node_modules/homebridge-hue-utils/cli/ph.js:357:33)
at <anonymous>

But ph get /config gives:

{
  "apiversion": "1.16.0",
  "bridgeid": "0000000000000000",
  "datastoreversion": "60",
  "factorynew": false,
  "mac": "38:ea:a7:ab:ea:e1",
  "modelid": "deCONZ",
  "name": "conBee",
  "replacesbridgeid": null,
  "starterkitid": "",
  "swversion": "2.5.42"
}

What error does homebridge-hue report?

conBee: RaspBee/ConBee not yet initialised - wait 1 minute

Note that, contrary to what I said before, my sensors are no longer working, but my switches and lights still work fine.

ebaauw commented 5 years ago

"bridgeid": "0000000000000000",

Looks like deCONZ hasn't connected properly to the ConBee. It should report the ZigBee mac address of the ConBee here. Did you try to remove and re-insert it and then restart deCONZ?
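This symptom can be spotted directly in the outputs quoted in this thread (ph get /config and ph discover) by checking for an all-zero bridge ID. A minimal Python sketch; the function names are hypothetical, and it assumes, as stated here, that a correctly connected gateway reports a non-zero ID derived from the ConBee's ZigBee MAC address.

```python
import json


def is_zero_id(bridge_id: str) -> bool:
    """True for an empty ID or one made only of zeros, like "0000000000000000"."""
    return bridge_id.strip("0") == ""


def gateway_connected(config_json: str) -> bool:
    """Check the "bridgeid" field of a /config response."""
    return not is_zero_id(json.loads(config_json).get("bridgeid", ""))


def hosts_with_zero_id(discover_json: str) -> list:
    """From ph discover output ({host: bridgeid}), list the hosts reporting a zero ID."""
    return [host for host, bid in json.loads(discover_json).items() if is_zero_id(bid)]
```

For the /config output above, gateway_connected returns False, and for a discover result such as {"10.0.1.11": "0000000000000000"}, hosts_with_zero_id returns ["10.0.1.11"].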

ph discover: error: SyntaxError: domain: invalid key

What version of ph are you running? I think I fixed this in homebridge-hue v1.0.6.

ooii commented 5 years ago

Looks like deCONZ hasn't connected properly to the ConBee. It should report the ZigBee mac address of the ConBee here. Did you try to remove and re-insert it and then restart deCONZ?

Did that. When I restart deCONZ, I see a new gateway with the right name. It asks for a password, and neither of the ones I tried (the default one and the one I used to enter) is accepted. If I reload the login page, this new gateway disappears and only the previously discovered ones (the Hue bridge and the ConBee that used to work) appear, but disabled. ph discover gives {}.

What version of ph are you running? I think I fixed this in homebridge-hue v1.0.6.

I was running v1.0.5, updated to v1.0.7 now.

Is it safe to roll back to firmware older than the installed one (26280500)? I could also downgrade the deCONZ app to .39, which worked but slowed down after a few days of running.

ebaauw commented 5 years ago

Yeah, you can downgrade the firmware using GCFFlasher_internal. You can simply download an older version of deCONZ and install that over a newer version, but that won't revert the firmware.

ooii commented 5 years ago

Downgraded both firmware and deconz and still same result: "bridgeid": "0000000000000000".

ooii commented 5 years ago

When I start deconz-gui, I see the message "connected to device". I can see all my devices, but without any links between them. ph discover now returns:

{
  "10.0.1.11": "0000000000000000"
}

Just in case, I attach the screenshot of the coordinator:

[Screenshot of the coordinator]

Edit: dmesg just showed: deCONZ[5819]: segfault at 480d00000000 ip 00007fb2a4afc207 sp 00007ffeb18a0960 error 4 in libc-2.27.so[7fb2a4a65000+1e7000].

manup commented 5 years ago

segfault at 480d00000000

That doesn't look good, is this a vanilla installation or was the plugin compiled manually?

Can you please start deCONZ manually and provide the output of the log when it crashes:

$ deCONZ --dbg-info=2 --dbg-aps=1 --dbg-error=1 --http-port=80

ooii commented 5 years ago

The segfault happened only once.

is this a vanilla installation or was the plugin compiled manually?

I'm running it on Ubuntu 16.04. I download the .deb and install it with dpkg.

Can you please start deCONZ manually and provide the output of the log when it crashes: $ deCONZ --dbg-info=2 --dbg-aps=1 --dbg-error=1 --http-port=80

Attached is the output.txt file. Note this message: "11:05:47:402 GW firmware update not supported on x86 linux headless", followed a few moments later by this one: "11:05:47:430 auto connect com /dev/ttyUSB1".

Here is the list of FTDI devices:

 device | vendor | product | serial   | description
--------|--------|---------|----------|-------------------
   0    | 0x0403 | 0x6001  | A1YIOU4A | RFXtrx433
   1    | 0x0403 | 0x6015  | DO00KPL5 | FT230X Basic UART
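With several FTDI adapters attached, as here, it helps to map the USB vendor/product IDs from this table to a device path. A minimal Python sketch: find_port and the example paths are hypothetical. It only assumes each port object exposes device, vid and pid attributes, as the entries returned by pyserial's serial.tools.list_ports.comports() do. The default product ID is the FT230X row which, since the other row is the RFXtrx433, is presumably the ConBee.

```python
from types import SimpleNamespace
from typing import Iterable, Optional

FTDI_VID = 0x0403
FT230X_PID = 0x6015  # "FT230X Basic UART" row in the table above


def find_port(ports: Iterable, vid: int = FTDI_VID, pid: int = FT230X_PID) -> Optional[str]:
    """Return the device path of the first port with matching USB vendor/product IDs."""
    for port in ports:
        if getattr(port, "vid", None) == vid and getattr(port, "pid", None) == pid:
            return port.device
    return None


# Hypothetical example data mirroring the table; real paths depend on enumeration order.
example_ports = [
    SimpleNamespace(device="/dev/ttyUSB0", vid=0x0403, pid=0x6001),  # RFXtrx433
    SimpleNamespace(device="/dev/ttyUSB1", vid=0x0403, pid=0x6015),  # FT230X
]
```

On a real system, passing serial.tools.list_ports.comports() instead of example_ports shows which /dev/ttyUSB* node belongs to which adapter.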

output.txt

ooii commented 5 years ago

Hi @manup, any idea how to fix this issue?

manup commented 5 years ago

Not yet, I'll have a closer look at the related code later today.

ooii commented 5 years ago

Hi,

I investigated a little and found some issues preventing deCONZ from using the gateway. Thanks to https://github.com/dresden-elektronik/deconz-rest-plugin/issues/227, I figured out that /run/lock/LCK..ttyUSB1 was owned by root. I deleted this file and started deCONZ with the GUI, which showed that I had to upgrade the gateway's firmware. I did it manually because I had an issue doing it from the GUI (I can explain that later if you're interested). Now deCONZ seems to connect to the gateway and start normally. However, the gateway still does not connect to other nodes, as you can see in the attached image. The bridge ID is still 0x000000, and ph discover still shows an empty result.
[Screenshot]
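Before deleting a lock file such as /run/lock/LCK..ttyUSB1, one can check whether the process it names is still alive. A minimal Python sketch, assuming the common UUCP-style convention that the lock file holds the owner's PID as ASCII text; lock_is_stale is a hypothetical helper, and removing a root-owned file still requires root.

```python
import os
from pathlib import Path


def lock_is_stale(lock_path: str) -> bool:
    """True if the lock file exists but the PID recorded in it is no longer running."""
    path = Path(lock_path)
    if not path.exists():
        return False  # no lock file at all
    try:
        pid = int(path.read_text().strip())
    except ValueError:
        return False  # cannot parse a PID (e.g. binary format): do not guess
    if pid <= 0:
        return True  # not a valid process ID
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return True  # no such process: the lock was left behind
    except PermissionError:
        return False  # process exists but belongs to another user (e.g. root)
    return False
```

Only when lock_is_stale returns True is it reasonably safe to remove the file; otherwise another program may genuinely be holding the serial port.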

I attach the output of /usr/bin/deCONZ --dbg-info=2 --dbg-aps=1 --dbg-error=1 --http-port=80.

output.txt

thomas70 commented 5 years ago

I don't know if it is related, but after upgrading to 2.5.42, I got two gateways: the old one is greyed out and the new one is missing all my devices. EDIT: Not all devices, but all lights.

[Screenshot, 2018-10-24 19:21:21]

thomas70 commented 5 years ago

After investigation i found this:

[Screenshot, 2018-10-24 19:53:13]

I flashed the old firmware and everything is back to normal.

ooii commented 5 years ago

Thanks @thomas70 for the feedback. Which old firmware are you running? And which app version, please? I tried rolling back to 2.05.39 and 0x26230500, which worked before my issues started, but I still discover a bridge with a null ID (0x000000). I also tried 2.05.42 with different firmware versions (0x261f0500, 0x26270500, and of course 0x26280500), with the same result. One question: could my gateway be damaged?

Filialen commented 5 years ago

If you have a backup of your setup in Phoscon, you can try restoring it. That worked for me when my lights disappeared after I tried the latest firmware. It still doesn't route or mesh as it should, though...

thomas70 commented 5 years ago

Thanks @thomas70 for the feedback. Which old firmware are you running? And which app version, please? I tried rolling back to 2.05.39 and 0x26230500, which worked before my issues started, but I still discover a bridge with a null ID (0x000000). I also tried 2.05.42 with different firmware versions (0x261f0500, 0x26270500, and of course 0x26280500), with the same result. One question: could my gateway be damaged?

I had problems with deCONZ_Rpi_0x26280500.bin.GCF. I flashed back to deCONZ_Rpi_0x26270500.bin.GCF and the problems were gone.

ooii commented 5 years ago

If you have a backup of your setup in Phoscon you can try to restore it.

Unfortunately, I don't have any backup. In your case, you only lost your lights. In mine, I lost the use of the whole gateway.

I had problems with deCONZ_Rpi_0x26280500.bin.GCF. I flashed back to deCONZ_Rpi_0x26270500.bin.GCF and the problems were gone.

Tried that too...

manup commented 5 years ago

@ooii, can you please check the zll.db file with sqlitebrowser? If the lights are still shown but communication doesn't work, it looks like the network configuration has changed.

All changed configurations are listed in the zbconf table.
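As an alternative to sqlitebrowser, Python's built-in sqlite3 module can print the zbconf rows. A minimal sketch: dump_table is a hypothetical helper, it uses SELECT * so it assumes nothing about zbconf's column names, and the location of zll.db on your system is a placeholder you must fill in. Opening the file read-only, e.g. sqlite3.connect("file:/path/to/zll.db?mode=ro", uri=True), avoids modifying it by accident.

```python
import sqlite3


def dump_table(conn: sqlite3.Connection, table: str) -> list:
    """Return all rows of `table` (e.g. "zbconf") as tuples, after checking the table exists."""
    names = {row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")}
    if table not in names:
        raise KeyError(f"no table named {table!r}; found {sorted(names)}")
    # Table names cannot be bound as query parameters; the check above guards the f-string.
    return conn.execute(f'SELECT * FROM "{table}"').fetchall()
```

Printing each row returned by dump_table(conn, "zbconf") gives the same entries sqlitebrowser would show.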

ooii commented 5 years ago

Hi @manup,

can you please check the zll.db file with sqlitebrowser if the lights are still shown

I'm not sure I understand what you want. I see a devices table with 86 entries and a nodetable which seems to contain all the lights. I can't tell if the lights are still shown or not.

In the table zbconf, are all changed configurations listed

In zbconf, I have four entries; here they are:

{"apsUseExtPanId":"0x0","curChannel":11,"deviceType":0,"extPanId":"0x212effff022f33","fwversion":"0x26210500","macAddress":"0x212effff022f33","networkKey":"xxxxxxxxxxxxxx","nwkAddress":"0x0","nwkUpdateId":3,"panId":"0xcbf1","securityMode":3,"staticNwkAddress":false,"swversion":"2.05.39","tcAddress":"0x212effff022f33"}
{"apsUseExtPanId":"0x0","curChannel":11,"deviceType":0,"extPanId":"0x212effff022f33","fwversion":"0x26240500","macAddress":"0x212effff022f33","networkKey":"xxxxxxxxxxxxxx","nwkAddress":"0x0","nwkUpdateId":3,"panId":"0xcbf1","securityMode":3,"staticNwkAddress":false,"swversion":"2.05.39","tcAddress":"0x212effff022f33"}
{"apsUseExtPanId":"0xa1700b70d91021e","curChannel":11,"deviceType":0,"extPanId":"0xa1700b70d91021e","fwversion":"0x26240500","macAddress":"0x212effff022f33","networkKey":"xxxxxxxxxxxxxx","nwkAddress":"0x0","nwkUpdateId":3,"panId":"0xcbf1","securityMode":3,"staticNwkAddress":false,"swversion":"2.05.39","tcAddress":"0x212effff022f33"}
{"apsUseExtPanId":"0xa1700b70075941e","curChannel":11,"deviceType":0,"extPanId":"0xa1700b70075941e","fwversion":"0x26240500","macAddress":"0x212effff022f33","networkKey":"xxxxxxxxxxxxxx","nwkAddress":"0x0","nwkUpdateId":3,"panId":"0xcbf1","securityMode":3,"staticNwkAddress":false,"swversion":"2.05.39","tcAddress":"0x212effff022f33"}

Please tell me if you need any additional information.

Thank you very much.

manup commented 5 years ago

Your configuration is not valid.

apsUseExtPanId must be 0; otherwise, your coordinator will open another network.
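This rule can be checked mechanically against the zbconf entries quoted above. A minimal Python sketch; check_zbconf is hypothetical, and besides the apsUseExtPanId rule it also flags a zero macAddress, the other symptom reported in this thread.

```python
import json


def check_zbconf(entry_json: str) -> list:
    """Return human-readable problems found in one zbconf JSON entry."""
    conf = json.loads(entry_json)
    problems = []
    # Rule from this thread: the coordinator's apsUseExtPanId must be 0.
    aps = conf.get("apsUseExtPanId", "0x0")
    if int(aps, 16) != 0:
        problems.append(f"apsUseExtPanId is {aps}, expected 0x0")
    # Symptom from this thread: a MAC address of 0 in the settings.
    if int(conf.get("macAddress", "0x0"), 16) == 0:
        problems.append("macAddress is 0")
    return problems
```

Applied to the four entries quoted above, it returns an empty list for the first two and flags apsUseExtPanId for the last two.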

You can fix this in the deCONZ Settings Editor.

[Screenshot of the deCONZ Settings Editor]

ooii commented 5 years ago

Hello @manup, unfortunately that did not do it. As you can see in the attached image, my MAC Address is set to 0 and my NWK Ext PAN ID is also set to 0, while it should be the same as the TC Address, right?

[Screenshot showing MAC Address and NWK Ext PAN ID set to 0]

manup commented 5 years ago

I see. Please do these further steps.

(Also make sure the network key is the same as in your zbconf table.)

manup commented 5 years ago

Furthermore, please use this firmware version, which addresses some nasty bugs:

https://www.dresden-elektronik.de/rpi/deconz-firmware/deCONZ_Rpi_0x262b0500.bin.GCF

ooii commented 5 years ago

With .42? Changing the MAC address seems to have fixed the issue: I now see links between the nodes. I'm going to update the firmware and the app and will keep you posted.

manup commented 5 years ago

.42 or .43 should be fine.

ooii commented 5 years ago

Thanks a lot @manup. It seems to be working now.
[Screenshot]

I'm closing the issue.

One optional request: when the log outputs "Node 0x00178801103A4F9B is known by 1 neighbors", could you also add the IDs of those neighbor nodes?
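deCONZ itself is written in C++, so the following Python snippet only illustrates the requested log format; it is not a patch. It keeps the existing message text and appends the neighbor IDs. neighbor_log_line and the exact appended format are hypothetical.

```python
def neighbor_log_line(node: int, neighbors: list) -> str:
    """Format the existing log message and append the 64-bit IDs of the neighbors that know the node."""
    line = f"Node 0x{node:016X} is known by {len(neighbors)} neighbors"
    if neighbors:
        line += ": " + ", ".join(f"0x{n:016X}" for n in neighbors)
    return line
```

For example, neighbor_log_line(0x00178801103A4F9B, [0x00212EFFFF022F33]) yields "Node 0x00178801103A4F9B is known by 1 neighbors: 0x00212EFFFF022F33".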