Koenkk / zigbee-herdsman

A Node.js Zigbee library
MIT License
456 stars 277 forks source link

ZNP Adapter Manager/Backup Refactor #303

Closed castorw closed 3 years ago

castorw commented 3 years ago

This PR addresses issues discussed in https://github.com/Koenkk/zigbee-herdsman/issues/286. Although quite large - it is a complex refactor of ZNP adapter startup, backup management and commissioning process.

Status:

@puddly - you are linking frame counter with network key in your proposal (https://github.com/zigpy/zigpy/issues/557#issuecomment-745072411), however from what I see the frame counters are bound to extended PAN IDs in NWK_SEC_MATERIAL table. Or am I doing this wrong?

@Koenkk If you'd like I would be happy to discuss this further, preferably in some more interactive manner - Discord, Teams, ... you name it. Thanks.

Koenkk commented 3 years ago

You can contact me on telegram (@koenkk)

puddly commented 3 years ago

@puddly - you are linking frame counter with network key in your proposal (zigpy/zigpy#557 (comment)), however from what I see the frame counters are bound to extended PAN IDs in NWK_SEC_MATERIAL table. Or am I doing this wrong?

It's stored in different places for each Z-Stack version but I believe the only way to reset the counter is to rotate the network key. You can't rotate the EPID. This also matched the structure of the SiLabs EZSP backup format.

Determine if NWK_TCLK_TABLE backup/restore is necessary

You'll need those for APS encryption, which uses unique keys for each device. They're derived from the "TCLK_SEED", the device's IEEE, and a shift, to save space, so you'll have to actually "export" them before use. I haven't had time to investigate if Z-Stack can re-import them in their expanded form, since each stack handles hashed keys differently.

It you want to discuss the backup stuff further (it's not set in stone but diverging from the unified backup format this soon won't make it very unified :) ) or find anything interesting within Z-Stack, you can contact me over Discord: puddly#8728.

castorw commented 3 years ago

Tests have been fixed/added. Freshly rebased. Ready for review. Hope we can finally merge this 😊

battuashwik commented 3 years ago

Any reason why this PR is not being merged

Koenkk commented 3 years ago

@battuashwik we are still working out some details.

Hedda commented 3 years ago

Not sure if missing but see no reference to the "Open ZigBee Coordinator Backup Format" project in the introduction (which is a collaboration between puddly from zigpy and castorw fron zigbee2mqtt ) -> https://github.com/zigpy/open-coordinator-backup/

By the way, noticed zigpy-znp v0.5.0 been released with initial support for this feature in zigpy-znp and in the release notes puddly credits castorw and reference this effort to bring support for this "Open ZigBee Coordinator Backup Format" to Z2M:

https://github.com/zigpy/zigpy-znp/releases/tag/v0.5.0

"Huge thank you to @castorw from the Z2M project for working with me to bring concurrent support for the Open Coordinator Backup Format to Z2M. This allows for bidirectional migration between any coordinators supported by zigpy-znp. More documentation is available in TOOLS.md#Backup and restore."

PS: You might want to consider borrowing some of the descriptions that puddly uses to describe this feature in zigpy-znp docs:

https://github.com/zigpy/zigpy-znp/blob/dev/TOOLS.md#backup-and-restore

puddly commented 3 years ago

Awesome work, @castorw!

Koenkk commented 3 years ago

You too @puddly ! :)

Hedda commented 3 years ago

Awesome work @castorw and @puddly

Again, suggest any documentations reference https://github.com/zigpy/open-coordinator-backup/ in hope of more collaborators ;)

nurikk commented 2 years ago

Preem work! Can we expose some functionality over Zigbee2MQTT api? Like create/delete/list/restore? @Koenkk

Koenkk commented 2 years ago

@nurikk you mean to control backups? This is done automatically every day already and on shutdown.

nurikk commented 2 years ago

@nurikk you mean to control backups? This is done automatically every day already and on shutdown.

Yes, control backups. Might come in handy to download backup using browser. Having more than one backup will help in case when you insert another (initialised) stick into your system and start Zigbee2MQTT:)

Koenkk commented 2 years ago

@nurikk for this we would not only need the backup but also the corresponding database.db file. I will check it.

Koenkk commented 2 years ago

@nurikk would it be ok for you if the data is provided as json to the frontend?

nurikk commented 2 years ago

@nurikk would it be ok for you if the data is provided as json to the frontend?

yes, should be fine

Koenkk commented 2 years ago

@nurikk thinking more about this, there are 3 things to backup: database.db, coordinator_backup.json and the configuration.yaml. When the user would click the backup button in the frontend, I think it's the most handy that a zip file is downloaded with those 3 files in it. Does it make more sense that the data is returned as a binary string instead?

nurikk commented 2 years ago

@nurikk thinking more about this, there are 3 things to backup: database.db, coordinator_backup.json and the configuration.yaml. When the user would click the backup button in the frontend, I think it's the most handy that a zip file is downloaded with those 3 files in it. Does it make more sense that the data is returned as a binary string instead?

we can keep backup file as simple json with key-value pairs, key - path, value - data uri like:

{
    "meta": {
        "date": "2021-05-28T00:36:24.604Z",
        "some_random_stats": "here",
         "z2m_version": "0.15.5"
    },
    "files": {
        "./database.db": "data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==",
        "./configuration.yaml": "data:text/plain;base64,......"
    }
}
Hedda commented 2 years ago

@nurikk thinking more about this, there are 3 things to backup: database.db, coordinator_backup.json and the configuration.yaml. When the user would click the backup button in the frontend, I think it's the most handy that a zip file is downloaded with those 3 files in it.

The ability for an end-user to get a such full backup package as zip file from the frontend would be awesome and make it very user-friendly to perform manual backups and restores when for example migrating between different computers/installations/setups.

Koenkk commented 2 years ago

@nurikk but how would the user restore it? (since it cannot restore a json file, the files need to be replaced)

nurikk commented 2 years ago

@nurikk but how would the user restore it? (since it cannot restore a json file, the files need to be replaced)

I thought about two ways of restoring this backups. 1) User uploads this backup file from ui and boom, magic happens 2) User drops this backup file into z2m directory and restart z2m, and then magic happens.

But I agree that simple zip file works nice as well KISS :)

Koenkk commented 2 years ago

If I provide the info like https://github.com/Koenkk/zigbee-herdsman/pull/303#issuecomment-850031454 (as json), could you contstruct a zip from the frontend?

nurikk commented 2 years ago

If I provide the info like #303 (comment) (as json), could you contstruct a zip from the frontend?

Yes

kirovilya commented 2 years ago

A small note / question ... Some iobroker.zigbee users had their own ExtPanID different from the default ("DDDDDDDDDDDDDDDD"). These users now have an error at startup:


Configuration is not consistent with adapter state/backup!
 - PAN ID: configured=5203, adapter=5204
 - Extended PAN ID: configured=341bd2fe158bdcb5, adapter=00124b00192e822f
 - Network Key: configured=0a03050709cb0d0f66020406080a0c0e, adapter=0a03050709cb0d0f66020406080a0c0e
 - Channel List: configured=25, adapter=25
 Please update configuration to prevent further issues.
 If you wish to re-commission your network, please remove coordinator backup at /opt/iobroker/iobroker-data/zigbee_0/nvbackup.json.
 Re-commissioning your network will require re-pairing of all devices!
 Starting zigbee-herdsman problem : "startup failed - configuration-adapter mismatch - see logs above for more information"
 Failed to start Zigbee
 Error herdsman start

It can be seen that the "Extended PAN ID" in the settings and in the adapter do not match. If they go back to the previous version, then there is no error and everything works.

341bd2fe158bdcb5 - correct extpanid from settings. 00124b00192e822f - extpanid read from chip / adapter, matches ieee chip.

Hence the questions:

kirovilya commented 2 years ago

Another case - wrong values in adapter side:

Configuration is not consistent with adapter state/backup!
 - PAN ID: configured=6754, adapter=65535
 - Extended PAN ID: configured=00124b00192e81a5, adapter=0000000000000000
 - Network Key: configured=01030507090b0d0f00020406080a0c0d, adapter=01030507090b0d0f00020406080a0c0d
 - Channel List: configured=11, adapter=
 Please update configuration to prevent further issues.
 If you wish to re-commission your network, please remove coordinator backup at /opt/iobroker/iobroker-data/zigbee_0/nvbackup.json.
 Re-commissioning your network will require re-pairing of all devices!
 Starting zigbee-herdsman problem : "startup failed - configuration-adapter mismatch - see logs above for more information"
 Failed to start Zigbee
 Error herdsman start
Koenkk commented 2 years ago

@castorw can you check this?

castorw commented 2 years ago
  • why is the extpanid not correct in the adapter?

EPID configuration wasn't implemented properly before in ZH. This PR fixed that, but it is possible that adapters initialized by pervious version of ZH may not have EPID properly configured. The occurrence of this issue is related to different firmwares in radios.

  • previously configured networks did not correctly register extpanid in the adapter?

See above.

  • what should such users do now so that they do not have to reconfigure the network again? try to set extpanid DDDDDDDDDDDDDDDD in settings?

I would suggest they set the EPID to the value reported in adapter. This is the easiest way to fix the issue for now.

castorw commented 2 years ago

Another case - wrong values in adapter side:

Configuration is not consistent with adapter state/backup!
 - PAN ID: configured=6754, adapter=65535
 - Extended PAN ID: configured=00124b00192e81a5, adapter=0000000000000000
 - Network Key: configured=01030507090b0d0f00020406080a0c0d, adapter=01030507090b0d0f00020406080a0c0d
 - Channel List: configured=11, adapter=
 Please update configuration to prevent further issues.
 If you wish to re-commission your network, please remove coordinator backup at /opt/iobroker/iobroker-data/zigbee_0/nvbackup.json.
 Re-commissioning your network will require re-pairing of all devices!
 Starting zigbee-herdsman problem : "startup failed - configuration-adapter mismatch - see logs above for more information"
 Failed to start Zigbee
 Error herdsman start

This looks like improperly configured adapter - possibly even in Router/EndDevice mode (wonder if this adapter could have worked before at all). This worked before migrating to updated ZH codebase? Was it run with ZH before?

kirovilya commented 2 years ago

EPID configuration wasn't implemented properly before in ZH. This PR fixed that, but it is possible that adapters initialized by pervious version of ZH may not have EPID properly configured. The occurrence of this issue is related to different firmwares in radios.

Of course the adapter and network were initialized in previous versions of ZH (most users will have this case). It is necessary to somehow give recommendations on what to do in this case, otherwise there will be many questions about this.

This is the easiest way to fix the issue for now.

it seems to me the simplest way is to indicate DDDDDDDDDDDDDDDD in extpanid.

kirovilya commented 2 years ago

This looks like improperly configured adapter - possibly even in Router/EndDevice mode (wonder if this adapter could have worked before at all). This worked before migrating to updated ZH codebase? Was it run with ZH before?

this adapter used to work as a coordinator (in the previous version of ZH). this is probably a problem with early firmwares for cc2538, but they worked for users. I'm afraid this problem cannot be solved by re-initializing the network. most likely the settings are saved at different addresses than in other firmware.

maybe add the ability to ignore some checks? otherwise some adapters will not be able to work

castorw commented 2 years ago

@kirovilya

kirovilya commented 2 years ago

Regarding the other issue, I am not really sure this can be caused by different firmware version. Only if NIB was relocated to another NV item ID - which I think is unlikely. Can you provide me with details on the firmware used in the CC2538 radio?

first version of firmware was https://github.com/antst/CC2538-ZNP-Coordinator-firmware (without sources) then moved here https://github.com/reverieline/CC2538-CC2592-ZNP with sources and history. there was some moves of NV_MEM https://github.com/reverieline/CC2538-CC2592-ZNP/commit/6329103280ebb26c97cd6804a6e01c2eee003cf7#diff-b0b2076efea010195f6a749aa25eddc1c1ef974c3d96ad5a2dc1eae71527d542

castorw commented 2 years ago

@kirovilya the diff does not imply NV relocation, it's only NV stretch so larger tables can be included. Anyway its unrelated to the NV item indexes which hasn't changed for a long time. I am positive that NIB location (item this problem is related to) is the same for Z-Stack 1.2 and Z-Stack 3.0.x and Z-Stack 3.x.0+ as well. The reason for the NV being in the state described above is unknown to me and so far the only occurrence of such state.

kirovilya commented 2 years ago

I see a lot of problems with the latest update. therefore, I will repeat my proposal to make it possible to disable the check at startup in order to work with the current settings.

castorw commented 2 years ago

@kirovilya Could you provide me with NV dump from the adapter which caused this issue (with PANID being 65535)? Possibly using https://github.com/zigpy/zigpy-znp/blob/dev/zigpy_znp/tools/nvram_read.py.

kirovilya commented 2 years ago

@castorw I can't do it quickly. this is not my stick, and the user is not an expert. will have to figure out a way for him to execute the code

castorw commented 2 years ago

@kirovilya Thank you, this would be essential for further diagnostics. I am trying to think about the ability to disable the checks, but most of the updated code-base loses sense in such way. The goal was to stabilize the way ZH handles network provisioning and not create another cracks allowing improperly configured adapters to run.

@Koenkk any thoughts on this?

Koenkk commented 2 years ago

@kirovilya @castorw let's continue in #376