Koenkk / Z-Stack-firmware

Compilation instructions and hex files for Z-Stack firmwares
MIT License

Ability to clone a coordinator to be able to have high availability #47

Closed orxaz closed 4 years ago

orxaz commented 5 years ago

Hi.

Sorry if what I am asking for is foolish.

It would be nice if a coordinator could be cloned, so that a different RPi is ready to be started if the first RPi runs into trouble.

I have been reading about upgrading without re-pairing devices using foil ... but what I mean is literally cloning a stick so that it has an identical PAN ID and every parameter / address / memory entry, to be able to swap in another stick in seconds.

Do you think that this will be possible some day?

Thanks and sorry for my bad English.

andreasbrett commented 5 years ago

The latest firmwares don't require the foil workaround or re-pairing. If you set the PAN ID, network key and channel via configuration.yaml, your second stick will use those settings and act as the new coordinator. You only need to allow new devices to join. The new coordinator doesn't know the devices yet, but it will accept them directly without pairing because they know the network key. Once every device is known to the coordinator, you can disable joining again.

orxaz commented 5 years ago

This sounds really good.

I need to do some tests before migrating all 40 devices to a new network with the PAN ID, network key and channel configured via configuration.yaml, but I will do it ASAP. I can't wait to test the upgraded sticks taking over all my devices and working well.

I need to clone the ieeeAddr as well, right? With SmartRF Flash Programmer, maybe?

Many thanks for your reply.

orxaz commented 5 years ago

Answering myself: yes, the ieeeAddr can be written with SmartRF Flash Programmer.

Thank you for your help.

orxaz commented 5 years ago

Tested in a single-device environment and it worked like a charm: I cloned the ieeeAddr onto 2 sticks, configured pan_id via configuration.yaml, left the default key, paired one device, switched off the RPi, swapped the stick, switched it back on, clicked the button on the device, and it was recognized without problems.

I still need to migrate my whole environment, with many more buttons, switches and bulbs, but it looks good.

andreasbrett commented 5 years ago

That's great! I will clone my stick as well. It's good to know that it works in theory, but the practical tests you made are worth so much more.

Regarding the writing of the ieeeAddress through SmartRF Flash Programmer: I guess you unchecked "retain ieee address" and entered the address in the input field above and right of it that is labeled "IEEE 0x"?

orxaz commented 5 years ago

I did not uncheck it, but it can probably be done that way when flashing the firmware.

The stick seems to have two ieeeAddr values; let's call them the main one and the second one.

I think it works like this: if you flash the second one, it is used; if not, the first one is used. This is only a hypothesis.

When the CC Debugger and the stick are plugged in, SmartRF Flash Programmer shows 2 additional buttons ("Read IEEE" and "Write IEEE", the latter disabled by default), a primary/secondary "Location" selector, and an "IEEE 0x" text field. If you read the IEEE address, the text field is filled with the main (I think) ieeeAddr. If you (and only if you) change the location to secondary, the "Write" button gets enabled. Then you can fill the text field with the desired ieeeAddr (without 0x) and click Write. The second ieeeAddr on your stick will then be written, and it will be the one zigbee2mqtt reports and uses when it starts.

Even after flashing the new ieeeAddr, if you read the primary location, each stick still has its own unique ieeeAddr, but if you read the secondary location, both sticks have the same ieeeAddr.
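
The hypothesis above (secondary address wins when it has been written, otherwise the primary is used) can be sketched in a few lines of Python. This is only an illustration of the selection rule as described in this comment, not the firmware's actual code. The function names are hypothetical, and the idea that an unwritten secondary slot reads back as all `FF` bytes is an assumption.

```python
from typing import Optional

# Assumption: an erased (never written) secondary slot reads back as all 0xFF bytes.
UNSET = "f" * 16


def normalize_ieee(addr: str) -> str:
    """Return a 16-hex-digit lowercase address without the 0x prefix,
    which is the form the "IEEE 0x" text field expects."""
    digits = addr.strip().lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    if len(digits) != 16 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a 64-bit IEEE address: {addr!r}")
    return digits


def effective_ieee(primary: str, secondary: Optional[str]) -> str:
    """Pick the address the stick would report under the hypothesis above."""
    if secondary is not None and normalize_ieee(secondary) != UNSET:
        return normalize_ieee(secondary)
    return normalize_ieee(primary)
```

Under this model, two sticks with different primary addresses report the same address as soon as the same secondary value is written to both.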

lolorc commented 5 years ago

In the end, does it really require the sticks to have the same ieeeAddr?

orxaz commented 5 years ago

Yes, and the same panId, channel and encryption key. If not, the devices will not be in the same network and will not send any information.

I will report soon how this works in a network with 40+ devices, about 10 of them routers.

My plan is to have 2 RPi and swap them with 2 different sticks. Will update ASAP.
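
The requirement stated in this comment (same ieeeAddr, panId, channel and key on both sticks) can be checked mechanically before swapping. Below is a minimal Python sketch; the `NetworkIdentity` type and `mismatches` helper are hypothetical names for illustration, and the sample values are the ones that appear in the debug log later in this thread.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class NetworkIdentity:
    """The four values that, per this thread, must match on both sticks."""
    ieee_addr: str
    pan_id: int
    channel: int
    network_key: Tuple[int, ...]


def mismatches(a: NetworkIdentity, b: NetworkIdentity) -> List[str]:
    """Return the names of the fields that differ between two sticks."""
    fields = ("ieee_addr", "pan_id", "channel", "network_key")
    return [name for name in fields if getattr(a, name) != getattr(b, name)]


stick_a = NetworkIdentity(
    ieee_addr="0x00124b0002c5a057",
    pan_id=0x1A70,
    channel=11,
    network_key=(1, 3, 5, 7, 9, 11, 13, 15, 0, 2, 4, 6, 8, 10, 12, 13),
)
# A clone with a drifted PAN ID, like the 0x1a63 -> 0x1a64 change reported in this thread.
stick_b = replace(stick_a, pan_id=0x1A64)

print(mismatches(stick_a, stick_a))  # []
print(mismatches(stick_a, stick_b))  # ['pan_id']
```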

mateuszdrab commented 5 years ago

Hey guys.

So I just want to double check: I'm currently running firmware 20181024 on my Zigbee coordinator. If I buy a second one, clone the IEEE address and flash the latest firmware, I believe it will work fine with my already paired devices? If that's the case, I'll upgrade the old one to the latest firmware as well, but only after testing it with the new stick. I don't want to have to re-pair all 20 devices. What about PAN ID, network key and channel? Those are currently not configured in my instance of zigbee2mqtt (1.1.1).

Side question, I'm running 20181024 and I have 20 devices paired, I thought the limit was 16, how's that possible?

orxaz commented 5 years ago

My short answer is yes.

My long answer is: you must first check which panId your network has and configure it in configuration.yaml. I chose 0x1a70 (6768 in decimal):

    advanced:
      pan_id: 6768

To check yours, add "log_level: 'debug'" to the advanced block in configuration.yaml and restart; you'll see a couple of lines like:

Mar 07 16:06:51 zigbee1 npm[2987]:   zigbee2mqtt:debug 2019-3-7 16:06:51 Using zigbee-shepherd with settings: '{"net":{"panId":6768,"channelList":[11],"precfgkey":[1,3,5,7,9,11,13,15,0,2,4,6,8,10,12,13]},"dbPath":"/opt/zigbee2mqtt/data/database.db","sp":{"baudRate":115200,"rtscts":true}}'
Mar 07 16:06:53 zigbee1 npm[2987]:   zigbee2mqtt:debug 2019-3-7 16:06:53 zigbee-shepherd info: {"enabled":true,"net":{"state":"Coordinator","channel":11,"panId":"0x1a70","extPanId":"0xdddddddddddddddd","ieeeAddr":"0x00124b0002c5a057","nwkAddr":0},"firmware":{"transportrev":2,"product":0,"version":"2.6.3","revision":20190109},"startTime":1551971213,"joinTimeLeft":0}

You can see your panId here in both hex and dec. You can configure it either way; zigbee2mqtt will convert it to dec.
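
As a small illustration, the values can also be pulled out of the second log line with a few lines of Python. This is a sketch that assumes the line format shown above; `parse_shepherd_info` is a hypothetical helper, not part of zigbee2mqtt.

```python
import json

# The "zigbee-shepherd info" debug line quoted above.
SAMPLE_LINE = (
    'Mar 07 16:06:53 zigbee1 npm[2987]:   zigbee2mqtt:debug 2019-3-7 16:06:53 '
    'zigbee-shepherd info: {"enabled":true,"net":{"state":"Coordinator","channel":11,'
    '"panId":"0x1a70","extPanId":"0xdddddddddddddddd","ieeeAddr":"0x00124b0002c5a057",'
    '"nwkAddr":0},"firmware":{"transportrev":2,"product":0,"version":"2.6.3",'
    '"revision":20190109},"startTime":1551971213,"joinTimeLeft":0}'
)


def parse_shepherd_info(line: str) -> dict:
    """Extract PAN ID (as an integer), channel and ieeeAddr from the info line."""
    marker = "zigbee-shepherd info: "
    _, found, payload = line.partition(marker)
    if not found:
        raise ValueError("no zigbee-shepherd info payload in line")
    net = json.loads(payload)["net"]
    return {
        "pan_id": int(net["panId"], 16),  # "0x1a70" -> 6768
        "channel": net["channel"],
        "ieee_addr": net["ieeeAddr"],
    }


info = parse_shepherd_info(SAMPLE_LINE)
print(info["pan_id"], hex(info["pan_id"]))  # 6768 0x1a70
```

The decimal value is the one shown in the `pan_id` setting above; the hex form is the one printed in the `panId` field of the log.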

Network key and channel are the defaults in my network too; I did not have to do anything with them.

I assume you are keeping the ieeeAddr cloning process in mind.

IMPORTANT: I must say that once, when I had not configured panId in configuration.yaml and it was using the default 0x1a63, I tried to set it to 0x1a63 in configuration.yaml, mainly to prevent it from changing during tests. When I restarted, it suddenly changed to 0x1a64 and I was not able to switch it back to its original value.

I hope this does not happen to you, but keep it in mind: you'll need to re-pair all your devices if it does.

For the 20 devices question, which kind of devices do you have? Do you have some bulbs or power outlets paired?

mateuszdrab commented 5 years ago

@orxaz

Thanks for replying. I was just looking up my firmware version and I can't find version 20181024 anywhere, although I definitely downloaded it from the link on the zigbee2mqtt page.

The 20 devices I have are all Xiaomi door/temp/motion sensors. I also have Philips Hue, but they use their own hub. With the firmware I have now they seem to work fine, but I am worried that this will not be the case when I use the latest firmware. Also, I need to add 4 more devices, so it will be 24, and I have seen another issue here where someone said 21 devices work fine but the 22nd doesn't. I can use the alternative firmware: https://github.com/Koenkk/Z-Stack-firmware/blob/master/coordinator/CC2531/alternatives/max_devices/bin/CC2531ZNP-Prod_20181224.zip but this one has an older timestamp in its file name despite coming from the same commit as the main firmware, whose file name is https://github.com/Koenkk/Z-Stack-firmware/blob/master/coordinator/CC2531/bin/CC2531ZNP-Prod_20190109.zip

I will follow your instructions to get the panId into the config. What about network ID and channel? I guess it would be better to hardcode them in the config file as well. Hopefully my panId won't change, but I won't play with it until the new sticks are delivered, which will be in two weeks or so.

orxaz commented 5 years ago

I never hardcoded them; I think zigbee2mqtt always takes the defaults for these, but I guess hardcoding them will not be a problem.

I must also say that in my previous network, with firmware 20190109, some devices randomly left the network and I had to pair them again, but this time it is not happening. It was very strange: I found no reported issues like it, and now, about 4-5 days after re-pairing all my devices, it is not happening.

I recommend pairing a router to your network to avoid the limit. You can use something like the KS-SM001; I have some of them, and they are cheap and work well.

Be careful: if you reach the limit, it is possible that later you can't pair more devices, even after removing some of the paired ones.

mateuszdrab commented 5 years ago

I've just hardcoded the pan_id, channel, network_key fields into the config. It seems to work still. I've just realised that in my second location, zigbee channel for hue was the same as the zigbee2mqtt channel would be. But I don't have zigbee2mqtt there yet so I switched hue to channel 20.
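
For illustration, here is a minimal Python sketch that renders an `advanced` block containing the three fields named above, with basic range checks. The helper name `render_advanced_block` is hypothetical, and the sample values are the ones from the debug log earlier in this thread (which use the default key), not a recommendation.

```python
from typing import Sequence


def render_advanced_block(pan_id: int, channel: int, network_key: Sequence[int]) -> str:
    """Render the pan_id, channel and network_key settings as YAML text."""
    if not 0 <= pan_id <= 0xFFFF:
        raise ValueError("pan_id must fit in 16 bits")
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz Zigbee channels are 11 to 26")
    if len(network_key) != 16 or any(not 0 <= b <= 255 for b in network_key):
        raise ValueError("network_key must be 16 byte values (0 to 255)")
    key = ", ".join(str(b) for b in network_key)
    return (
        "advanced:\n"
        f"  pan_id: {pan_id}\n"
        f"  channel: {channel}\n"
        f"  network_key: [{key}]\n"
    )


# Values taken from the zigbee-shepherd settings line quoted earlier.
LOGGED_KEY = [1, 3, 5, 7, 9, 11, 13, 15, 0, 2, 4, 6, 8, 10, 12, 13]
print(render_advanced_block(6768, 11, LOGGED_KEY), end="")
```

Pinning all three values in the config means a replacement stick started with the same file comes up with the same network parameters.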

I'm actually thinking I will change the network key from the default and re-pair the devices sometime when I upgrade the firmware. I guess it will be more secure than it is now.

And in terms of adding the new devices, I will try out the alternative firmware first. Hopefully it will work.

What I don't understand is this: if it's possible to change Zigbee sticks and have paired devices still work, why is there a limit on the number of devices that can be paired? When I change the stick to the clone, those devices wouldn't be in its memory and yet they would still work. So how does that limit get enforced? First come, first served?

orxaz commented 5 years ago

If you have another RPi, you can do it without "downtime" by re-pairing each device against the second RPi, using the same MQTT topic.

I don't understand that either; maybe it is like you say, first come, first served.

Good luck in your tests.

mateuszdrab commented 5 years ago

I could do, or I could even run a second instance of the service.

Either way, I'll update you in about 2-3 weeks. Particularly on the topic of first come first served.

Thanks

orxaz commented 5 years ago

Hi.

Some more results.

It seems it will not be easy to switch between sticks, at least if you have routers.

I tried to switch in my house a couple of days ago, and the lights did not work.

I am doing some tests in the lab now: switching sticks is OK for end devices, but routers are not in the network afterwards.

It seems easy to re-join them, but I cannot have an HA (high availability) environment at the moment.

After switching sticks, when I tried to act on a router device, I got this:

mar 18 15:32:54 Vaio npm[437]: zigbee2mqtt:info 18/3/2019 15:32:54 Zigbee publish to device '0x01124b001bae7495', genOnOff - toggle - {} - {"manufSpec":0,"disDefaultRsp":0} - null
mar 18 15:32:54 Vaio npm[437]: zigbee2mqtt:error 18/3/2019 15:32:54 Zigbee publish to device '0x01124b001bae7495', genOnOff - toggle - {} - {"manufSpec":0,"disDefaultRsp":0} - null failed with error Error: AF data request fails, status code: 205. No network route. Please confirm that the device has (re)joined the network.
mar 18 15:32:54 Vaio npm[437]: zigbee2mqtt:info 18/3/2019 15:32:54 MQTT publish: topic 'zigbee/bridge/log', payload '{"type":"zigbee_publish_error","message":"Error: AF data request fails, status code: 205. No network route. Please confirm that the device has (re)joined the network.","meta":{"entity":{"ID":"0x01124b001bae7495","type":"device","friendlyName":"onoff"},"message":"toggle"}}'

After this attempt, I re-joined the device (KS-SM001) in 2 seconds and it started working again, but this is not the desired behaviour for me :(
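
To spot routers that need re-joining after a stick swap, the error lines above can be scanned for the status code. A minimal Python sketch follows, assuming the message format shown in the log; the helper names are hypothetical, and the sample string is an excerpt of the error line rather than the full line.

```python
import re
from typing import Optional

STATUS_RE = re.compile(r"AF data request fails, status code: (\d+)")

# Per the log text above, status code 205 is reported as "No network route".
NO_NETWORK_ROUTE = 205


def af_status_code(log_line: str) -> Optional[int]:
    """Return the AF status code found in a log line, or None if there is none."""
    match = STATUS_RE.search(log_line)
    return int(match.group(1)) if match else None


def needs_rejoin(log_line: str) -> bool:
    """True when the line reports the 'No network route' failure seen above."""
    return af_status_code(log_line) == NO_NETWORK_ROUTE


# Excerpt of the zigbee2mqtt:error line quoted above.
SAMPLE_ERROR = (
    "zigbee2mqtt:error 18/3/2019 15:32:54 Zigbee publish to device "
    "'0x01124b001bae7495', genOnOff - toggle failed with error Error: "
    "AF data request fails, status code: 205. No network route."
)
print(needs_rejoin(SAMPLE_ERROR))  # True
```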

GerSant commented 5 years ago

I use an Arduino UNO + CCLib for flashing the CC2531. How can I change the "ieeeAddr" with this method? Is it necessary, or are panid + channellist + precfgkey enough?

orxaz commented 5 years ago

Hi @GerSant

I'm afraid it is necessary.

GerSant commented 5 years ago

Do you know how to do that with the Arduino + CCLib method?

Thanks

orxaz commented 5 years ago

No, sorry, never did it this way

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

cadavre commented 1 year ago

@orxaz did you eventually end up with a 2-stick HA environment, or did you abandon the idea?

glcos commented 1 year ago

Not sure if this is of some interest in this thread, but I achieved near high availability by switching a single coordinator based on a RF Star CC2652P between two Raspberry Pi nodes. I developed a card that does so.

[Image: close-up of the haberry card]

orxaz commented 1 year ago

Hi!

I had it running for some time, with some bash scripts to start and shut down VMs if a failure was detected, but one of the CC2531 sticks died and I started again with a Sonoff dongle. I actually have 2, with the same PAN ID, but not in HA; in fact, one of them is in a drawer.

@glcos, your project looks really nice!
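
The scripts mentioned above are not shown in this thread. As a rough illustration of the general idea (switch to the standby only after several consecutive failed health checks), here is a minimal Python sketch. The `FailoverMonitor` class, its threshold and the "primary"/"standby" labels are all hypothetical, and starting or stopping VMs is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Track health-check results and decide which node should be active."""
    max_failures: int = 3      # consecutive failures tolerated before switching
    failures: int = 0
    active: str = "primary"

    def record(self, healthy: bool) -> str:
        """Record one health-check result and return the node that should be active."""
        if healthy:
            self.failures = 0
            return self.active
        self.failures += 1
        if self.failures >= self.max_failures:
            # Swap roles and start counting again for the newly active node.
            self.active = "standby" if self.active == "primary" else "primary"
            self.failures = 0
        return self.active
```

Requiring consecutive failures keeps a single missed check from triggering a swap, which matters here because, as reported earlier in the thread, routers may need to re-join after the coordinator changes.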

cbundy commented 10 months ago

@glcos - any link to more details, looks like a great project 👍

glcos commented 10 months ago

@cbundy sooner or later I will find some time to publish some more details and the schematics