Closed sjorge closed 3 years ago
While I'm waiting a bit to test to see if it is fixed, Tho85 on discord seems to have a similar problem and that stated after pairing a Xiaomi Light Sensor...
Which I was also using up until a week ago, I wonder if the spammy nature of the sensor plays a part in breaking the Z-Stack firmwares timer or something.
https://pkg.blackdot.be/cores/zigbee/srctr_pair_fai.pcapng
The changes didn't help, the bulb gets kicked out immediately.
Please try with this firmware: CC2531ZNP-Prod_source_routing_20200530.hex.zip
It won't fix the issue, but during joining it will send leave request to non-existing addresses, this will help me to determine where in the code things go wrong. After flashing first check if you are still able to join a device (to make sure I didn't mess up).
I just realized something... I don't think unplugged the coordinator after applying the patch... Would that have a possible effect on the result?
As it might have already set the timer, it expired. The new join doesn't set the timer but it may have had an expired timer value lingering around?
I'll try to flash the stick somewhere next week.
@sjorge yes, that could definitely cause an issue. So please make sure to do this. Nevertheless still try with that new firmware as it will provide extra information in case it still fails.
New firmware flashed, now we wait a bit and try too pair. So if I understood correctly you added the equalivant of a debug print to the firmware bu sending a remove to a particular unique set of non existing devices in different paths of the code?
The firmware is not very responsive...
zigbee2mqtt:error 2020-05-31 19:14:13: Failed to ping 'bedroom/bed_lamp/bulb'
zigbee2mqtt:info 2020-05-31 19:14:13: MQTT publish: topic 'zigbee2mqtt/bedroom/desk_lamp/remote', payload '{"battery":74,"linkquality":42,"update_available":false,"click":"off"}'
zigbee2mqtt:info 2020-05-31 19:14:15: MQTT publish: topic 'zigbee2mqtt/bedroom/desk_lamp/remote', payload '{"battery":74,"linkquality":42,"update_available":false,"click":"on"}'
zigbee2mqtt:info 2020-05-31 19:14:27: MQTT publish: topic 'zigbee2mqtt/masterbedroom/lamp/bulb2/availability', payload 'online'
zigbee2mqtt:info 2020-05-31 19:14:27: MQTT publish: topic 'zigbee2mqtt/masterbedroom/lamp/bulb1/availability', payload 'online'
zigbee2mqtt:info 2020-05-31 19:14:28: MQTT publish: topic 'zigbee2mqtt/masterbedroom/lamp', payload '{"state":"OFF","brightness":254}'
zigbee2mqtt:info 2020-05-31 19:14:28: MQTT publish: topic 'zigbee2mqtt/masterbedroom/lamp/bulb2', payload '{"state":"OFF","linkquality":47,"update_available":false,"brightness":254}'
zigbee2mqtt:info 2020-05-31 19:14:28: MQTT publish: topic 'zigbee2mqtt/masterbedroom/lamp', payload '{"state":"OFF","brightness":254}'
zigbee2mqtt:info 2020-05-31 19:14:28: MQTT publish: topic 'zigbee2mqtt/masterbedroom/lamp/bulb1', payload '{"update_available":false,"state":"OFF","linkquality":31,"brightness":254}'
zigbee2mqtt:error 2020-05-31 19:14:35: Failed to read state of 'masterbedroom/lamp/bulb1' after reconnect
zigbee2mqtt:error 2020-05-31 19:14:36: Failed to read state of 'masterbedroom/lamp/bulb2' after reconnect
zigbee2mqtt:info 2020-05-31 19:14:48: MQTT publish: topic 'zigbee2mqtt/bathroom/lamp/bulb/availability', payload 'online'
zigbee2mqtt:info 2020-05-31 19:14:49: MQTT publish: topic 'zigbee2mqtt/bathroom/lamp', payload '{"state":"OFF","brightness":254,"color_temp":400,"color":{"x":0.4599,"y":0.4106}}'
zigbee2mqtt:info 2020-05-31 19:14:49: MQTT publish: topic 'zigbee2mqtt/bathroom/lamp/bulb', payload '{"state":"OFF","linkquality":42,"update_available":false,"brightness":254,"color_temp":400,"color":{"x":0.4599,"y":0.4106}}'
zigbee2mqtt:info 2020-05-31 19:14:58: MQTT publish: topic 'zigbee2mqtt/masterbedroom/repeater/availability', payload 'online'
zigbee2mqtt:info 2020-05-31 19:14:59: MQTT publish: topic 'zigbee2mqtt/masterbedroom/repeater', payload '{"linkquality":42,"update_available":false}'
zigbee2mqtt:error 2020-05-31 19:15:12: Failed to read state of 'bathroom/lamp/bulb' after reconnect
About 1/10 changes mode actually make it through
https://pkg.blackdot.be/cores/zigbee/errors.pcapng
There is one near packet 330 but a few before that, but they scrolled by to fast. Note, this is not during pairing... this is just existing devices dropping and failing to rejoin after the device announcement... going to refresh the normal source routing firmware tomorrow morning as nothing is working atm.
Got my issue tabs messed up.
Back on the vanilla source routing firmware, network was back to normal/stable after ~5min or so.
Still have the diff to controller.js applied so 🤞 the first time it failed because I had not rest the stick, we'll know in a day or 5 I guess.
Tried to pair a device today, same issue :( so the change to non timer version didn't seem to help. Adaptor was reset when I reflashed normal src routing firmware when the debug one broke everything.
Can you sniff the traffic of the debug one? (just to see where it fails)
https://pkg.blackdot.be/cores/zigbee/errors.pcapng was captures with the debug firmware when it was locking up. Which was almost immediately after a unplug, plug, start z2m
It seems that my changed indeed didn't work, I will try to figure out an alternative way to do debugging.
It seems that my changed indeed didn't work, I will try to figure out an alternative way to do debugging.
Is USART 0 or USART 1 still free in the software? It would require a little soldering, but I believe you can route the Rx & Tx to the (tiny) extension header on the side of the CC2531 USB Dongle (Table 7-1. Peripheral I/O Pin Mapping) and the "Additional test pins" from the schematic If you connect a FTDI cable to that you can do debug printing from the dongle.
It is a very common way to debug embedded software, so it would surprise me if there isn't something in the TI software stack for that already.
Soldering is a bit beyond my skill level... based on my succes rate.
~ sjorge
On 7 Jun 2020, at 11:52, FaBjE notifications@github.com wrote:
It seems that my changed indeed didn't work, I will try to figure out an alternative way to do debugging.
Is USART 0 or USART 1 still free in the software? It would require a little soldering, but I believe you can route the Rx & Tx to the (tiny) extension header on the side of the CC2531 USB Dongle (Table 7-1. Peripheral I/O Pin Mapping) and the "Additional test pins" from the schematic If you connect a FTDI cable to that you can do debug printing from the dongle.
It is a very common way to debug embedded software, so it would surprise me if there isn't something in the TI software stack for that already.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.
If it is a feasible solution to help Koenkk debugging I can do some soldering and we can arrange some postage/shipping scheme
@FaBjE what is required for this? Iar workbench only?
@sjorge I've modified the firmware a bit, hopefully it doesn't crash now: CC2531ZNP-Prod_source_routing_20200609.zip
@Koenkk Just some code. Compiler has nothing to do with it. (I'm well familiar with IAR, Not so much with the TI Z-Stack)
For debug printing, you just configure the uart in code at 115200 or anything. and put printf statements in the code. https://e2e.ti.com/support/wireless-connectivity/zigbee-and-thread/f/158/t/169089
The processor pins output uart at TTL level, so you need a converter from TTL to pc, something like this is the easiest to use. https://www.antratek.nl/ftdi-usb-to-serial-ttl-cable-3-3v-ttl
This requires a little soldering on the CC2531 usb dongle pcb as it doesn't have a ready-to-use connector for it, But it has the appropriate processor pins on the pinheader pads at the side.
Than you just open a COM terminal on your PC an can see what is going on in the software ;)
@FaBjE thanks for sharing this, very interested in this and will probably will be useful in the future.
@FaBjE thanks for sharing this, very interested in this and will probably will be useful in the future.
Happy to help :) If you need any assistance integrating this, let me know.
I’ll flash the new firmware this weekend
Flash the firmware but immediately the network fell apart.
I will upload a small capture later where I use a remote to toggle a group of bulbs, then try to change it via mqtt, then try to change a single bulb. Then turn them back on via the remote.
Going to reflash and get everything back up and running first.
https://pkg.blackdot.be/cores/zigbee/bad2.pcapng
I think I know what is causing the issue though, after every device announce from an actually device, e.g. the hue motion sensor. The firmware sends a flood of debug announcements and I think that trips up the network... so the device never rejoins.
I also noticed the ikea remotes don't send a announcement when I push a button so probably why they work fine. The first time A bulb sends some data (attReport) it also announces itself again but it gets met with a flood of messages and it never rejoins.
I flashed the std source routing firmware again and only the devices that had tried to rejoin, had issue. The once that had not yet send a message were fine.
It might slower but perhaps we could toggle the led to green and red at 2 most likely points where you think the issues is and turn it of at startup.
It's a lot less info to go on, but it would not send zigbee traffic that could cause other devices to be unhappy?
I don't understand yet why it doesn't work, your sniff doesn't contain any messages send from the coordinator, probably the code crashes or something.
Anyway, based on the idea of @FaBjE I found a way to output debug logging to zigbee-herdsman. Try with this firmware:
CC2531ZNP-Prod_20200614.zip , now run zigbee2mqtt with DEBUG=zigbee-herdsman:adapter:zStack:znp:AREQ npm start
and when pairing a device you will see this logging:
I'll reflash later today.
So I just run z2m as normal, once I noticed joining no longer works I stop and start it manually with the debugging? (given only a znc reset or physical unplug/replug fixes it, and a z2m restart has no effect)
Always run with this logging (there won't be any overhead) and just check the log when joining fails for the first time.
Of to a good start, this firmware is stable! I will adapt my init script to run with the extra debugging now.
Everything stopped working today, got a bit of the log. https://gist.github.com/sjorge/bcf51e3d4b8269788bf61aa8b927b339
Seems to be a memory issue on the ZNC, reset via mqtt did not work. Had to unplug and wait a bit before replugging.
Not sure this is due to the logging being added, or that his just highlighted the actually issue were some memory is never cleared. A full association table also sounds like a memory thing.
Forgot to deallocate the send debug message, should be fixed with: CC2531ZNP-Prod.zip
Will flash tomorrow, just got home. Lets hope that one runs for a bit longer until the pairing issue manifests.
Updated the stick, now we wait
Still working, but looks like my CC got unplugged an replugged a day ago, probably while someone was loading the laundry :( Been a bit AFK too, so no update for now.
Very odd, I was able to join a bulb today after 6 days of uptime. Were there any other changes aside from the debug printing?
@sjorge no there we no other changes. What debug logging did you got? Theoretically: the added debug statements increase the execution time, in case this bug turns out to be a race condition this could actually "solve" it.
On 2020-07-01 22:04, Koen Kanters wrote:
@sjorge [1] no there we no other changes. What debug logging did you got? Theoretically: the added debug statements increase the execution time, in case this bug turns out to be a race condition this could actually "solve" it.
-- You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub [2], or unsubscribe [3].
I already turned of my laptop, I'll pair the bulb again tomorrow and capture the output to a file.
-- ~ sjorge
[1] https://github.com/sjorge [2] https://github.com/Koenkk/zigbee2mqtt/issues/3177#issuecomment-652620178 [3] https://github.com/notifications/unsubscribe-auth/AAC4WEMKHTXWOZSRQGT5UDTRZOJFXANCNFSM4LRO5I4Q
Still works fine after 10 days, so maybe it is some timing thing.
https://gist.github.com/sjorge/7bab8456eb865848db3e5ed9c4dd55fc here is the log, it wasn't fully scewed in so 2 device announces, but it joined fine after I tightened it in the socket without doing anything else.
Looks good, I also see the logging I've added. I'm not sure how to continue here yet. Maybe wait for some more days or see if anybody else can confirm it has been fixed with this FW?
If the firmware is "stable" I can give it a shot. I need to find a moment to do the initial setup though.
Edit: Can the debug log be enabled/logged in the docker container?
I noticed no difference in stability with this or the normal firmware, so I'd say it's stable. Not sure about docker as I don't use it.
@Koenkk Do you know if I can enable the debug logging in the docker container?
@FaBjE yes, see https://www.zigbee2mqtt.io/information/debug.html#docker
I just got a new device to test delivered, so will have another go at pairing a totally new device today or tomorrow.
Uptime has been 2 weeks, still working fine with debug logging turned on!
Good, TI asked me to figure out how much delay has to be added, I will try to provide 2 new firmwares this weekend where debug logging is replaced with 2 fixed delays.
Just updated my coordinator to CC2531ZNP-Prod.zip. I had to add pan_id: 0x1a63 to the config. Apparently the old one pan_id: 6755 was changed. The zigbee network is online again and seems good.
I added "- DEBUG=zigbee-herdsman:adapter:zStack:znp:AREQ" to my docker-compose file. And I see the adapter debug messages (displaying all messages payload) to the console. But they are not saved to the log.txt file.
I will try to join a new device in a few days.
Can you try this firmware: CC2531ZNP-Prod_20200720.zip (has some builtin sleeps without debug logging)
Mostly done moving stuff over to my new place now, once I have my ZNC back up and running, I'll flash and try the new firmware.
I flashed the new firmware, going to pair some bulbs now.
Pairing isn't working well, I managed to pair only 2 bulbs but now it's locked up. I'm going to revert to vanilla source routing and finish pairing everything as I have lots of bulbs to go
It seems to mostly timeout in the interviewing/configuration steps
Bug Report
What happened
I tried to join a new device, it failed.
The network/coordinator usb has been up for a about 1-1.5days now, joining new devices fail. Physically unplugging and replugging the usb (with z2m stopped ofcourse) re allows joining of new devices.
I did a capture of one of those failed join requests: https://pkg.blackdot.be/extras/zigbee/failed_join.pcapng
Not sure where it fails afterwards, it looks like z2m doesn't respond to the update device at all?
What did you expect to happen
The device to be able to join.
How to reproduce it (minimal and precise)
I can always reproduce this after a about 1-3 days after plugging in/out the USB device (rebooting the server does not help)
Debug Info
zigbee2mqtt version:
1.12.0-dev (commit #d52d520)
CC253X firmware version:{"type":"zStack12","meta":{"transportrev":2,"product":0,"majorrel":2,"minorrel":6,"maintrel":3,"revision":20190608}}