Closed xefil closed 4 years ago
Hi @xefil ! Summer time, holiday time. :)
Looks like UDP connection problems between your HA and gateway. I already faced this kind of problem when was using test local connection at the same time. Is anything else than HA connecting your gateway?
Hi @cadavre ! good for you, I'll need to wait end of oktober for holidays :-D Actually there is nothing other that Ha that's connecting to the Xiaomi gw, but it's possible I've some Wifi issues (now should be resolved with new access point). btw, could this issue cause a freeze of HA itself? Actually the component is disabled and the problem is gone. Yep, it's true in the meanwhile I've replaced the WiFi AP, but it's strange that connection issues could cause this problem. Or the component remains in waiting hanging all other processes? Thanks a lot! Simon
I've added some precautions to prevent errors in case of connection timeouts (that is what's happening in your case), but it looks like it's not enough.
WiFi AP may be causing this issues for sure. Especially if problems started to show up after replacement.
I'll try to also take a look into the timeouts handling and reconnection handling.
Well, I've changed two aspects, AP and diabled the component, this doesn't help eheh
Should I git pull
or wait ? :-)
Thanks!!
I'll try to work on it in the evening. Will let here know.
Good, thank you!!
Hey guys, and good evening. I can confirm this problem as well.
Hey here! I reviewed the issues #2 and #3 – both looks the same. I fixed the bug, but I'm still working on working re-connection. I will need to test it in practice. As soon as I get it done – I'll update repository!
Thanks @cadavre I'll wait for news. I've re-enabled the component a week ago and same behaviour happened: timeout and HA freezes totally. Restart required. Cheers!
@cadavre well until update, I'll just not use the gateway, good thing I've bought the zigbee stick.
Same here, HA freezes when "unavailable by timeout" message appears.
Hi guys, I just pushed changes to fix this issue:
It gives this in logs right now :
2019-09-17 09:52:34 INFO (MainThread) [custom_components.miio_gateway] Gateway became unavailable by timeout!
2019-09-17 09:54:40 INFO (Thread-2) [custom_components.miio_gateway] Gateway became available!
I tested by unplugging the gateway, waited to timeout, checked the interfaces:
"niedostępny" [PL] means "unavailable"
Then plugged the gateway in and waited for next ping.
Additionally I changed ping and unavailability timers from 5 and 11 minutes (respectively) to 1 and 3 minutes. HA will get availability info more often now.
Let me know if it fixes the bug of HA becoming unavailable and if your gateways re-connect after connection was dropped. I guess I'll need to add few more lines of code for re-connecting after hard reset.
hate to be the bearer of bad news, but for it's still unstable.
Same error cannot pop up. Could you be more specific? :)
Hi all, this is my log file when hassio freeze; 2019-09-22 20:09:46 DEBUG (Thread-3) [custom_components.miio_gateway] Received event: lumi.gateway.mieu01 miio.gateway - event.heartbeat 2019-09-22 20:09:56 DEBUG (Thread-3) [custom_components.miio_gateway] Received data: 2019-09-22 20:09:56 DEBUG (Thread-3) [custom_components.miio_gateway] b'{"id":2607,"method":"_sync.getUserSceneInfo","params":[{"did":"264429336","model":"lumi.gateway.mieu01","num":4,"scene":[{"id":474368496,"time":1566650248},{"id":474368497,"time":1566650248},{"id":474368499,"time":1566650248},{"id":474368500,"time":1566650248}]}]}\x00' 2019-09-22 20:09:58 DEBUG (MainThread) [custom_components.miio_gateway] Ping N: 1 2019-09-22 20:10:59 DEBUG (MainThread) [custom_components.miio_gateway] Ping N: 2 2019-09-22 20:12:00 DEBUG (MainThread) [custom_components.miio_gateway] Ping N: 3 2019-09-22 20:13:01 DEBUG (MainThread) [custom_components.miio_gateway] Ping N: 4 2019-09-22 20:14:01 DEBUG (MainThread) [custom_components.miio_gateway] Ping N: 5 2019-09-22 20:15:02 DEBUG (MainThread) [custom_components.miio_gateway] Ping N: 6 2019-09-22 20:16:03 DEBUG (MainThread) [custom_components.miio_gateway] Ping N: 7
I add few line in init.py ; @callback def _async_set_unavailable(self, now): """Set state to UNAVAILABLE.""" if self._ping_loss > 25: _LOGGER.info("Gateway became unavailable by timeout!") self._is_available = False self._ping_loss = 0 for func in self.callbacks: func(None, None, EVENT_AVAILABILITY)
@callback
def _async_send_ping(self, now):
"""Send ping to gateway."""
self._ping_loss = self._ping_loss + 1
_LOGGER.debug("Ping N: " + str(self._ping_loss))
self.send_to_hub({"method": "internal.PING"})
to see when the xiaomi gateway stop to respond (or the script don't recive data from it).
sorry for my english, I hope these data will help you understand the problem. thank you for what you do for us.
So: 1) HA didn't freezed? 2) Devices went unavailable?
What you can see here is that your gateway is not responding to ping. Hardly anything I can do about. Have you done anything to your gateway recently? I'm using mine since I've published the component and it works fine. Having problems with connection only when there was power breakdown.
I need you to remind that if any external connection is being made – gateway closes connection that were already opened. Ie. if you send a command via another UDP socket. Is you network open to world?
Maybe I discovered the mystery, The problem presents itself those times when HUB xiaomi disconnects from WIFI. But the reason is not absolutely known to me. When no longer responding, HA stops completely and I have to restart it. To answer your questions;
N 1: HA freezes completely. N 2: no device connected on HA responds.
No, I didn't make any changes to the xiaomi hub beyond the guide. In your opinion, what could be the cause? Have you set a static IP on your HUB? If yes how?
Thank you for your time.
For me this hang of HA usually happens not when you unplug / replug gateway from power. This kind of disconnect usually recovers normally. But try to restart your router instead - so both gateway and HA are losing connection, and then upon router going back online - HA stops responding... (maybe when it tries to ping the gateway after connection to router is restored?)
I even tried to add some tweaks into code that are watching for socket errors and try to re-create the socket if it goes south, but sometimes HA freezes anyway, although it seems more stable for me... maybe just placebo.
I could make a PR some time later with those mods.
I too have tried to modify the code by inserting lines for reconnecting the socket when the pings are not answered exceed 5, but nothing to do, after the 12 ping unanswered HA stops completely, without giving any kind of error. Since I started using HUB, it has never started working again, I have a router with openwrt, I reset and updated the router thinking that was the problem but it is not. My network seems to be working properly. When you can publish the placebo and I will test it too.
My init.py file is old and without current fixes from cadavre, so need to merge them somehow
ok, merged everything, just clone from my forked repo.
ok, merged everything, just clone from my forked repo.
nothing to do, the problem remains, never goes wrong socket. I am trying to write a program to see what is sent by HUB without going through HA and monitoring the results. If not a problem, I can think that the "asincyo" function does not generate the desired results.
So any news on this issue from anybody? Last log i captured during which HA has frozen:
2019-10-05 08:12:50 DEBUG (Thread-2) [custom_components.miio_gateway] Sending data: 2019-10-05 08:12:50 DEBUG (Thread-2) [custom_components.miio_gateway] b'{"method": "internal.PING"}' 2019-10-05 08:12:50 DEBUG (Thread-2) [custom_components.miio_gateway] Received data: 2019-10-05 08:12:50 DEBUG (Thread-2) [custom_components.miio_gateway] b'{"method": "internal.PONG", "result": ["online"]}\x00' 2019-10-05 08:13:10 DEBUG (Thread-2) [custom_components.miio_gateway] Received data: 2019-10-05 08:13:10 DEBUG (Thread-2) [custom_components.miio_gateway] b'{"id":6615,"method":"event.heartbeat","params":[{"w_SL":"","w_LQ":"","TIME":"719419","FRAM":"213647360"}]}{"id":12345,"method":"local.query_time"}{"id":6616,"method":"event.heartbeat","params":[{"w_SL":"","w_LQ":"","TIME":"719419","FRAM":"213647360"}]}{"id":6617,"method":"_sync.neighborDevInfo","params":{"did":[]}}' 2019-10-05 08:13:10 DEBUG (Thread-2) [custom_components.miio_gateway] Received event: lumi.gateway.mieu01 miio.gateway - event.heartbeat 2019-10-05 08:17:36 INFO (MainThread) [custom_components.miio_gateway] Gateway became unavailable by timeout!
Very strange one this time: You see there is a PING-PONG message is working normally, then there is a "heartbeat" standart thing gateway sends every few minutes And then goes interesting part: There is complete silence for 4:26 minutes, there are not even "PING" messages or anything, and suddenly "unavailable by timeout".
Any fresh thoughts?
I think i found someting!!! look at there 2 lines:
Line 251: print("Received unknown method: " + str(data)) Line 260: print("Non-parseable data: " + str(data))
First of all - they should be _LOGGER, but that just minor bug of this not being printer. But major thing - they try to print "data" - in this method there is no such variable and script crashes!!
That exactly what happened to me in the post above i made. there was a requested method: {"id":12345,"method":"local.query_time"}
This method is unknown to library, so it tried to "print" it and BOOM!
My repo with fix is ready. s4-crashpc - your gateway seems to cause very consistent hangs - can you give it a spin?
Good find @quarcko ! I'm starting new corporate project since last month and it's extremely hard for me to make any additional work. But I will. With miio_gateway and dafang_camera because I'm also facing some issues there. Especially when it comes on disconnections, power losses etc.
Ok @quarcko, I try it.
IT WORKS!!!! My HA was freezing every 6 hours and 2 min, after entering the changes it is going smooth like silk. Here are the logs I suppose to freeze HA.
2019-10-13 07:14:56 ERROR (Thread-2) [custom_components.miio_gateway] Received unknown method: _sync.getUserSceneInfo 2019-10-13 01:14:18 ERROR (Thread-2) [custom_components.miio_gateway] Received unknown method: _sync.getUserSceneInfo
Thanks.
@s4-crashpc - I'm glad to hear that it works :) @cadavre - Relax man, take your time to do serious business, this is a great library, and some people here are to help make it better, pozdrawiam! :))
Thanks to @quarcko we've found error: it was caused by fools bug – logging wrong variable. :/
Thanks to this I also got motivated to take a second look at UDP socket connection we're making in this component. I reviewed "how it's done" in another components and it looked quite the same, but was missing few assumptions.
Now we handle additional socket.error
exception, but as UDP is "non-bullet-proof" protocol – we don't need to destroy and recreate socket. We just need to pass it to another iteration.
Now "ping" is very important process, because it follows gateway availability.
Ping is made every 1min and 3x failed ping will cause gateway to become unavailable. Single success ping will make gateway available again.
PS Also HASS shutting down is now more bullet-proof on gateway.
Great to see that issues were resolved. It's time to update my local branch and re-enable the component? :-)
Give it a try. :)
@cadavre - Wow, seems like a lots of refactoring happened here, nice :) Yes for me there were times when there was caught socket error - mainly when i restarted the router and hassio was losing connection to outside world completely.
Will give it a try for sure. Thanks man!
Now proper 1.2 is available. It was missing 1 commit.
BTW I'm working also on alarm_control_panel
– I want to make it standalone and usable as internal component.
@s4-crashpc @xefil please git it a try now and let me know in few days if we can finally close this issue. :)
Good morning guys!
Just git pull
ed the repo, I'll let you know soon!
Thanks!
Simon
After updating the version, no more HA freezes. it seems that everything is working properly. Thank you so much for your help and for your time.
Hello @cadavre ! I've a little / big issue I don't know if it's related to your component or not. HA hangs and the last entries are those:
Or there is a way to debug it better? I've 0.97.2 of HA on Synology docker and latest miio_gateway component. After restart it works for a day, then it stopps.
Ideas?
Thanks! Simon