cadavre / miio_gateway

lumi.gateway.mieu01 with custom miio_client integration for HA
54 stars 22 forks source link

[custom_components.miio_gateway] Gateway became unavailable by timeout! #2

Closed xefil closed 4 years ago

xefil commented 5 years ago

Hello @cadavre ! I've a little / big issue I don't know if it's related to your component or not. HA hangs and the last entries are those:

2019-08-14 21:01:03 INFO (MainThread) [homeassistant.components.discovery] Unknown service discovered: spotify_connect {'host': '192.168.3.106', 'port': 41397, 'hostname': 'c5695c2f2114ac90-0.local.', 'properties': {'CPath': '/zc/0', 'VERSION': '1.0', 'Stack': 'SP'}}
2019-08-14 21:02:28 INFO (MainThread) [custom_components.miio_gateway] Gateway became unavailable by timeout!
2019-08-14 21:02:28 ERROR (MainThread) [homeassistant.core] Error doing job: Exception in callback <bound method XiaomiGwDevice._push_data of <Entity Gateway Player: idle>>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 68, in uvloop.loop.Handle._run
TypeError: _push_data() missing 3 required positional arguments: 'sid', 'event', and 'params'
2019-08-14 21:02:28 ERROR (MainThread) [homeassistant.core] Error doing job: Exception in callback <bound method XiaomiGwDevice._push_data of <Entity Gateway Alarm: armed_away>>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 68, in uvloop.loop.Handle._run
TypeError: _push_data() missing 3 required positional arguments: 'sid', 'event', and 'params'
2019-08-14 21:02:28 ERROR (MainThread) [homeassistant.core] Error doing job: Exception in callback <bound method XiaomiGwDevice._push_data of <Entity Gateway Illuminance Sensor: 70>>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 68, in uvloop.loop.Handle._run
TypeError: _push_data() missing 3 required positional arguments: 'sid', 'event', and 'params'
2019-08-14 21:02:28 ERROR (MainThread) [homeassistant.core] Error doing job: Exception in callback <bound method XiaomiGwDevice._push_data of <Entity None: off>>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 68, in uvloop.loop.Handle._run
TypeError: _push_data() missing 3 required positional arguments: 'sid', 'event', and 'params'
2019-08-14 21:02:28 ERROR (MainThread) [homeassistant.core] Error doing job: Exception in callback <bound method XiaomiGwDevice._push_data of <Entity None: off>>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 68, in uvloop.loop.Handle._run
TypeError: _push_data() missing 3 required positional arguments: 'sid', 'event', and 'params'
2019-08-14 21:02:28 ERROR (MainThread) [homeassistant.core] Error doing job: Exception in callback <bound method XiaomiGwDevice._push_data of <Entity None: off>>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 68, in uvloop.loop.Handle._run
TypeError: _push_data() missing 3 required positional arguments: 'sid', 'event', and 'params'
2019-08-14 21:02:28 ERROR (MainThread) [homeassistant.core] Error doing job: Exception in callback <bound method XiaomiGwDevice._push_data of <Entity None: off>>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 68, in uvloop.loop.Handle._run
TypeError: _push_data() missing 3 required positional arguments: 'sid', 'event', and 'params'
2019-08-14 21:02:28 ERROR (MainThread) [homeassistant.core] Error doing job: Exception in callback <bound method XiaomiGwDevice._push_data of <Entity None: off>>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 68, in uvloop.loop.Handle._run
TypeError: _push_data() missing 3 required positional arguments: 'sid', 'event', and 'params'
2019-08-14 21:02:28 ERROR (MainThread) [homeassistant.core] Error doing job: Exception in callback <bound method XiaomiGwDevice._push_data of <Entity Gateway LED: off>>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 68, in uvloop.loop.Handle._run
TypeError: _push_data() missing 3 required positional arguments: 'sid', 'event', and 'params'

Or there is a way to debug it better? I've 0.97.2 of HA on Synology docker and latest miio_gateway component. After restart it works for a day, then it stopps.

Ideas?

Thanks! Simon

cadavre commented 5 years ago

Hi @xefil ! Summer time, holiday time. :)

Looks like UDP connection problems between your HA and gateway. I already faced this kind of problem when was using test local connection at the same time. Is anything else than HA connecting your gateway?

xefil commented 5 years ago

Hi @cadavre ! good for you, I'll need to wait end of oktober for holidays :-D Actually there is nothing other that Ha that's connecting to the Xiaomi gw, but it's possible I've some Wifi issues (now should be resolved with new access point). btw, could this issue cause a freeze of HA itself? Actually the component is disabled and the problem is gone. Yep, it's true in the meanwhile I've replaced the WiFi AP, but it's strange that connection issues could cause this problem. Or the component remains in waiting hanging all other processes? Thanks a lot! Simon

cadavre commented 5 years ago

I've added some precautions to prevent errors in case of connection timeouts (that is what's happening in your case), but it looks like it's not enough.

WiFi AP may be causing this issues for sure. Especially if problems started to show up after replacement.

I'll try to also take a look into the timeouts handling and reconnection handling.

xefil commented 5 years ago

Well, I've changed two aspects, AP and diabled the component, this doesn't help eheh Should I git pull or wait ? :-) Thanks!!

cadavre commented 5 years ago

I'll try to work on it in the evening. Will let here know.

xefil commented 5 years ago

Good, thank you!!

darksider4all commented 5 years ago

Hey guys, and good evening. I can confirm this problem as well.

cadavre commented 5 years ago

Hey here! I reviewed the issues #2 and #3 – both looks the same. I fixed the bug, but I'm still working on working re-connection. I will need to test it in practice. As soon as I get it done – I'll update repository!

xefil commented 5 years ago

Thanks @cadavre I'll wait for news. I've re-enabled the component a week ago and same behaviour happened: timeout and HA freezes totally. Restart required. Cheers!

darksider4all commented 5 years ago

@cadavre well until update, I'll just not use the gateway, good thing I've bought the zigbee stick.

quarcko commented 5 years ago

Same here, HA freezes when "unavailable by timeout" message appears.

cadavre commented 5 years ago

Hi guys, I just pushed changes to fix this issue:

It gives this in logs right now :

2019-09-17 09:52:34 INFO (MainThread) [custom_components.miio_gateway] Gateway became unavailable by timeout!
2019-09-17 09:54:40 INFO (Thread-2) [custom_components.miio_gateway] Gateway became available!

I tested by unplugging the gateway, waited to timeout, checked the interfaces:

Zrzut ekranu 2019-09-17 o 09 54 35

"niedostępny" [PL] means "unavailable"

Then plugged the gateway in and waited for next ping.

Additionally I changed ping and unavailability timers from 5 and 11 minutes (respectively) to 1 and 3 minutes. HA will get availability info more often now.

cadavre commented 5 years ago

Let me know if it fixes the bug of HA becoming unavailable and if your gateways re-connect after connection was dropped. I guess I'll need to add few more lines of code for re-connecting after hard reset.

darksider4all commented 5 years ago

hate to be the bearer of bad news, but for it's still unstable.

cadavre commented 5 years ago

Same error cannot pop up. Could you be more specific? :)

s4-crashpc commented 5 years ago

Hi all, this is my log file when hassio freeze; 2019-09-22 20:09:46 DEBUG (Thread-3) [custom_components.miio_gateway] Received event: lumi.gateway.mieu01 miio.gateway - event.heartbeat 2019-09-22 20:09:56 DEBUG (Thread-3) [custom_components.miio_gateway] Received data: 2019-09-22 20:09:56 DEBUG (Thread-3) [custom_components.miio_gateway] b'{"id":2607,"method":"_sync.getUserSceneInfo","params":[{"did":"264429336","model":"lumi.gateway.mieu01","num":4,"scene":[{"id":474368496,"time":1566650248},{"id":474368497,"time":1566650248},{"id":474368499,"time":1566650248},{"id":474368500,"time":1566650248}]}]}\x00' 2019-09-22 20:09:58 DEBUG (MainThread) [custom_components.miio_gateway] Ping N: 1 2019-09-22 20:10:59 DEBUG (MainThread) [custom_components.miio_gateway] Ping N: 2 2019-09-22 20:12:00 DEBUG (MainThread) [custom_components.miio_gateway] Ping N: 3 2019-09-22 20:13:01 DEBUG (MainThread) [custom_components.miio_gateway] Ping N: 4 2019-09-22 20:14:01 DEBUG (MainThread) [custom_components.miio_gateway] Ping N: 5 2019-09-22 20:15:02 DEBUG (MainThread) [custom_components.miio_gateway] Ping N: 6 2019-09-22 20:16:03 DEBUG (MainThread) [custom_components.miio_gateway] Ping N: 7

I add few line in init.py ; @callback def _async_set_unavailable(self, now): """Set state to UNAVAILABLE.""" if self._ping_loss > 25: _LOGGER.info("Gateway became unavailable by timeout!") self._is_available = False self._ping_loss = 0 for func in self.callbacks: func(None, None, EVENT_AVAILABILITY)

@callback
def _async_send_ping(self, now):
    """Send ping to gateway."""
    self._ping_loss = self._ping_loss + 1
    _LOGGER.debug("Ping N: " + str(self._ping_loss))
    self.send_to_hub({"method": "internal.PING"})

to see when the xiaomi gateway stop to respond (or the script don't recive data from it).

sorry for my english, I hope these data will help you understand the problem. thank you for what you do for us.

cadavre commented 5 years ago

So: 1) HA didn't freezed? 2) Devices went unavailable?

What you can see here is that your gateway is not responding to ping. Hardly anything I can do about. Have you done anything to your gateway recently? I'm using mine since I've published the component and it works fine. Having problems with connection only when there was power breakdown.

I need you to remind that if any external connection is being made – gateway closes connection that were already opened. Ie. if you send a command via another UDP socket. Is you network open to world?

s4-crashpc commented 5 years ago

Maybe I discovered the mystery, The problem presents itself those times when HUB xiaomi disconnects from WIFI. But the reason is not absolutely known to me. When no longer responding, HA stops completely and I have to restart it. To answer your questions;

N 1: HA freezes completely. N 2: no device connected on HA responds.

No, I didn't make any changes to the xiaomi hub beyond the guide. In your opinion, what could be the cause? Have you set a static IP on your HUB? If yes how?

Thank you for your time.

quarcko commented 5 years ago

For me this hang of HA usually happens not when you unplug / replug gateway from power. This kind of disconnect usually recovers normally. But try to restart your router instead - so both gateway and HA are losing connection, and then upon router going back online - HA stops responding... (maybe when it tries to ping the gateway after connection to router is restored?)

I even tried to add some tweaks into code that are watching for socket errors and try to re-create the socket if it goes south, but sometimes HA freezes anyway, although it seems more stable for me... maybe just placebo.

I could make a PR some time later with those mods.

s4-crashpc commented 5 years ago

I too have tried to modify the code by inserting lines for reconnecting the socket when the pings are not answered exceed 5, but nothing to do, after the 12 ping unanswered HA stops completely, without giving any kind of error. Since I started using HUB, it has never started working again, I have a router with openwrt, I reset and updated the router thinking that was the problem but it is not. My network seems to be working properly. When you can publish the placebo and I will test it too.

quarcko commented 5 years ago

My init.py file is old and without current fixes from cadavre, so need to merge them somehow

quarcko commented 5 years ago

ok, merged everything, just clone from my forked repo.

s4-crashpc commented 5 years ago

ok, merged everything, just clone from my forked repo.

nothing to do, the problem remains, never goes wrong socket. I am trying to write a program to see what is sent by HUB without going through HA and monitoring the results. If not a problem, I can think that the "asincyo" function does not generate the desired results.

quarcko commented 5 years ago

So any news on this issue from anybody? Last log i captured during which HA has frozen:

2019-10-05 08:12:50 DEBUG (Thread-2) [custom_components.miio_gateway] Sending data: 2019-10-05 08:12:50 DEBUG (Thread-2) [custom_components.miio_gateway] b'{"method": "internal.PING"}' 2019-10-05 08:12:50 DEBUG (Thread-2) [custom_components.miio_gateway] Received data: 2019-10-05 08:12:50 DEBUG (Thread-2) [custom_components.miio_gateway] b'{"method": "internal.PONG", "result": ["online"]}\x00' 2019-10-05 08:13:10 DEBUG (Thread-2) [custom_components.miio_gateway] Received data: 2019-10-05 08:13:10 DEBUG (Thread-2) [custom_components.miio_gateway] b'{"id":6615,"method":"event.heartbeat","params":[{"w_SL":"","w_LQ":"","TIME":"719419","FRAM":"213647360"}]}{"id":12345,"method":"local.query_time"}{"id":6616,"method":"event.heartbeat","params":[{"w_SL":"","w_LQ":"","TIME":"719419","FRAM":"213647360"}]}{"id":6617,"method":"_sync.neighborDevInfo","params":{"did":[]}}' 2019-10-05 08:13:10 DEBUG (Thread-2) [custom_components.miio_gateway] Received event: lumi.gateway.mieu01 miio.gateway - event.heartbeat 2019-10-05 08:17:36 INFO (MainThread) [custom_components.miio_gateway] Gateway became unavailable by timeout!

Very strange one this time: You see there is a PING-PONG message is working normally, then there is a "heartbeat" standart thing gateway sends every few minutes And then goes interesting part: There is complete silence for 4:26 minutes, there are not even "PING" messages or anything, and suddenly "unavailable by timeout".

Any fresh thoughts?

quarcko commented 5 years ago

I think i found someting!!! look at there 2 lines:

Line 251: print("Received unknown method: " + str(data)) Line 260: print("Non-parseable data: " + str(data))

First of all - they should be _LOGGER, but that just minor bug of this not being printer. But major thing - they try to print "data" - in this method there is no such variable and script crashes!!

That exactly what happened to me in the post above i made. there was a requested method: {"id":12345,"method":"local.query_time"}

This method is unknown to library, so it tried to "print" it and BOOM!

quarcko commented 5 years ago

My repo with fix is ready. s4-crashpc - your gateway seems to cause very consistent hangs - can you give it a spin?

cadavre commented 5 years ago

Good find @quarcko ! I'm starting new corporate project since last month and it's extremely hard for me to make any additional work. But I will. With miio_gateway and dafang_camera because I'm also facing some issues there. Especially when it comes on disconnections, power losses etc.

s4-crashpc commented 5 years ago

Ok @quarcko, I try it.

s4-crashpc commented 5 years ago

IT WORKS!!!! My HA was freezing every 6 hours and 2 min, after entering the changes it is going smooth like silk. Here are the logs I suppose to freeze HA.

2019-10-13 07:14:56 ERROR (Thread-2) [custom_components.miio_gateway] Received unknown method: _sync.getUserSceneInfo 2019-10-13 01:14:18 ERROR (Thread-2) [custom_components.miio_gateway] Received unknown method: _sync.getUserSceneInfo

Thanks.

quarcko commented 5 years ago

@s4-crashpc - I'm glad to hear that it works :) @cadavre - Relax man, take your time to do serious business, this is a great library, and some people here are to help make it better, pozdrawiam! :))

cadavre commented 5 years ago

Thanks to @quarcko we've found error: it was caused by fools bug – logging wrong variable. :/

Thanks to this I also got motivated to take a second look at UDP socket connection we're making in this component. I reviewed "how it's done" in another components and it looked quite the same, but was missing few assumptions.

Now we handle additional socket.error exception, but as UDP is "non-bullet-proof" protocol – we don't need to destroy and recreate socket. We just need to pass it to another iteration.

Now "ping" is very important process, because it follows gateway availability.

Ping is made every 1min and 3x failed ping will cause gateway to become unavailable. Single success ping will make gateway available again.

PS Also HASS shutting down is now more bullet-proof on gateway.

xefil commented 5 years ago

Great to see that issues were resolved. It's time to update my local branch and re-enable the component? :-)

cadavre commented 5 years ago

Give it a try. :)

quarcko commented 5 years ago

@cadavre - Wow, seems like a lots of refactoring happened here, nice :) Yes for me there were times when there was caught socket error - mainly when i restarted the router and hassio was losing connection to outside world completely.

Will give it a try for sure. Thanks man!

cadavre commented 5 years ago

Now proper 1.2 is available. It was missing 1 commit.

BTW I'm working also on alarm_control_panel – I want to make it standalone and usable as internal component.

@s4-crashpc @xefil please git it a try now and let me know in few days if we can finally close this issue. :)

xefil commented 5 years ago

Good morning guys! Just git pulled the repo, I'll let you know soon! Thanks! Simon

s4-crashpc commented 5 years ago

After updating the version, no more HA freezes. it seems that everything is working properly. Thank you so much for your help and for your time.