anotherjulien / MyHOME

MyHOME integration for Home-Assistant
GNU Affero General Public License v3.0
137 stars 57 forks source link

Command session delay #19

Closed superinj closed 3 years ago

superinj commented 3 years ago

From 0.8.0 release I got a random delay when an action is sent though command session. If I turn a light on, sometimes it is really fast and it turn light ON immediately like a real switch, but sometimes it get 2/3 second delay to turn the light on. After this first command is sent, the following commands are fast as normal, so I suppose that the command session stop working after a timeout and I get those 2/3 seconds to restart.

The integration is configured with 5 concurrent command sessions as default, but I suppose that are all off when I get this delay.

This issue happen, both with lovelace switch and automation.

Depechie commented 3 years ago

Must say I see something similar. More than often since v0.8 the first action in HomeKit will not work. Just retriggering it fixes it. After that other commands work fine.

I notice it mostly when I was away for work during the day and coming home, doing the first action will fail.

anotherjulien commented 3 years ago

Would you be able to provide debug logs to illustrate? This would allow us to see when a command was queued, when it was unqueued by a worker, and when the command was actually sent (and if indeed there has been a re-negotiation in-between)

superinj commented 3 years ago

Sure, here it is:

2021-09-14 00:19:51 DEBUG (MainThread) [custom_components.myhome] Message *1*1*12## was successfully queued.
2021-09-14 00:19:51 DEBUG (MainThread) [custom_components.myhome] Message *#18*72#0*51## was successfully sent.
2021-09-14 00:19:51 DEBUG (MainThread) [custom_components.myhome] Message *#18*74#0*51## was successfully unqueued by worker 1.
2021-09-14 00:19:52 DEBUG (MainThread) [custom_components.myhome] Received: *#18*74#0*113*0##
2021-09-14 00:19:52 INFO (MainThread) [custom_components.myhome] Sensor 4 is reporting an active power draw of 0 W.
2021-09-14 00:19:53 DEBUG (MainThread) [custom_components.myhome] Received: *#18*73#0*113*0##
2021-09-14 00:19:53 INFO (MainThread) [custom_components.myhome] Sensor 3 is reporting an active power draw of 0 W.
2021-09-14 00:19:55 DEBUG (MainThread) [custom_components.myhome] Message *#18*73#0*53## was successfully sent.
2021-09-14 00:19:55 DEBUG (MainThread) [custom_components.myhome] Message *#18*75#0*53## was successfully unqueued by worker 0.
2021-09-14 00:19:56 DEBUG (MainThread) [custom_components.myhome] Received: *#18*75#0*113*142##
2021-09-14 00:19:56 INFO (MainThread) [custom_components.myhome] Sensor 5 is reporting an active power draw of 142 W.
2021-09-14 00:19:56 DEBUG (MainThread) [custom_components.myhome] Message *#18*73#0*51## was successfully sent.
2021-09-14 00:19:56 DEBUG (MainThread) [custom_components.myhome] Message *#18*75#0*51## was successfully unqueued by worker 2.
2021-09-14 00:19:57 DEBUG (MainThread) [custom_components.myhome] Message *#18*74#0*54## was successfully sent.
2021-09-14 00:19:57 DEBUG (MainThread) [custom_components.myhome] Message *1*1*12## was successfully unqueued by worker 3.
2021-09-14 00:19:57 INFO (MainThread) [custom_components.myhome] Message *1*1*12## was successfully sent.
2021-09-14 00:19:57 DEBUG (MainThread) [custom_components.myhome] Received: *1*1*12##
2021-09-14 00:19:57 INFO (MainThread) [custom_components.myhome] Light 12 is switched on.
anotherjulien commented 3 years ago

Huh, this is strange, it seems that the message spent 6s in the queue without being picked up.

Can I ask:

Given that the queue pickup is handled asynchronously, I would tend to think there would be some kind of ressource shortage. But on the other hand, I don't understand why things would go back to normal after a single "slow" command...

superinj commented 3 years ago

What kind of gateway do you have?

MyHomeServer1

What kind of HA install are you running? (on what kind of hardware?)

I have Home Assistant OS 6.3, with the last update of the core. The hardware is an Intel NUC i5. Normally I have this utilization:

Schermata 2021-09-14 alle 23 31 40

How many devices are handled by the integration?

63 devices with 83 entities.

it's very strange, I know, also because in the past version, It worked like a charm.

anotherjulien commented 3 years ago

OK, so no obvious lack of power here! Is the issue easily reproducible? Would you be able to test with fewer workers? (2 for example) If it still happens, then more workers? (8 or 9 maybe?)

The difference with the previous versions is that now I rely on a shared queue between multiple worker tasks that are all run inside the main HA asyncio event loop. The issue here is that the worker task reading messages from the queue is waiting for the main HA loop to give it "execution time".

Previously each message was sent to the gateway as soon as it was generated. This became an issue when sending a lot of messages at once, since a new network connection had to be built (and tear down) each time, and with enough messages you would start to have a lot of simultaneous connections (new connections being built while previous messages were not acknowledged yet). Enough to soft-crash some types of gateways. Now each worker sends one message at a time, and will only pick a new one once it received proper acknowledgement for the previous one.

With only 1 worker it slows down the flow of messages when a lot are sent at once (mostly visible on startup when all entities are updated at once). So multiple workers help, while also maintaining a known maximum of concurrent TCP connections to the gateway.

superinj commented 3 years ago

No, the issue is not easily reproducible because it happen randomly.

I will try with different workers, it seems that from last master it happen less frequently, but probably it's just a coincidence.

The solution you found with a queue of workers it's a great idea, instead of relay on a single command session. Maybe it need only small refine.

Sometimes I noticed that all command sessions reconnect all together, look at this:

2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Message *#18*74#0*51## was successfully queued.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Message *#18*75#0*51## was successfully queued.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Message *#18*51*51## was successfully unqueued by worker 0.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Command session connection reset, retrying...
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Opening command session.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Message *#18*71#0*51## was successfully unqueued by worker 1.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Command session connection reset, retrying...
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Opening command session.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Message *#18*72#0*51## was successfully unqueued by worker 4.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Command session connection reset, retrying...
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Opening command session.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Message *#18*73#0*51## was successfully unqueued by worker 2.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Command session connection reset, retrying...
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Opening command session.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Message *#18*74#0*51## was successfully unqueued by worker 3.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Command session connection reset, retrying...
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Opening command session.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Negotiating command session.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Negotiating command session.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Negotiating command session.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Negotiating command session.
2021-09-19 12:09:01 DEBUG (MainThread) [custom_components.myhome] Negotiating command session.

Could this help?

Did you try to add a delay between each session start?

superinj commented 3 years ago

I'm not experiencing this issue anymore. I will reopen this ticket if it occurs again. Thank you.