Receiving error: Failed to receive acknowledgement of command, max retries exceeded

mikesalz commented 1 year ago

I installed the new local integration last night. Version 0.5.0. It seems to be working very well. But I noticed this in my logs. Just wanted to share it in case it is something that you can address. Please let me know if I can help test any changes. Thanks!

Logger: custom_components.omnilogic_local.coordinator Source: custom_components/omnilogic_local/coordinator.py:85 Integration: OmniLogic Local (documentation, issues) First occurred: 7:14:07 AM (10 occurrences) Last logged: 7:41:05 AM

Unexpected error fetching OmniLogic data: Failed to receive acknowledgement of command, max retries exceeded Traceback (most recent call last): File "/usr/local/lib/python3.11/asyncio/tasks.py", line 490, in wait_for return fut.result() ^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/pyomnilogic_local/protocol.py", line 109, in _wait_for_ack message = await self.data_queue.get() ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/asyncio/queues.py", line 158, in get await getter asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/pyomnilogic_local/protocol.py", line 138, in _ensure_sent await asyncio.wait_for(self._wait_for_ack(message.id), 0.5) File "/usr/local/lib/python3.11/asyncio/tasks.py", line 492, in wait_for raise exceptions.TimeoutError() from exc TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 283, in _async_refresh self.data = await self._async_update_data() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/config/custom_components/omnilogic_local/coordinator.py", line 85, in _async_update_data self.msp_config_xml = await self.omni_api.async_get_config(raw=True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/pyomnilogic_local/models/util.py", line 40, in wrapper resp_body = await func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/pyomnilogic_local/api.py", line 97, in async_get_config return await self.async_send_message(MessageType.REQUEST_CONFIGURATION, req_body, True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/pyomnilogic_local/api.py", line 57, in async_send_message resp = await protocol.send_and_receive(message_type, message) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/pyomnilogic_local/protocol.py", line 147, in send_and_receive await self.send_message(msg_type, payload, msg_id) File "/usr/local/lib/python3.11/site-packages/pyomnilogic_local/protocol.py", line 160, in send_message await self._ensure_sent(message) File "/usr/local/lib/python3.11/site-packages/pyomnilogic_local/protocol.py", line 144, in _ensure_sent raise OmniTimeoutException("Failed to receive acknowledgement of command, max retries exceeded") from exc pyomnilogic_local.exceptions.OmniTimeoutException: Failed to receive acknowledgement of command, max retries exceeded

cryptk commented 1 year ago

That looks like your omnologic controller failed to respond in time to the request to pull the configuration data, are you by chance on a slightly flaky wireless connection or something?

mikesalz commented 1 year ago

That is a good question. I don't THINK I am on a weak wireless connection. I have an Orbi Mesh system. The Omnilogic unit is pretty far away from the main house, but I have a satellite nearby and I seem to have a strong connection when I go out there and test it. I supposed it is intermittently bad, and I'm just not catching that when I test it. I restarted the router and each satellite, but I am still getting that error repeatedly. Any other thoughts?

cryptk commented 1 year ago

Can you enable debug logging for the integration, replicate the issue, and then upload those logs?

mikesalz commented 1 year ago

Attaching here. home-assistant_2023-06-23T17-45-37.373Z.log

cryptk commented 1 year ago

Unfortunately, it looks like it's just a matter of the Omni not responding to the requests. I can see in the logs that the integration is sending a request to get the MSP config, then not receiving back an "ACK" packet as expected, so it re-sends the request for the MSP config 4 additional times, still not receiving the "ACK" packet, and finally gives up.

I would recommend looking into the antenna placement for the wireless connection on your controller. There isn't much I can do if I send a request, get no response, and re-attempt multiple times and still receive no response.

One thing you may want to try is power cycling your Omni controller, I have seen them get into a somewhat "laggy" state where they occasionally just don't like to respond, and a power cycle can fix that, but chances are, this is just a bad wifi connection (probably on the Omni controller end) based on the logs that I'm seeing.

mikesalz commented 1 year ago

I just power cycled. Thanks for the suggestion. If that doesn't help, I'll try to run an ethernet cable to it. That'll be fun.

Is there any chance that pinging every 5 seconds could be mucking up the controller?

cryptk commented 1 year ago

According to the logs that you uploaded, it appears to be polling every 10 seconds. I wouldn't try and push it any faster if you are already having issues.

For what it's worth, it does appear that it works fine most of the time, but there's a spot in the logs where it fails to update three times in a row, and then another shortly later where it failed to update about 10 or 12 times in a row... And then it goes right back to working fine again

mikesalz commented 1 year ago

OK, I thought the default was 5 seconds. I wasn't suggesting to make it poll more often, so I'll leave it at 10.

It is still happening after my power cycle. And seems to be happening more frequently now. I don't get it. I do think it is on the Omnilogic side rather than the router. I have other devices in HA that I know are connected to that satellite and none of them are getting timeout issues. I guess I need to pick up a really long ethernet cable. In the meantime, do you know of an easy way to disable that particular message so it stops blowing up my logs?

cryptk commented 1 year ago

Unfortunately no, it's correctly logging that it was unable to run the polling cycle. As far as I know, the only way to inform home assistant that the polling cycle failed is to raise an exception.

You could also look into the Wi-Fi antenna placement on the Omni controller and see if it could be relocated to somewhere that gives it a better signal. Especially if the antenna is located inside the Omni case, or on the other side of the case from the Wi-Fi access point, that case can really attenuate the signal quite a bit.

cryptk commented 1 year ago

There is a separate update failed exception that I can raise, that might quiet the logs a little bit... I'll look into it

cryptk commented 1 year ago

@mikesalz I just released 0.5.3 that catches that OmniTimeoutException and re-raises as an UpdateFailed, which should hopefully stop some of the log destruction, can you test that out and let me know?

mikesalz commented 1 year ago

Thanks! I'll try it right now.

cryptk commented 1 year ago

@mikesalz any improvement on the log noise?

mikesalz commented 1 year ago

I guess I'm not entirely sure what to expect with the recent change. This one is a little different, but I'm still getting it a bunch of times. Hopefully all of this goes away when my new long ethernet cable arrives Sunday!

Logger: custom_components.omnilogic_local.coordinator Source: helpers/update_coordinator.py:229 Integration: OmniLogic Local (documentation, issues) First occurred: 3:15:49 PM (31 occurrences) Last logged: 8:08:36 PM

Error fetching OmniLogic data: Error communicating with Omni controller

cryptk commented 1 year ago

The goal with the recent change is just to not have your logs full of gigantic stack traces

mikesalz commented 1 year ago

OK, got it. So I'm still getting just as many errors in the log; they are just less verbose. I appreciate your help!

mmun commented 1 year ago

@cryptk what MSP firmware version are you using? Ever since upgrading from 3.2.2 to 4.8.1 I get a lot of dropped packets when pinging my omnilogic controller, e.g.

64 bytes from 192.168.1.228: icmp_seq=54 ttl=64 time=3.824 ms
64 bytes from 192.168.1.228: icmp_seq=55 ttl=64 time=3.863 ms
64 bytes from 192.168.1.228: icmp_seq=56 ttl=64 time=19.724 ms
64 bytes from 192.168.1.228: icmp_seq=57 ttl=64 time=4.117 ms
64 bytes from 192.168.1.228: icmp_seq=58 ttl=64 time=4.051 ms
64 bytes from 192.168.1.228: icmp_seq=59 ttl=64 time=3.886 ms
Request timeout for icmp_seq 62
Request timeout for icmp_seq 63
Request timeout for icmp_seq 64
Request timeout for icmp_seq 65
Request timeout for icmp_seq 66
Request timeout for icmp_seq 67
64 bytes from 192.168.1.228: icmp_seq=68 ttl=64 time=7.031 ms
64 bytes from 192.168.1.228: icmp_seq=69 ttl=64 time=3.729 ms
64 bytes from 192.168.1.228: icmp_seq=70 ttl=64 time=4.466 ms
64 bytes from 192.168.1.228: icmp_seq=71 ttl=64 time=15.330 ms
64 bytes from 192.168.1.228: icmp_seq=72 ttl=64 time=27.200 ms
64 bytes from 192.168.1.228: icmp_seq=73 ttl=64 time=3.848 ms
64 bytes from 192.168.1.228: icmp_seq=74 ttl=64 time=409.549 ms
Request timeout for icmp_seq 75
64 bytes from 192.168.1.228: icmp_seq=75 ttl=64 time=1016.496 ms
Request timeout for icmp_seq 77
64 bytes from 192.168.1.228: icmp_seq=77 ttl=64 time=1158.683 ms
Request timeout for icmp_seq 79
Request timeout for icmp_seq 80
64 bytes from 192.168.1.228: icmp_seq=79 ttl=64 time=2035.183 ms
64 bytes from 192.168.1.228: icmp_seq=80 ttl=64 time=1034.179 ms
Request timeout for icmp_seq 83
Request timeout for icmp_seq 84
64 bytes from 192.168.1.228: icmp_seq=81 ttl=64 time=4027.471 ms
64 bytes from 192.168.1.228: icmp_seq=82 ttl=64 time=3028.658 ms
64 bytes from 192.168.1.228: icmp_seq=84 ttl=64 time=3112.413 ms
64 bytes from 192.168.1.228: icmp_seq=85 ttl=64 time=2106.394 ms
64 bytes from 192.168.1.228: icmp_seq=86 ttl=64 time=1102.971 ms
64 bytes from 192.168.1.228: icmp_seq=87 ttl=64 time=98.555 ms
64 bytes from 192.168.1.228: icmp_seq=88 ttl=64 time=4.384 ms
64 bytes from 192.168.1.228: icmp_seq=90 ttl=64 time=5.074 ms
64 bytes from 192.168.1.228: icmp_seq=91 ttl=64 time=5.188 ms
64 bytes from 192.168.1.228: icmp_seq=92 ttl=64 time=44.822 ms
64 bytes from 192.168.1.228: icmp_seq=94 ttl=64 time=4.717 ms
64 bytes from 192.168.1.228: icmp_seq=95 ttl=64 time=6.355 ms
Request timeout for icmp_seq 97

I'm on a WiFi connection as well. I thought the signal would be very strong but upon further inspection it's not as good as I thought:

I'll try with an ethernet cable later.

mmun commented 1 year ago

Actually, simply moving the remote as close as I could to the AP was sufficient to resolve this. Zero dropped packets and sub 5ms ping. I anticipate ethernet will work even better.

cryptk commented 1 year ago

@mmun I run mine hardwired, the wifi connections to Omni devices seems to be flakey at best

cryptk commented 10 months ago

I think this can be closed as it was caused by a bad WiFi connection rather than by the integration

cryptk / haomnilogic-local

Receiving error: Failed to receive acknowledgement of command, max retries exceeded #62