Open phybros opened 10 months ago
It sounds like pairing worked once and then you reset the esp but not the home assistant integration. So the existing pairing in ha is trying to establish a connection, but now you are trying to pair it again. So the device sees the results of 2 concurrent tasks - one trying to use the previous (stale) pairing, and one trying to make a new one.
That would make sense but I also tested it after removing the HA integration and even restarting HA, along with totally wiping the ESP and starting "from scratch" with completely different device IDs etc. Even when doing that process multiple times it still produces these random results.
The "already paired" issue is only one of the ways it fails. The other main one is the issue shown here
Again, this can happen after resetting, rebooting, removing the integration and starting over etc.
On Sat, Nov 11, 2023 at 10:56 AM Jc2k @.***> wrote:
It sounds like pairing worked once and then you reset the esp but not the home assistant integration. So the existing pairing in ha is trying to establish a connection, but now you are trying to pair it again. So the device sees the results of 2 concurrent tasks - one trying to use the previous (stale) pairing, and one trying to make a new one.
— Reply to this email directly, view it on GitHub https://github.com/Jc2k/aiohomekit/issues/343#issuecomment-1806850844, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAPEU3FYQN7S3F2O3CX3FM3YD6NZJAVCNFSM6AAAAAA7HKZIDKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMBWHA2TAOBUGQ . You are receiving this because you authored the thread.Message ID: @.***>
Incidentally I did also try it on a completely different instance of HA on a different network at a different house and got the same issue the very first time trying to pair it.
Just to be sure, what are the steps you would suggest going through to get the HA integration to stop trying to contact the device after removing it from the integrations page?
The pair verify code path only happens when there is a pairing. It loads secrets from a config entry, and there is no config entry until pairing has happened.
It would probably fill your logs with exceptions if it tried.
That should mean that it'd be impossible to see pair verifys from ha if you removed the integration and then restarted ha to make sure it was gone. The restart shouldn't be needed but it's possible you've discovered an edge case where a faulty pairing can't be cleanly unloaded.
We know there are ~28k installs of this integration, and this is the first time I can remember this behaviour coming up, and you can recreate it in 2 distinct environments.
That makes me wonder if there is an environmental variable (eg we have had trouble with mdns repeaters in the past, and that's how paired/unpaired status is relayed, rather than via the api). We might want to examine logs from the zeroconf integration to make sure it's getting the right data there.
Duplicate device ids could also be a problem and cause this kind of confusion. Hopefully they are automated by homespan and fully random so I wouldn't expect that to be a problem.
Thanks for all this info. I also suspected network weirdness, which is why I tried it somewhere else with a super normal wifi/networking situation. And now that I'm thinking back maybe it did work fine the very first time in that environment and then had issues subsequent times.... hmm. I'll go away and try to do a more scientific set of tests and let you know!
I tried just un-pairing and re-pairing from within HA a few times. When unpairing, it all works well, the HomeSpan devices sets itself back to unpaired etc, the integration disappears from HA. The device is immediately re-discovered by HA and I can re-pair no problem (MOST of the time), however it does put two things into the HA logs:
Logger: homeassistant.components.homekit_controller.connection
Source: components/homekit_controller/connection.py:782
Integration: HomeKit Device (documentation, issues)
First occurred: 2:22:31 PM (1 occurrences)
Last logged: 2:22:31 PM
HomeKit device update skipped as previous poll still in flight: CF:97:67:3A:BC:A6
and
Logger: aiohomekit.utils
Source: /usr/local/lib/python3.11/site-packages/aiohomekit/utils.py:45
First occurred: 2:22:35 PM (1 occurrences)
Last logged: 2:22:35 PM
Failure running background task: Task-4125
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/aiohomekit/utils.py", line 40, in _handle_task_result
task.result()
File "/usr/local/lib/python3.11/site-packages/aiohomekit/controller/ip/connection.py", line 590, in _reconnect
return await self._connect_once()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/aiohomekit/controller/ip/connection.py", line 698, in _connect_once
request, expected = state_machine.send(response)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/aiohomekit/protocol/__init__.py", line 596, in get_session_keys
handle_state_step(response_tlv, TLV.M4)
File "/usr/local/lib/python3.11/site-packages/aiohomekit/protocol/__init__.py", line 90, in handle_state_step
error_handler(tlv_dict[TLV.kTLVType_Error], f"step {expected_state}")
File "/usr/local/lib/python3.11/site-packages/aiohomekit/protocol/__init__.py", line 64, in error_handler
raise AuthenticationError(stage)
aiohomekit.exceptions.AuthenticationError: step bytearray(b'\x04')
But as I said, the device is configured fine, and everything works as it should MOST of the time. Occasionally it will give the step bytearray(b'\x04')
but if I try again 2 or 3 more times, getting that error in the UI each time, it eventually works
One intersting thing I noticed while looking at HomeSpan's debug logs is that even after I un-pair the device, and the integration vanishes from HA, the device is still receiving HTTP requests periodically like this:
2023-11-11 14:34:04 --> ** Client #12 Connected: (87 sec) 192.168.86.201 on Socket 3/16
2023-11-11 14:34:04 -->
2023-11-11 14:34:04 --> <<<<<<<<< 192.168.86.201 <<<<<<<<<
2023-11-11 14:34:04 --> POST /pair-verify HTTP/1.1
2023-11-11 14:34:04 --> Host: 192.168.86.46
2023-11-11 14:34:04 --> Content-Length: 37
2023-11-11 14:34:04 --> Content-Type: application/pairing+tlv8
2023-11-11 14:34:04 --> ------------ END BODY! ------------
2023-11-11 14:34:04 --> STATE(1) 01
2023-11-11 14:34:04 --> PUBKEY(32) 7D278B1D4A959D2C111F91B1D4ABB04220BCF7CFA38E29A0B14F16BA70DBBE59
2023-11-11 14:34:04 --> -------
2023-11-11 14:34:04 --> ----- END TLVS! ------------
2023-11-11 14:34:04 --> In Pair Verify #12 (192.168.86.201)...
2023-11-11 14:34:04 --> *** ERROR: Device not yet paired!
2023-11-11 14:34:04 -->
2023-11-11 14:34:04 -->
2023-11-11 14:34:04 --> >>>>>>>>>> 192.168.86.201 >>>>>>>>>>
2023-11-11 14:34:04 --> HTTP/1.1 200 OK
2023-11-11 14:34:04 --> Content-Type: application/pairing+tlv8
2023-11-11 14:34:04 --> Content-Length: 6
2023-11-11 14:34:04 -->
2023-11-11 14:34:04 --> STATE(1) 02
2023-11-11 14:34:04 --> ERROR(1) 01
2023-11-11 14:34:04 --> ------------ SENT! --------------
Almost like HA is still doing things in the background even though the integration is gone. This still happens after I reboot the ESP device also. It wakes up and starts getting these requests. The IP ending 201
is my home assistant server BTW.
If I restart HA in this situation, the HTTP requests stop, and they don't start again once HA is back up.
After HA is back up, I try to pair and almost always get the "device is already paired", and I can see in HomeSpan's logs that HA is requesting to pair, pairing, and then requesting to pair a second time, resulting in the error.
The second set of tests I did also involved wiping the ESP device and having it re-generate a new unique device ID. Each time I did this, I also restarted my HA instance to get rid of any cached zeroconf discovery things. This seemed most successful when I wiped the ESP and also powered it totally off during the HA restart. Even then, sometimes there was a device "detected" by HA. Definitely seems like some mdns/zeroconf things maybe?
A couple of times I got this, if that helps.
Logger: aiohomekit.utils
Source: /usr/local/lib/python3.11/site-packages/aiohomekit/utils.py:45
First occurred: 2:16:39 PM (2 occurrences)
Last logged: 2:18:15 PM
Failure running background task: Task-2638
Failure running background task: Task-3279
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/aiohomekit/controller/ip/pairing.py", line 140, in _ensure_connected
await connection.ensure_connection()
File "/usr/local/lib/python3.11/site-packages/aiohomekit/controller/ip/connection.py", line 313, in ensure_connection
await asyncio.shield(self._connector)
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/aiohomekit/controller/ip/pairing.py", line 139, in _ensure_connected
async with asyncio_timeout(10):
File "/usr/local/lib/python3.11/asyncio/timeouts.py", line 111, in __aexit__
raise TimeoutError from exc_val
TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/aiohomekit/utils.py", line 40, in _handle_task_result
task.result()
File "/usr/local/lib/python3.11/site-packages/aiohomekit/controller/ip/pairing.py", line 421, in _process_config_changed
await self.list_accessories_and_characteristics()
File "/usr/local/lib/python3.11/site-packages/aiohomekit/controller/ip/pairing.py", line 177, in list_accessories_and_characteristics
await self._ensure_connected()
File "/usr/local/lib/python3.11/site-packages/aiohomekit/controller/ip/pairing.py", line 146, in _ensure_connected
raise AccessoryDisconnectedError(
aiohomekit.exceptions.AccessoryDisconnectedError: Timeout while waiting for connection to device 192.168.86.46:80
It looks like HA isn't fully shutting down the connection when its unpaired.
Is there anything I can do to confirm that @bdraco ?
If it's still doing pair verify after removing the integration and stopping after a restart then it's pretty much confirmed.
It's probably not been a problem before because most users add it once and don't need to remove it and add it again.
OK should I report this to HA then? What's the next thing to do for this issue?
No reporting it there would just ping the same people that you are speaking to here :) Just need to wait for someone to have time to look into it.
Haha ok no problem! Thanks for all your responses
Hey there! I have definitively confirmed that this can happen if the device is brand new and has never been paired with HA. Also this was on a completely fresh HA that has never paired any devices before. So it seems like it's not just pair-unpair scenario from above.
To be clear I think that HA DOES continue to send messages to the device after unpairing it (sometimes), but it can ALSO fail on totally fresh pairing. I think they are related but separate bugs. The logs from the "fresh" scenario are identical to the ones I provided already - seems like multiple threads both trying to pair at the same time for some reason.
I've never had to debug anything this deep in HA before, but I'm going to try the dev env set up locally and see if I can do anything.
HomeSpan is a library for ESP32 devices to enable them to speak HomeKit. After a huge amount of tests, it seems that when pairing these devices with Home Assistant via the HomeKit Device integration, pairing intermittently fails in a variety of ways.
04
I've included a massive amount of logs in my original HomeSpan bug report, but the description of what's happening from the devs makes it sound like there's some sort of race condition in aiohomekit which is causing the pairing steps to happen in a random order (nondeterministically).
The HomeSpan issue link is here: https://github.com/HomeSpan/HomeSpan/issues/680
but the important part I think is here:
Could this be a bug in aiohomekit? Or maybe the Home Assistant integration itself? HomeSpan devices always pair flawlessly with "real" HomeKit controllers (e.g. via the Home app) and claim to be fully compliant with the spec, so I'm not sure what's going on