espressif / esp-homekit-sdk

550 stars 99 forks source link

Decryption error/Connection lost on ESP32 with multiple Apple devices running Home #14

Closed Brawrdon closed 3 years ago

Brawrdon commented 3 years ago

I'm getting a similar issue as with issue #4 when trying to run the fan example on my ESP32 DevKit v1. It works for about a minute before the accessory becomes unavailable and it returns an error marking the session as invalid. It also doesn't seem to automatically restart (I'm assuming it did in the linked issue because of how their logs looked).

I've tried the recommendation of adding CONFIG_NEWLIB_NANO_FORMAT=n to the sdkconfig.defaults file with no luck.

######## Starting Pair Verify ########
Pair Verify M1 Received
Pair Verify M2 Successful
Socket fd: 53; HTTP Request POST /pair-verify
Pair Verify M3 Received
HomeKit Session active
Pair Verify Successful for 500190B1-CCD4-4373-ADF5-02FAE4650FCF
Socket fd: 53; HTTP Request GET /accessories
Generating HTTP Response
Value Changed
{"accessories":[{"aid":1,"services":[{"iid":1,"type":"3E","characteristics":[{"iid":2,"type":"14","perms":["pw"],"ev":false,"format":"bool"},{"iid":3,"value":"Espressif","type":"20","perms":["pr"],"ev":false,"format":"string"},{"iid":4,"value":"EspFan01","type":"21","perms":["pr"],"ev":false,"format":"string"},{"iid":5,"value":"Esp-Fan-26FB3F","type":"23","perms":["pr"],"ev":false,"format":"string"},{"iid":6,"value":"001122334455","type":"30","perms":["pr"],"ev":false,"format":"string"},{"iid":7,"value":"0.9.0","type":"52","perms":["pr"],"ev":false,"format":"string"},{"iid":10,"value":"RVNQMzJIQVA=","type":"220","perms":["pr"],"ev":false,"format":"data"}]},{"iid":8,"type":"A2","characteristics":[{"iid":9,"value":"1.1.0","type":"37","perms":["pr"],"ev":false,"format":"string"}]},{"iid":11,"type":"40","characteristics":[{"iid":12,"value":false,"type":"25","perms":["pr","pw","ev"],"ev":false,"format":"bool"},{"iid":13,"value":"My Fan","type":"23","perms":["pr"],"ev":false,"format":"string"},{"iid":14,"value":1,"type":"28","perms":["pr","pw","ev"],"ev":false,"format":"int","minValue":0,"maxValue":1,"minStep":1}]},{"iid":15,"type":"d57034d8-3736-11e8-b467-0ed5f89f718b","hidden":true,"characteristics":[{"iid":16,"type":"d57039e2-3736-11e8-b467-0ed5f89f718b","perms":["pw"],"ev":false,"format":"string","description":"FW Upgrade URL"},{"iid":17,"value":0,"type":"d5703b5e-3736-11e8-b467-0ed5f89f718b","perms":["pr","ev"],"ev":false,"format":"int","description":"FW Upgrade Status"}]}]}]}
Notification Sent
Socket fd: 51; Event message: {"characteristics":[{"aid":1,"iid":14,"value":1}]}
Notification Sent
Socket fd: 52; Event message: {"characteristics":[{"aid":1,"iid":14,"value":1}]}
Socket fd: 52; HTTP Request GET /characteristics?id=1.14,1.12
Generating HTTP Response
I (24843) HAP Fan: Received read from 500190B1-CCD4-4373-ADF5-02FAE4650FCF
Value Changed
I (24853) HAP Fan: Received read from 500190B1-CCD4-4373-ADF5-02FAE4650FCF
{"characteristics":[{"aid":1,"iid":14,"value":0},{"aid":1,"iid":12,"value":false}]}
Notification Sent
Socket fd: 51; Event message: {"characteristics":[{"aid":1,"iid":14,"value":0}]}
Decryption error/Connection lost. Marking session as invalid
HomeKit Session terminated
shahpiyushv commented 3 years ago

@Brawrdon , can you give some more information about your network, specifically with regards to number of iOS devices linked to the same Apple account? I see multiple socket fd numbers in the log (51, 52, 53) and so was wondering which session is getting closed. Please also apply this small patch which would also indicate the session being closed and check again

diff --git a/components/homekit/esp_hap_core/src/esp_hap_network_io.c b/components/homekit/esp_hap_core/src/esp_hap_network_io.c
index a2724ab..42b7d6b 100644
--- a/components/homekit/esp_hap_core/src/esp_hap_network_io.c
+++ b/components/homekit/esp_hap_core/src/esp_hap_network_io.c
@@ -97,8 +97,8 @@ int hap_encrypt_data(hap_encrypt_frame_t *frame, hap_secure_session_t *session,

 int hap_decrypt_error(hap_secure_session_t *session)
 {
-       ESP_MFI_DEBUG(ESP_MFI_DEBUG_INFO, "Decryption error/Connection lost. Marking session as invalid");
        if (session) {
+           ESP_MFI_DEBUG(ESP_MFI_DEBUG_INFO, "Decryption error/Connection lost. Marking session %d as invalid", session->conn_identifier);
                session->state = STATE_INVALID;
         hap_close_ctrl_sessions(session->ctrl);
        }
Brawrdon commented 3 years ago

@shahpiyushv I currently have four which are signed into my account with access to Home. When I pair it with my iPhone it sends: Socket fd: 51; HTTP Request POST /pair-setup and then the same for socket 52, 53 and 54 which I guess is iCloud doing its thing. I'm assuming 51 is the iPhone. The first socket to be marked as invalid is 53 which destroys all four sessions. 51 is marked as invalid again slightly after, it manages to successfully send a GET request just before hand even though it seems all sessions were terminated.

I've attached the full log.

homekit-fan.log

Brawrdon commented 3 years ago

If I disconnect every other device and just have my iPhone connected it appears to work (doesn't go unresponsive after about a minute). It does however, throw the error the minute I reconnect any other device to the network. There's no log of the second device pairing or creating a socket session, just Decryption error/Connection lost. Marking session 51 as invalid.

shahpiyushv commented 3 years ago

@Brawrdon , thanks for the additional information. I will check at my end and see if I can find the root cause. Note that this does not seem related to #4 as that issue was specific to ESP8266, that too for only float values.

Meanwhile, can you try commenting out this line and see what happens?

Brawrdon commented 3 years ago

@shahpiyushv Thanks for the help!

Meanwhile, can you try commenting out this line and see what happens?

Nothing seems to change, the error still occurs when I have multiple devices connected.

Brawrdon commented 3 years ago

I think I discovered the issue and a potential solution! I stumbled upon an issue with a HomeKit library for the ESP8266 (https://github.com/Mixiaoxiao/Arduino-HomeKit-ESP8266/issues/9) that seemed to be identical to mine; the mDNS services were being removed just before the device disconnects. I hacked in the solution proposed on the issue, which was to re-announce the mDNS service every five seconds, by calling hap_mdns_announce(false); in the hap_loop_task function inside of esp_hap_main.c. Every time an event occurred, specifically turning the fan on or off, the mDNS service would announce itself again and stay connected, after leaving it be for a few seconds the error appears again.

I'm assuming by adding a timed function to just call hap_mdns_announce(false); every 5ish seconds (I think you could push that up to 15ish) would solve the problem. I'm currently looking into doing that with another task, I think my understanding of how IDF utilises them is wrong since I can't seem to get it to call every 5 seconds. The only thing with this is that the state number increases quickly, I don't know what the purpose of it is or if this would cause an issue.

I'm not that experience with mDNS or the HomeKit Accessory Protocol so I can't think as to why this is a thing... Though, the owner of the library for the ESP8266 did mention two cases they found for when the iOS device disconnects. https://github.com/Mixiaoxiao/Arduino-HomeKit-ESP8266/issues/9#issuecomment-625093576

shahpiyushv commented 3 years ago

@Brawrdon , since reachability of HomeKit accessories depends on mDNS, which is a UDP based protocol, there is indeed a high chance of accessory seeming unreachable just because of a few missed packets in a noisy environment. The solution you have mentioned could indeed work. However, Apple's Bonjour Conformance Test does not allow unsolicited mDNS announcements. In fact there is a specific test just to ensure that the accessory does not make any mDNS announcements for at least a couple of hours, if there is no query. Of course, the solution could still be fine for hobby projects, but not something we can add ourselves, since we want to keep this SDK in sync with the MFi variant of the SDK.

Meanwhile, the purpose of state number (s#) is to let controllers in the network know that something has changed on the accessory. This triggers a GET request from the controllers or a session re-establishment (in case it was broken for some reason).

Brawrdon commented 3 years ago

@shahpiyushv Thanks for the explanation!

My solution seems to only slightly increase the time until the error occurs, the device lasts about half an hour before the same error shows up. Guess I'll have to leave it up to you to see if a proper solution can be found 😭

AramVartanyan commented 3 years ago

@Brawrdon Do not forget that there is another side of the issue - iOS controller / HomeKit hub (or especially in case there is more than one hub). This side is still in evolving stage. I am pretty sure that some of the answers behind “No response” issues are hidden there.

Brawrdon commented 3 years ago

@Brawrdon Do not forget that there is another side of the issue - iOS controller / HomeKit hub (or especially in case there is more than one hub). This side is still in evolving stage. I am pretty sure that some of the answers behind “No response” issues are hidden there.

Which issues are you referring to exactly?

AramVartanyan commented 3 years ago

As far as I understood “Decryption error/Connection lost” you have mentioned is followed by accessory unavailability or famous “No response” status, that has million reasons to appear. My point is that Apple is constantly changing the way that their system works. Also there are other variables like iOS version, kind of hub, number of controllers connected etc. It is good idea to check your setup, for example, on an older version of iOS device (version 10 to 12) and share your experience. I have checked the Fan example myself and it works fine for me (tested on esp32 and esp8266). However I have troubles pairing from iPhone 11 Pro, but with 6s and 5c - don’t. The iOS controller (11 pro) sporadically sends “0” instead of the pairing information on step M3. (I will share details on this issue.) Apple’s code is not perfect, unfortunately.

Brawrdon commented 3 years ago

@AramVartanyan I think I've pinpointed the cause of the issue to having multiple devices connected to the ESP32 via iCloud. The demos appear to work fine when I turn all the other devices onto aeroplane mode and disconnect them from the network. Three devices are on the latest version of their respective OS whilst one isn't. I'm gonna see if it has something to do with that by disabling the device running Catalina. They may have changed something, like you said, which the Catalina device doesn't understand and so causes a panic! If that doesn't work I'm gonna try take a look at the packets via Wireshark. I've been having an issue sniffing packets from my ESP32 via my Mac but that's an entirely separate problem.

Also might be worth mentioning that I have my iPad enabled as a hub but I don't think that's an issue since the devices have always been on the same network. The main controller is an iPhone XR running iOS 14.3.

The iOS controller (11 pro) sporadically sends “0” instead of the pairing information on step M3.

I'm assuming you used Wireshark to do this or does this info show up on the logs?

Brawrdon commented 3 years ago

I think I've solved it! Tldr: I had to reboot my router... 😖

Basically, whilst trying to setup Wireshark to sniff packets between my ESP32, I noticed that I couldn't access machines in my LAN when connected to certain SSIDs (I have multiple SSIDs on one router to separate my IoT stuff). I have no idea why but certain hosts, such as when trying to access my Raspberry Pi, weren't being forwarded. I assume that included ones to my ESP32, which probably caused my devices to go "well I can't see this host guess I'll shut it off". Simply rebooting my router to fix whatever went astray has stopped the ESP32 from disconnected even with multiple devices connected via iCloud.

I'm gonna close this issue now, thanks for the support guys! Moral of the story, always try turning things off and on again 😅

AramVartanyan commented 3 years ago

You can understand much better the issues if you just activate the debugging messages.