Closed Mixiaoxiao closed 3 years ago
@Mixiaoxiao thanks for the detailed analysis. Rightly said, the below snippet indeed seems wrong and a root cause for the issues reported.
for (i = 0; i < HAP_MAX_SESSIONS; i++) {
if (!hap_priv.sessions[i])
continue;
if (hap_priv.sessions[i]->ctrl == ctrl) {
httpd_sess_trigger_close(hap_priv.server, hap_priv.sessions[i]->conn_identifier);
}
}
I have verified with your fix and it works as intended.
The changes in the check for return value of read_fn() may not be required though, since the code is normally supposed to reach here, only when the select call in http_server returns and the particular socket descriptor is ready to be read. Even if we return 0 from here, the http server code will close the session assuming that the peer has closed the connection.
Do you plan to raise a PR for these fixes, or should I add them myself? In my code, I have renamed hap_close_ctrl_sessions
to hap_ctrl_close_all_sessions
and added a new hap_ctrl_close_single_session
as per the code you shared. I will also add a config option for keepalive, which will be enabled automatically for non MFi HomeKit applications.
Meanwhile, thanks for your efforts for the Arduino support :)
Thanks for your quick replay. It is ok to modify by yourself, following the sdk's code style. I consider it will be nice to make a comment in your modification with a link of this issuse analysis, if it is possible.
Also, I sugest the function names:
hap_close_ctrl_sessions (ctrl)
to hap_close_sessions_of_ctrl (ctrl)
hap_close_session (session)
to close a certain session (irrelevant with ctrl
)hap_decrypt_error
may be not suitable here, since it is caused by a connection lost in most casesHopefully this helps someone in the future, but I was receiving this error after assigning a value that was out of range for a characteristic that can only accept values of 0-100. I was using the provided hap_char_hue_create
fxn to set the lightbulb hue but didn't see that within that fxn there's a hap_char_float_set_constraints
fxn that shows that this specific characteristic has a lower bound of 0 and an upper of 360. It would be nice to be warned when passing a value that falls outside of characteristic ranges but it was pretty obvious after a minute of following the function calls.
@Mixiaoxiao , the specific fix and the keep alive config option have now been added. Thanks for debugging and the finding root cause of the issue.
@Mixiaoxiao thanks for looking into this! After rebooting my router 700 times, I managed to solve the issue by assigning a static IP address to my ESP32. I've removed the static assignment to test this fix and it seems to be working. I'll report if I notice a crash!
Off topic but I just wanted to add that I'm also working on an Arduino version of this SDK and it's currently available via PlatformIO. 😄
Off topic but I just wanted to add that I'm also working on an Arduino version of this SDK and it's currently available via PlatformIO. 😄
@Brawrdon that's good to know. Thanks for your efforts :)
I am the author of Mixiaoxiao/Arduino-HomeKit-ESP8266.
This official HomeKit sdk I am using currently to build my newer projects, and I have made an Arduino library version of this sdk.
But I also got the issue same as https://github.com/espressif/esp-homekit-sdk/issues/14.
Named "Decryption error/Connection lost on ESP32 with multiple Apple devices running Home".
My issue is as follows:
The problem is that, when my iPhone is disconnected (or just WiFi is not active), my iPad's(same iCloud ID with iPhone) connection will be closed too. This behavior is NOT as expected. Same as Brawrdon's comment: https://github.com/espressif/esp-homekit-sdk/issues/14#issuecomment-737817149
I figure out the related codes as follows:
Consider that, in int hap_decrypt_data(...)
read_fn
here indeed calls thehap_httpd_raw_recv
, that is therecv
(lwip_recv
), try receiving data from tcp connection. iflen
is-1
, means the tcp-socket is closed; iflen
is0
, means nothing received, that is reading is timeout or the peer's WiFi is not active (in sleep mode) now; iflen
is1
, means we only received 1 byte during a receiving-timeout time (SO_RCVTIMEO=0x1006=4102, in seconds, maybe?), here we can do consider that we really got theDecryption error/Connection lost
. According to the protocol of HomeKit, here, at least 2 bytes should be received.So, here, should be modified to
Then, in
hap_close_ctrl_sessions
, all sessions with samectrl
(session->ctrl) will be closed. In my debug, both my iPhone and iPad are of samectrl
. All iOS devices with same iCloud ID will have a samectrl
. I am not sure, but it should be. Here, the problem come out. Only one device disconnected, but why close all sessions with thisctrl
???So, here, should be modified to
Another thing is that, in WiFi connection with other devices, it is really difficult to exactly-know the connection is lost or not. The peer's WiFi can be in sleep mode(but the connection in lwip can NOT know that), and it' WiFi will response nothing (lwip will always receive 0 byte). So I think the
keepalive
of tcp is a good method to do this. When the peer's WiFi is in sleep mode for a long time, the lwip can know that and just drop it. This behavior is really meaningful for ESP8266 with small memory(ram), it can quickly release the tcp buffer memory when the peer is no longer active. And thekeepalive
option has been implemented both in maximkulkin/esp-homekit/server.c#L3705 and my Mixiaoxiao/Arduino-HomeKit-ESP8266/arduino_homekit_server.cpp#L3006.I do know the
keepalive
is not allowed in HomeKit Protocol. HAP Non-Commercial Version section 6.2.3: "HAP accessory servers must not use keepalive messages, which periodically wake up iOS devices". But I do consider it is ok to make a config to enablekeepalive
for hobby projects.I have just create a bugfix-commit including all above-mentioned modifications for my Arduino-HomeKit-ESP. Tested for 24 hours and works fine.