Jc2k / aiohomekit

asyncio for homekit
Apache License 2.0
52 stars 20 forks source link

Fix cancellation race in IP controller #309

Closed bdraco closed 1 year ago

bdraco commented 1 year ago

related issue https://github.com/home-assistant/core/issues/93213

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 75.75% and project coverage change: -0.01 :warning:

Comparison is base (5aa1c36) 67.44% compared to head (39a9794) 67.44%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #309 +/- ## ========================================== - Coverage 67.44% 67.44% -0.01% ========================================== Files 72 72 Lines 6817 6829 +12 ========================================== + Hits 4598 4606 +8 - Misses 2219 2223 +4 ``` | [Impacted Files](https://app.codecov.io/gh/Jc2k/aiohomekit/pull/309?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) | Coverage Δ | | |---|---|---| | [aiohomekit/controller/ip/pairing.py](https://app.codecov.io/gh/Jc2k/aiohomekit/pull/309?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-YWlvaG9tZWtpdC9jb250cm9sbGVyL2lwL3BhaXJpbmcucHk=) | `81.02% <40.00%> (-0.91%)` | :arrow_down: | | [aiohomekit/controller/ip/connection.py](https://app.codecov.io/gh/Jc2k/aiohomekit/pull/309?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-YWlvaG9tZWtpdC9jb250cm9sbGVyL2lwL2Nvbm5lY3Rpb24ucHk=) | `81.40% <82.14%> (+0.15%)` | :arrow_up: |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

bdraco commented 1 year ago

I had to tweak it a bit since we await it via ensure_conncted as well and we don't want a timeout to cancel the inner.

This also fixes shutdown having to wait for the timeout because cancellation was ignored in the task

bdraco commented 1 year ago

there are actually two time outs for the exact same time so there is another race here

bdraco commented 1 year ago

nevermind, the other timeout is fine since its wrapping the ensure_connected which goes into the background if it cannot connect. Its just confusing that they are the same

Jc2k commented 1 year ago

That background reconnect thing seemed like a good idea at the time (before async zeroconf etc). But if you feel it's adding complexity, maybe we could get rid. Polling will naturally retry, zeroconf updates should trigger a retry also. And we could add zeroconf s# support (if it's present and not hardcodes to 1 then it seems to work like BLE) to trigger reconnects.

bdraco commented 1 year ago

There is also another race here since the add callback done is delivered via a call_soon so we only see it on the next iteration which means it can set the task to none when it's already started a new one

bdraco commented 1 year ago

I think the background reconnect is still needed for users with unreliable mdns

We get zeroconf tickets all the time where one of their repeaters stops forwarding multicast but unicast still works

bdraco commented 1 year ago

I did a bit of chaos testing with this by having a smart plug turn off a homekit smart plug every 90 seconds and while the homekit plug is flipping on/off every second from HA

Everything works now with no reconnect failures