Smithay / calloop

A callback-based Event Loop
MIT License
176 stars 34 forks source link

Mitigate transient source leak #106

Closed detly closed 2 years ago

detly commented 2 years ago

This addresses #104. Yes, it doubles the size of the type, but I'm inclined to say that's not a huge deal, since you'd need more than that space to do this correctly at a higher level.

The two extra tests are a bit dense, but they basically test the edge cases that led to this in the first place. I have not squashed every commit - I've broken it into three parts:

Changelog update to follow.

codecov[bot] commented 2 years ago

Codecov Report

Merging #106 (c56cfcd) into master (613c839) will decrease coverage by 0.18%. The diff coverage is 71.01%.

@@            Coverage Diff             @@
##           master     #106      +/-   ##
==========================================
- Coverage   89.12%   88.93%   -0.19%     
==========================================
  Files          17       17              
  Lines        1673     1699      +26     
==========================================
+ Hits         1491     1511      +20     
- Misses        182      188       +6     
Flag Coverage Δ
macos-latest 88.37% <71.01%> (-0.22%) :arrow_down:
ubuntu-latest 87.23% <71.01%> (-0.05%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/sources/ping.rs 85.71% <ø> (-14.29%) :arrow_down:
src/sources/transient.rs 78.51% <71.01%> (+1.67%) :arrow_up:
src/loop_logic.rs 88.93% <0.00%> (-0.43%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 613c839...c56cfcd. Read the comment docs.

detly commented 2 years ago

Taking the leak discussion out here. I can't yet repro the ever-increasing-fd leak I saw with the async executor; I'll have to trim down some application code until I can get it again.

However, I was able to show a couple of issues using async-process All these branches have the same basic changes: some extra debugging in src/sys/mod.rs to show registration, and an example binary that drops and replaces an async executor over and over again. In each branch you can run it with

cargo run --example leak --features executor

Branches:

Effects:

I'll see if I can get the subprocess-related leak into a dmeo for you too.

detly commented 2 years ago

I feel like I'm losing it. I went back to my code from last week to try to un-fix what I fixed - no leak. Certainly I see sources not being unregistered, but I don't see the ever-increasing FD numbers I saw during my testing.

I think this is still a valid change - having FDs just quietly get closed but not updating our event loop structures is not ideal. But it's not the leak I thought it was.

elinorbgr commented 2 years ago

I agree that's still cleaner overall to properly unregister stuff like your PR does. I just think the doc comment can be adjusted to be less scary. What about something along the line of "Not properly unregistering the sources can cause spurious wakeups of the event loop, and might leak file descriptors in some cases." ?

detly commented 2 years ago

I agree that's still cleaner overall to properly unregister stuff like your PR does. I just think the doc comment can be adjusted to be less scary. What about something along the line of "Not properly unregistering the sources can cause spurious wakeups of the event loop, and might leak file descriptors in some cases." ?

Very reasonable point, except that with Calloop as it is now, it'll actually cause a panic. Do you want to relax the FD reregistration check? I'm not completely certain either way, myself.

Either way I'll tweak the wording. I am starting to suspect that the more extreme version of the leak I saw was a combination of this issue and something in an external library.

elinorbgr commented 2 years ago

Very reasonable point, except that with Calloop as it is now, it'll actually cause a panic. Do you want to relax the FD reregistration check? I'm not completely certain either way, myself.

Right, I meant "Not properly unregistering event source before dropping them". Removing an event source without unregistering it and then re-registering it again is definitely a wrong usage.

detly commented 2 years ago

Very reasonable point, except that with Calloop as it is now, it'll actually cause a panic. Do you want to relax the FD reregistration check? I'm not completely certain either way, myself.

Right, I meant "Not properly unregistering event source before dropping them". Removing an event source without unregistering it and then re-registering it again is definitely a wrong usage.

I think you misunderstand, even if you don't re-use the same event source, as you point out the CloseOnDrop handler will close the fd, the platform will recycle the fd for the next eg. pipe or socket or file that gets used, and then the code in src/sys/mod.rs will see a duplicate fd. Hence the panic.

elinorbgr commented 2 years ago

Ah, right, I forgot about that check. :sweat_smile:

detly commented 2 years ago

Made the wording a bit less dire, anyway, but still mentioned the possibility of panicking.