API: TransientSource requires the loop to run again before the source can be replaced

detly commented 2 years ago

Even though I wrote the TransientSource wrapper, I've hit a bit of a API/design problem that I could use some help with.

I'll use a Timer as an example, but this is not specifically a problem with timers, I have the same problem with custom sources that fire a finite number of times, expire and need to be replaced.

Let's say I have a source with a timer, but I don't want to start the timer unless something else happens. In the struct I might have:

struct TimedThing {
    is_sending: bool,
    // Only runs if is_sending is true
    timer: calloop::timer::Timer,
}

My constructor then has this problem:

impl TimedThing {
    fn new() -> Self {
        Self {
            is_sending: false,
            timer: todo!("?????"),
        }
    }
}

This is more or less what TransientSource is designed for, so I try:

struct TimedThing {
    is_sending: bool,
    // Only runs if is_sending is true
    timer: calloop::transient::TransientSource<calloop::timer::Timer>,
}

impl TimedThing {
    fn new() -> Self {
        Self {
            is_sending: false,
            timer: calloop::transient::TransientSource::None,
        }
    }
}

...and then at some point call self.timer = Timer::from_duration(...); when is_sending becomes true. Except that this only works if the timer does not already exist. If it does exist, this will leak ~~an epoll entry~~ whatever underlying thing keeps track of registration (wheel entry for Timers, epoll entry for fd-based sources), because the previous source will be dropped without being unregistered.

That's the problem with the TransientSource design that I'm trying to solve ie. there's no way to unregister a source without waiting for another iteration of the event loop.

Here's an attempt to work around it (pseudo-ish code):

fn process_events(...) {
    if (turn_off_sending) {
        self.is_sending = false;
        self.timer = TransientSource::None;
        // Oh no! Even if we return PostAction::Reregister, self.timer
        // no longer owns the timer we just replaced! It has no way to
        // unregister it now. If this were an event source that generated
        // continuous events
    }
}

Let's try being more diligent. The first thing that's apparent is that TransientSource needs to have a cancel() or remove() method - otherwise we're dependent on the source itself having another event fire, so we can return PostAction::Remove from that, so that the TransientSource "knows" it's child has asked to be unregistered. In the past this was fine, because all the sources I used it with would eventually, definitely, return Remove. Timer does not do that, so note the workaround for that below. (Again, this is pseudo code!)

#[derive(Copy, PartialEq)]
enum CleanupPhase {
    Leave, // <- start in this state
    UnregisterTimer,
    DropTimer,
}

struct TimedThing {
    is_sending: bool,
    // Only runs if is_sending is true
    timer: calloop::transient::TransientSource<calloop::timer::Timer>,
    cleanup: CleanupPhase,
    cleanup_rx: PingSource,
    cleanup_tx: Ping,
}

fn process_events(...) {
    let mut cleanup = CleanupPhase::Leave;

    if (turn_off_sending) {
        self.is_sending = false;
        self.timer.map(|t| {
            // 't' is the Timer itself. We need to force it to
            // generate another event so that we can make it return
            // Remove to TransientSource.
            t.set_deadline(std::time::Instant::now());
            // Now we need to keep track of whether we need another
            // loop, because the timer won't be unregistered until
            // after this method returns!
            cleanup = CleanupPhase::UnregisterTimer;
        }
    }

    // This won't happen until the next call to process_events()!

    self.timer.process_events(..., |...| {
        if (self.cleanup == CleanupPhase::UnregisterTimer) {
            cleanup = CleanupPhase::DropTimer;
            self.cleanup_tx.ping();
            TimeoutAction::Drop
        } else {
            // usual timer code
        }
    })?;

    // We STILL can't drop the timer. TransientSource has it in
    // the Remove variant and will have unregister() called after
    // this method exits. So we need an extra ping source to
    // wake the loop up a third time!

    // The state has to be part of our struct too, so that it
    // survives this function returning and firing again.
    self.cleanup = cleanup;

    // The TransientSource has now unregistered the Timer!
    if self.cleanup == CleanupPhase::DropTimer {
        self.timer = TransientSource::None;
        self.cleanup = CleanupPhase::Leave;
    }

    // Drain ping source, do other things, etc.
}

Yikes.

I have come up with some tentative options (not mutually exclusive) but there are some things I can't currently think of a solution to.

Add a cancel() method to TransientSource that triggers the reregistration of the wrapper and unregistration of the nested source. This still requires another go-around of the event loop.
Add a replace() or set() method to TransientSource so that the nested sources can be replaced without needing to drop and assign in the parent-of-TransientSource source.
Add a "trash can" field to TransientSource where a source set for removal can be stashed when the above method is called.

Maybe there's something else I'm missing here, so please think about it and let me know if you have other ideas. Fundamentally I'm stuck on the fact that you can't complete the unregistiration of a source inside process_events() (easily, anyway). But I can't see a way to change that, and I'm not even sure that you should, because it's quite a change to Calloop's design.

elinorbgr commented 2 years ago

As you present it, it seems to me that a combination of adding a replace() method and the "trash can" field would do the job nicely?

Though given this effectively doubles the stack size of TransientSource, maybe it would be work making two variants, one that supports changing the source from outside, and one, leaner, that only supports self-removing sources?

detly commented 2 years ago

I now have a test in a branch that (a) asserts on the drop-without-unregister condition (b) that I can make pass with the verbose code above (c) that I can make fail with the self.source = new_source.into() approach; and (d) that I can use to work on a better approach.

Though given this effectively doubles the stack size of TransientSource

I think it more than doubles, because you need a ping source + sender to wake the loop up, to finish the de-registration and replacement. I could be wrong about that - maybe it's only necessary if the parent is the one doing it.
I don't know that it matters, if space is an issue it can be used with a boxed source (which is how I always use it anyway).

maybe it would be work making two variants, one that supports changing the source from outside, and one, leaner, that only supports self-removing sources?

Perhaps, and it might be worth debug-asserting or just logging if the source is dropped without unregistering.

I've been turning over in my head whether it's possible to have a source unregister itself upon being dropped. It could be possible with some shared ownership (ie. refcounting) tricks, but I can't see it yet.

detly commented 2 years ago

because you need a ping source + sender to wake the loop up, to finish the de-registration and replacement

I was wrong, actually the whole thing can be done by adding a single variant:

/// The source is being replaced by another.
Replace { new: T, old: T },

I will send a PR when I've added some docs.

detly commented 2 years ago

I forgot to add a "closes" remark to #106 - was there anything else for this that I've forgotten about?

elinorbgr commented 2 years ago

No, all is good sorry, that just slipped out of my mind....

Smithay / calloop

API: TransientSource requires the loop to run again before the source can be replaced #104