asynchronics / asynchronix

High-performance asynchronous computation framework for system simulation
Apache License 2.0
170 stars 9 forks source link

How to cleanly abort a simulation? #22

Open teh opened 5 months ago

teh commented 5 months ago

I wonder whether it'd be possible to modify InputFn to allow the following code to compile:

impl DelayedMultiplier {
    pub fn input(&mut self, value: f64, scheduler: &Scheduler<Self>) -> anyhow::Result<()> {
        scheduler
            .schedule_event(Duration::from_secs(1), Self::send, 2.0 * value)?;
        Ok(())
    }
    async fn send(&mut self, value: f64) {
        self.output.send(value).await;
    }
}

Or maybe I'm just misunderstanding the API?

sbarral commented 5 months ago

Have you considered a replier port rather than an input port? This would then work as-is. See the ReplierFn trait, or this example.

Or would you expect the simulator itself to handle the error rather than the model making the request? If yes then there isn't much it could do except panic I guess?

Edit: this makes me realize that Asynchronix' request/reply mechanism may be affected by a soundness issue discovered recently in a dependency (this only affects reply types with explicit destructors). I will publish v0.2.2 today with the fixed dependency (Edit 2: done).

teh commented 5 months ago

I understand your reply and hand't looked at ReplierFn so that's useful thanks! I was trying to get at something else: what happens when I e.g. instantiate 2 Simulations, then a failure in schedule_event would take down my entire program with a panic rather than e.g. returning a Result in step . Maybe there are good reasons (performance etc) not to implement it that way though?

sbarral commented 5 months ago

If I understand well (correct me if I'm wrong) you would like the simulator to abort the simulation when an irrecoverable situation arises but without emitting a panic.

This is a legitimate concern and the reason it isn't implemented is simply that this is a non-trivial feature for which there wasn't enough dedicated resources yet :-) We should probably consider as well adjacents issues such as panicking models and deadlocks. Ideally, these would all share a common failure-reporting mechanism.

So with the caveat that this is not going to happen overnight, we can certainly start thinking about a way to go about this, perhaps maybe even for v0.3 (no promise!).

There is already a catch_unwind handler that captures model panics in executor threads. For now it simply re-emits the original panic in the main simulator thread, but it was always the plan to make it possible to optionally return an error. The main problem is that this would most likely requires all Simulation::step* and Simulation::send_* methods to return a Result even if the simulator is configured to propagate the panic, but we will probably be moving towards returning a Results anyway once deadlock detection is implemented.

Models could then also report custom errors: a thread-local function abort_with_error(error: Box<std::error::Error + Send>) would register the error and panic, allowing the main thread to catch the panic and bubble up the error. I think that this is better than having InputFn accept a result since (i) the abort mechanism feels more appropriate for errors that are not strictly related to a particular input and (ii) returning a Result would be hard to extrapolate to ReplierFn without introducing ambiguities (is the Result part of the payload or meant for the simulator?).

Would that solve your issue? Feel free to refine or criticise this preliminary solution.

teh commented 5 months ago

Thank you for the thorough reply! I don't have a specific issue, I'm just probing the design a bit against different problems that I solved in the past. It looks like a fantastic library and I will definitely continue playing with it. Thanks for publishing it!

sbarral commented 5 months ago

Glad you like it and thank you very much for the kind words! In any case, being able to cleanly abort the simulation is a worthy feature to have so I will just edit a bit the title of this issue and leave it open until some support for this is implemented (no promise on a timeline though).