dpc / mioco.pre-0.9

Scalable, coroutine-based, asynchronous IO handling library for Rust programming language. (aka MIO COroutines).
Mozilla Public License 2.0
457 stars 30 forks source link

clear documentation on handling panics #145

Closed gregwebs closed 8 years ago

gregwebs commented 8 years ago

I would like to be able to

Easy to use abstractions for this functionality do not have to be provided by mioco. Instead I would just like to document how to safely handle panics.

I cannot event get mioco to shutdown right now in code that looks like the below:

mioco::start(||{
  mioco::spawn(||{
    if let Err(err) = panic::catch_unwind(||{
      panic!("thread code");
    }) {
      error!("panic {:?}", err);
      mioco::shutdown();
    }
  });
  mioco::spawn(||{
    if let Err(err) = panic::catch_unwind(||{
      loop { mioco::sleep(Duration::from_secs(1) ); }
    }) {
      error!("panic {:?}", err);
      mioco::shutdown();
    }
  })
})
dpc commented 8 years ago

Mioco should already by handing panics, by doing an unwind and catching it. Shutdown should kill all the existing coroutines and catch it internally too. If it doesn't work, it's a bug. Do you have any full working example (gist or something), that I could try to reproduce the problem you're experiencing?

I'm open on opinions where to add any more information (or PRs) to documentation too.

dpc commented 8 years ago

For reference: https://github.com/dpc/mioco/blob/master/src/coroutine.rs#L239

        let coroutine_main_fn = move || {
            let coroutine_user_fn = panic::AssertUnwindSafe(coroutine_user_fn);
            let res = panic::catch_unwind(move || {

                let coroutine = unsafe { tl_current_coroutine() };
                if coroutine.killed {
                    panic::resume_unwind(Box::new(Killed))
                }

                coroutine_user_fn()
            });

            let coroutine = unsafe { tl_current_coroutine() };
            coroutine.blocked_on.clear();
            coroutine.self_rc = None;
            coroutine.state = State::Finished;

            let id = coroutine.id;
            {
                let mut handler_shared = coroutine.handler_shared
                                                  .as_ref()
                                                  .unwrap()
                                                  .borrow_mut();
                handler_shared.coroutines.remove(id).unwrap();
                handler_shared.coroutines_dec();
            }

            let config = coroutine.handler_shared().coroutine_config;
            match res {
                Ok(res) => {
                    co_debug!(coroutine, "finished normally");
                    let _ = exit_sender.send(Ok(res));
                }
                Err(cause) => {
                    if config.catch_panics {
                        co_debug!(coroutine, "finished by panick");
                        let _ = exit_sender.send(Err(cause));
                    } else {
                        // send fail here instead with the internal reason,
                        // so the user may get a nice backtrace
                        let handler = coroutine.handler_shared.as_ref().unwrap().borrow();
                        sender_retry(&handler.get_sender_to_own_thread(),
                                     Message::PropagatePanic(cause));
                    }
                }
            }
         };

coroutine_user_fn() runs in panic::catch_unwind block, then the code checks how was the coroutine_user_fn terminated, and sends result through a channel for other threads to receive.

gregwebs commented 8 years ago

Ok, good to know it catches panics. I can add that to the documentation (at least for spawn). I am still at a loss to spawn 2 coroutines and then wait for the first one to finish (both are supposed to run forever) and then call shutdown when either 1 of them finishes. join is blocking, so how do I join multiple coroutines at the same time? It seems like I need to do my own panic catching anyway as per the snippet I gave.

gregwebs commented 8 years ago

I think my issue with not shutting down could be the need to add a sync block. The main documentation states that mioco::sync should be used but this function does not appear to be exported.

dpc commented 8 years ago

sync has been renamed in the master branch due to confict with sync module.

join on coroutine will block only to the end of that coroutine. You can always store JoinHandls in a vector and block one by one on them. mioco::start will return after all coroutines finished, too.

If you could show me full code (without the custom unwind catching) and describe what's the difference between what you expect, and what is actually happening, I could probably help more.

Thanks for PR, much appreciated.

gregwebs commented 8 years ago

I sent #147 to fix sync -> offload documentation. This change seems to be on the 0.7.0 release where I can call it in my program, but I don't see offload in the generated docs.

gregwebs commented 8 years ago

blocking joins one-by-one would not match my use case of stopping all related spawns when one stops. Essentially I want a select on the joins.

Ultimately I want to be able to throw a panic back to the main execution thread to stop the program rather than silently trapping panics in coroutines and not realizing that they occur. But I also want to give each coroutine a chance to gracefully exit by calling mioco::shutdown.

Here is what I came up with for accomplishing this right now: https://github.com/gregwebs/mioco-shutdown/blob/master/src/main.rs

gregwebs commented 8 years ago

This is something more like what I want:

// similar to the normal spawn
// However, if a panic occurs, call `mioco::shutdown`
fn spawn_catch_shutdown <F, T>(f: F)
  where F: FnOnce() -> T, F: std::panic::UnwindSafe, F: Send + 'static, T: Send + 'static
{
    mioco::spawn(|| {
        if let Err(err) = panic::catch_unwind(|| {
          f();
        }) {
          mioco::shutdown();
        }
    } );
}

As a separate issue because I am thinking about having multiple coroutine groups where a group is linked to all fail together (using mioco::start for this right now), I am confused about the behaviour of shutdown. The documentation says that it shuts down the mioco instance, but also suggests calling mioco::spawn to launch it if needed. mioco::start seems to suggest that multiple instances can be running at the same time. Which instance would shutdown stop in that case?

dpc commented 8 years ago

You can have multiple mioco instances. mioco:spawn/mioco::start will spawn a new instance if it was called from outside an existing instance.

Eg.

fn main() {
mioco::spawn(|| { ... });
mioco::spawn(|| { ... });
}

will get you two separate instances of mioco - each will have to shutdown separately. But:

fn main() {
mioco::spawn(|| { 
    mioco::spawn(|| { ... });
    ...
 });

}

will be just one.

gregwebs commented 8 years ago

But if there are 2 instances running, it is not defined which one will shutdown when this code is run?

mioco::spawn(||{mioco::shutdown})

That seems like a buggy edge case

dpc commented 8 years ago

shutdown is supposed to kill the whole instance (all coroutines inside it).

gregwebs commented 8 years ago

But there can be multiple instances. Which instance?

dpc commented 8 years ago

The one that you executed mioco::shutdown() inside of. mioco::spawn creates an instance if it was not executed within one already. If it was, it will just add one more coroutine to it. For reference: https://github.com/dpc/mioco/blob/master/src/mod.rs#L869

gregwebs commented 8 years ago

But what if 2 instances have been created (with mioco::start)?

dpc commented 8 years ago

Then you need to mioco::shutdown in each of them.

gregwebs commented 8 years ago

right, just trying to point out that the API allows for uses with undefined behavior.

I couldn't fix my issue with mioco::offload (The thread makes a blocking call to docker.events(&Default::default());). Glad I could help out with some docs, but I think I will try going back to OS threads for now.

dpc commented 8 years ago

Hmmm... Is it undefined or just not clearly enough explained in the docs?