EmbarkStudios / crash-handling

Collection of crates to deal with crashes
Apache License 2.0
138 stars 13 forks source link

Crash handler crashing will hang. #82

Open kkartaltepe opened 8 months ago

kkartaltepe commented 8 months ago

If you panic in the crash handler you will re-enter the crash handler, this is blocked by a mutex in crash_handler::linux::state::signal_handler. In cases of mixed runtimes e.g CPP+Rust even innocuous things like std::process::exit in a crash handler will crash.

If we are considering crashing already I think it may be reasonable to consider crashing while crashing and give the author an opportunity to handle it. Mostly I was looking at this to model some windows utilities we already have that do this on linux, but I do think a mutex by default is pretty reasonable.

This would violate #79 because we could reenter the function before we finish crashing.

Jake-Shadle commented 8 months ago

I think it's made fairly clear in the documentation that as little work should be done in the crashevent callback since it is running in a compromised context, eg std::process::exit calls libc::exit which not async signal safe so it's not even guaranteed that your example scenario would not lead to undefined behavior if the handler was changed to allow reentrancy or some other strategy.

I'm inclined to close this, and possibly implement detection of this scenario and aborting the process, as I don't see how allowing this scenario would make sense. If the user callback has already crashed in the first call, why would the crash handler expect that second call would not also crash?

This would violate https://github.com/EmbarkStudios/crash-handling/issues/79 because we could reenter the function before we finish crashing.

That's only relevant if you use the helper that was added to support the user(s) who want FnOnce.

kkartaltepe commented 8 months ago

possibly implement detection of this scenario and aborting the process, as I don't see how allowing this scenario would make sense. If the user callback has already crashed in the first call, why would the crash handler expect that second call would not also crash?

This is basically what I am proposing as the bug fix. I cannot ensure that my crash handler completes and exits the process. I want to raise the chances I can crash successfully instead of the current situation where the process will remain hung in the crash handler.

One way is to allow the reentrance and let me the author of the code do something, like ignore further signals or abort. Or you could just force aborts in the library handler (but I do find the hanging behavior nice/interesting for debugging purposes).

I think it's made fairly clear in the documentation that as little work should be done in the crashevent callback since it is running in a compromised context, eg std::process::exit calls libc::exit which not async signal safe so it's not even guaranteed that your example scenario would not lead to undefined behavior if the handler was changed to allow reentrancy or some other strategy.

The point of the example is that crash handlers will not be failsafe, if you prefer we can use an example of panic! or SIGFPE or SIGSEGV or SIGBUS. As you say the program is compromised the unexpected can often happen, and id prefer to have the option of ending the program in this case.