Catching Runtime Panic - Githubissues

cloudflare / pingora

A library for building fast, reliable and evolvable network services.

Apache License 2.0

20.28k stars 1.1k forks source link

Catching Runtime Panic #188

Closed rhzs closed 2 months ago

rhzs commented 2 months ago

Hi Team,

Just curious, is there anyway to catch runtime panic or adding middleware that kind of capability?

In tower, I can use ServiceBuilder to catch panic.

 let service = ServiceBuilder::new()
        .catch_panic();

Any example would be useful!

Thank you

vicanso commented 2 months ago

Pingora support sentry for catching runtime panic.

pub fn run_forever(&mut self) {
....
/* only init sentry in release builds */
        #[cfg(not(debug_assertions))]
        let _guard = match self.sentry.as_ref() {
            Some(uri) => Some(sentry::init(uri.as_str())),
            None => None,
        };

If throws a panic:

rhzs commented 2 months ago

@vicanso hmm... should I use sentry to catch a panic?

vicanso commented 2 months ago

I haven't found any other way yet. user_guide/panic

drcaramelsyrup commented 2 months ago

Yes, right now there's no other way to do this (if this = seeing panics for observability) except via Sentry. I don't think we're opposed to allowing alternative middleware layers however.

Note that tower's catch_panic middleware looks like it works by wrapping its handlers in catch_unwind, and the docs indicate this is definitely not the preferred method for general error handling. In our view panics are reserved for unrecoverable issues and logic errors that should never occur, e.g. to indicate inappropriate usage of APIs.

rhzs commented 2 months ago

Yes, right now there's no other way to do this (if this = seeing panics for observability) except via Sentry. I don't think we're opposed to allowing alternative middleware layers however.

How would I approach (implement) it, if I want to implement Tower layers with Pingora?

Note that tower's catch_panic middleware looks like it works by wrapping its handlers in catch_unwind, and the docs indicate this is definitely not the preferred method for general error handling. In our view panics are reserved for unrecoverable issues and logic errors that should never occur, e.g. to indicate inappropriate usage of APIs.

Yes, I meant for any unintended panic that occurred during runtime. I am curious, if Pingora is used in production. How would you recover from unintended panic? In such scenario, this could make our resources (cpu/mem/tcp connections) saturated,, cmiiw.

drcaramelsyrup commented 2 months ago

Not an expert on this, so others can feel free to weigh in. Looking briefly at what the sentry crate does, it seems to define a panic hook and that is probably the right way to go.

See the docs that @vicanso linked - panics fail/drop the request but otherwise shouldn't cause issues. Additionally, in production, we very rarely see any panics, and it should stay that way. If we see panics start to occur during deployments, that's an indicator we've made an incorrect logical assumption in our code and we need to fix it.

github-actions[bot] commented 2 months ago

This question has been stale for a week. It will be closed in an additional day if not updated.

github-actions[bot] commented 2 months ago

This issue has been closed because it has been stalled with no activity.