canonical / pebble

Take control of your internal daemons!
GNU General Public License v3.0
137 stars 52 forks source link

fix(daemon): turn off net/http panic recovery #350

Closed benhoyt closed 1 month ago

benhoyt commented 5 months ago

By default, Go's net/http recovers panics in the ServeHTTP goroutine and keeps serving. However, if a lock is held when the panic happens, this will mean a complete deadlock and Pebble will go unresponsive. As far as I know, this hasn't happened yet, but we use snippets without defer like this fairly frequently if it's just a tiny amount of code and defer is awkward:

st.Lock()
foo := st.Foo()
st.Unlock()

If st.Foo() or whatever's between the lock and unlock panics, net/http will recover the panic but the lock will still be held. And in the case of the state lock that basically means all requests are blocked -- bad!

There's no direct way of opting out of this net/http behaviour, so you have to recover() from panics yourself, print the goroutine dump, and exit.

I've tested this locally, and it prints the goroutine dump and exits with code 1 on a panic as expected, instead of locking up. It also handles any panic in the handler or middleware, eg in logit.

benhoyt commented 5 months ago

I think overall this is an improvement, as we will now see panics as they happen, but it may lead to some unexpected panics in production uses of pebble until we identify the cause of the deadlock that spawned this line of inquiry.

@hpidcock The thing is, those "unexpected panics in production uses" have a decent chance of deadlocking the entire Pebble process (if a lock is held with defer.Unlock()), which is much worse than exiting (in which case the calling process will likely restart it). In any case, we'll chat further this afternoon about the best way forward.

benhoyt commented 5 months ago

Given we now know panic-recovery isn't the cause of the issue at hand, I'm going to come back to this after reproducing and fixing https://github.com/canonical/pebble/issues/314, but I'll leave this PR open -- I think it's probably a good idea in any case.

benhoyt commented 1 month ago

For reference, Gustavo was positive about this in a spec review meeting a couple of months ago (or more accurately, didn't love the default net/http behaviour and agreed it would be good to turn it off).