Open oschaaf opened 4 years ago
With #355 merged, I can no longer reproduce this; when I back it out, I can reproduce it again with some effort.
The asserts that trigger when the worker stacks unwind squelch the original exception(s) that started it all, thereby masking the actual problem.
We could remove our own assertion in that code path, which ensures that workers are properly shutdown(), but then there will be other ones originating from Envoy's code base that will still fire, so that part is not trivial to resolve. It's also not easy to rewrite the startup code; there is a place where shutdown() will have to be called to cleanly stop, but we're not fully initialized yet.
On the flip side, #355 resolves the first real-world instance of hitting this, so the pressure is off for now. Leaving this open for future discoverability and as a reminder to pursue making the related code in process_impl.cc more robust.
Observed in a CI run associated with https://github.com/envoyproxy/nighthawk/pull/316
Full log: https://circleci.com/api/v1.1/project/github/envoyproxy/nighthawk/11758/output/102/0?file=true&allocation-id=5e679b96b55f4d48eecdfc16-0-build%2F5ECCDF5B
The assert may or may not be related to the Envoy update in #316; it was first observed there.
The assert that gets triggered resides here: https://github.com/envoyproxy/nighthawk/blob/b631a17bfbba1c1b46ac7b3b1bf5adb4b05719d7/source/client/process_impl.cc#L123
However, after scrutinizing the code over at https://github.com/envoyproxy/nighthawk/blob/b631a17bfbba1c1b46ac7b3b1bf5adb4b05719d7/source/client/process_impl.cc#L416 I think the most likely explanation is that an exception was thrown in ProcessImpl::run() somewhere after shutdown_ gets set to false. This code should be revisited to properly unwind in the (unlikely) case of partial initialization. At the very least it should log a fatal message when an exception gets thrown halfway, so that the message associated with the exception doesn't get squelched by the assert above.
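To illustrate the "log, then unwind" idea, here is a minimal sketch. It is not the actual process_impl.cc code: SketchWorker, SketchProcess, the fail_midway flag and lastError() are hypothetical names made up for this example, and std::cerr stands in for whatever fatal-level logging the real code would use. The point is only that the exception message gets reported and shutdown() gets called before the destructor's assert has a chance to fire.

```cpp
#include <cassert>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a worker; not Nighthawk's real worker class.
class SketchWorker {
public:
  void start() { started_ = true; }
  void shutdown() { started_ = false; }

private:
  bool started_{false};
};

// Hypothetical stand-in for the process object discussed in this issue.
class SketchProcess {
public:
  // fail_midway simulates an exception during partial initialization.
  explicit SketchProcess(bool fail_midway) : fail_midway_(fail_midway) {}

  // Same kind of check as the assert discussed above: we must be shut down
  // by the time the object is destroyed.
  ~SketchProcess() { assert(shutdown_); }

  bool run() {
    shutdown_ = false;
    try {
      workers_.emplace_back();
      workers_.back().start();
      if (fail_midway_) {
        throw std::runtime_error("simulated failure during initialization");
      }
      workers_.emplace_back();
      workers_.back().start();
    } catch (const std::exception& e) {
      // Report the original problem first, so it cannot be masked by a later assert.
      std::cerr << "fatal: exception in run(): " << e.what() << std::endl;
      last_error_ = e.what();
      // Stop whatever was started so far, then let the exception propagate.
      shutdown();
      throw;
    }
    shutdown();
    return true;
  }

  void shutdown() {
    for (auto& worker : workers_) {
      worker.shutdown();
    }
    workers_.clear();
    shutdown_ = true;
  }

  bool isShutdown() const { return shutdown_; }
  const std::string& lastError() const { return last_error_; }

private:
  const bool fail_midway_;
  bool shutdown_{true};
  std::string last_error_;
  std::vector<SketchWorker> workers_;
};
```

In this sketch a caller that catches the exception from run() still sees the original message, and destroying the object afterwards does not trip the assert. It does not address the asserts originating from Envoy's code base mentioned earlier; those would still need their own handling.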