Closed ringerc closed 3 months ago
Hello @ringerc, is this still relevant? Any proposed solutions?
Yes, it's still relevant @armru
I wasn't able to quickly figure out what is going wrong here. Even after I fixed the various logging issues in https://github.com/cloudnative-pg/cnpg-i-machinery/pull/31, https://github.com/cloudnative-pg/cnpg-i-machinery/pull/12 and https://github.com/cloudnative-pg/cnpg-i-hello-world/pull/7 I didn't manage to get the zap logger to emit a stack trace from the panic handler.
The logging configuration should emit stack traces when the development flag is passed to the plugin.
And the logging in the panic handler should preferably also respect the caller-info of the original panic site, so the logged event shows where the original panic happened, not the code location of the panic handler.
An error like this:
{
"level": "info",
"ts": 1719189460.064116,
"caller": "pluginhelper/server.go:190",
"msg": "Panic occurred",
"error": "runtime error: invalid memory address or nil pointer dereference"
}
tells me nothing about where the failure occurred, so it would be nigh-impossible to debug if it occurred in production.
Does the panic helper serve any purpose - or can it simply be removed?
this is being tricky to solve haven't come up with a good solution so far
Is there an existing issue already for this bug?
I have read the troubleshooting guide
What happened?
If a CNPG-I plugin panics in a gRPC call handler, the plugin helper's registered panic handler intercepts the panic and logs it - without the stack trace.
Because the handler doesn't propagate the logger context (see https://github.com/cloudnative-pg/cnpg-i-machinery/pull/31) a custom logger configuration cannot be used to work around this either.
Cluster resource
No response
Relevant log output
Code of Conduct