cloudnative-pg / cnpg-i-machinery

CloudNativePG Interface Machinery
Apache License 2.0
2 stars 3 forks source link

[Bug]: Segfault if Reconciler hooks registered, but either Pre or Post hooks not implemented #32

Open ringerc opened 4 months ago

ringerc commented 4 months ago

Is there an existing issue already for this bug?

I have read the troubleshooting guide

What happened?

I wrote a reconciler hook that registered a Pre hook callback.

It ran, then segfaulted after returning.

The cause is that the Post hook must also be implemented. Otherwise the plugin helper tries to call a null function pointer.

Plugin helper should check that both Pre and Post hooks are non-null and error at registration time. Or it should skip unimplemented hooks.

This was made confusing by the panic handler's swallowing of the stack trace, so all I got was the "error": "panic: runtime error: invalid memory address or nil pointer dereference" but no details. See https://github.com/cloudnative-pg/cnpg-i-machinery/issues/33

Cluster resource

No response

Relevant log output

{"level":"info","ts":1719969643.970347,"caller":"pluginhelper/server.go:195","msg":"Panic occurred","error":"runtime error: invalid memory address or nil pointer dereference"}

Code of Conduct

armru commented 3 months ago

Hey @ringerc, what would be the best solution for you?

armru commented 3 months ago

Hello again, did also try to embbed UnimplementedReconcilerHooksServer?

ringerc commented 3 months ago

I did not - I used the example plugin as a base but it doesn't implement the reconciler hooks.

I didn't see this anywhere in the docs and examples at the time I was working on it. I expected that the helper would check if the routine was implemented and gracefully, silently skip it if not. That expectation was probably misunderstanding of something fundamental on my part.

I now see that the other implementations do have an unimplemented helper base:

type Implementation struct {
    lifecycle.UnimplementedOperatorLifecycleServer
}

whereas mine uses the server type:

type ReconcileHandler struct {
    reconciler.ReconcilerHooksServer
}

It would be helpful if the server types had prominent references to the Unimplemented bases in their godoc. A comment in the hello world to mention why it's used wouldn't hurt either.

Where does that come from anyway? https://github.com/search?q=org%3Acloudnative-pg+UnimplementedOperatorLifecycleServer&type=code - I don't see it defined anywhere. Is it autogenerated as part of the gRPC protobuf machinery?

armru commented 3 months ago

Yes it is autogenerated and yes we should have a better reference to it :)

If you can confirm me that the problem is fixed with the lifecycle.UnimplementedOperatorLifecycleServer I will proceed with a documentation PR

ringerc commented 3 months ago

Yes, that was correct. Using lifecycle.UnimplementedOperatorLifecycleServer as the base solved the issue, it's just a docs issue.