Closed: erikgrinaker closed this 3 years ago.
Thanks @knz. It looks fine to me in terms of implementation, and it makes sense to maximize compatibility.
I guess I have only a comment on best practices or standards of usage, which may seem strange coming from an outsider, but since I have been using the `extgrpc`/`middleware` packages I wrestled with this, and I'll explain what I mean:
If you have a service using `cockroachdb/errors`, and in particular the gRPC bits, I have seen only a limited number of scenarios where you want to return an actual gRPC `Status`. Since it's a sort of leaf, you can't truly wrap another error, there's no stack trace information, etc. That's fine if it's an `OK`, but if you are returning a non-`OK` code you are likely dealing with an error somewhere deeper in your service, either in your own code or in a library, and I personally would want to wrap all that information and send it to the caller.
That's why I contributed `cockroachdb/errors/grpc/status`, an offhand stab at a `Status` API that can preserve a stack of errors. I'm sure it is lacking in various ways, but from a DX perspective it's truly awesome to have a failed call give you a stack trace that's 3 services deep. If you're returning raw gRPC `Status` objects, or their GoGo equivalents, you might be missing out on the last mile of that trace?
Again, that's just commentary. Code is LGTM.
I think you have a good point @tooolbox, but I don't think these approaches are mutually exclusive. I checked the CRDB codebase, and could only find two places where a gRPC `Status` actually wrapped another error -- all others were leaf errors triggered by some condition (e.g. a failed permission check). Where we're wrapping an error it'd probably be better to use `WrapWithGrpcCode` to preserve details -- and in that case we may also want to make `withGrpcCode` implement the gRPC status interface (i.e. add `GRPCStatus() *Status`) so that it can be used interchangeably with the gRPC status package.
Regardless, I still think there's value in preserving a `Status` error as it passes through `EncodeError`/`DecodeError`. This currently results in an `opaqueLeaf` instead, which has no structure at all. This is a problem in CRDB, since we actually use `EncodeError`/`DecodeError` internally in the codebase while propagating errors up the stack (which arguably may not be a good idea in itself). This means that when a gRPC client call receives a `codes.PermissionDenied` and returns that up the stack, that information is lost and the caller is unable to determine that it was in fact a permission error and e.g. stop retrying the operation.

While we probably should improve the error handling in CRDB, that's a larger yak to shave, and I don't see any harm in preserving `Status` errors across encode/decode to solve the immediate problem at hand.
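The retry consequence can be sketched with stdlib-only stand-ins (the `statusError` type, `codeOf` helper, and `callRPC` function below are hypothetical, not the real gRPC or CRDB types; `PermissionDenied`'s real numeric value in `google.golang.org/grpc/codes` is 7): if the code survives the error chain, the caller stops immediately instead of retrying a permanent failure.

```go
package main

import (
	"errors"
	"fmt"
)

// code mimics the grpc codes.Code type.
type code int

const PermissionDenied code = 7

// statusError mimics a gRPC status error carrying a code.
type statusError struct {
	code code
	msg  string
}

func (e *statusError) Error() string { return e.msg }

// codeOf walks the wrap chain looking for a statusError. If the chain
// had been flattened into an opaque leaf, this would return false and
// the caller would keep retrying pointlessly.
func codeOf(err error) (code, bool) {
	var se *statusError
	if errors.As(err, &se) {
		return se.code, true
	}
	return 0, false
}

// callRPC simulates a client call that always fails with a
// permission error, wrapped with additional context.
func callRPC() error {
	return fmt.Errorf("rpc failed: %w",
		&statusError{code: PermissionDenied, msg: "permission denied"})
}

func main() {
	for attempt := 1; attempt <= 5; attempt++ {
		err := callRPC()
		if c, ok := codeOf(err); ok && c == PermissionDenied {
			fmt.Println("permanent error, giving up after attempt", attempt)
			return
		}
		fmt.Println("retrying after attempt", attempt)
	}
}
```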
> This currently results in an `opaqueLeaf` instead, which has no structure at all. This means that when a gRPC client call receives a `codes.PermissionDenied` and returns that up the stack, that information is lost
Can you spell this out further? `opaqueLeaf` is meant to actually preserve details about the original error, so that it can still be decoded elsewhere. This makes it possible to start implementing a decode function without any corresponding encoder (e.g. it's possible to start decoding an error that's been encoded as opaque by a node running a previous version of the code, prior to the implementation of the decode function).
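A toy sketch of that decode-later property, with a hand-rolled registry standing in for the real encoder/decoder machinery (every name here is hypothetical; the real `EncodedError` is a Protobuf and the real registry lives in `cockroachdb/errors`): an error whose type has no registered decoder stays opaque but lossless, and decodes fully once a decoder appears.

```go
package main

import (
	"errors"
	"fmt"
)

// encoded is a toy wire representation: the original type's name plus
// its message, loosely analogous to EncodedError.
type encoded struct {
	typeName string
	message  string
}

// opaque preserves an encoded error whose type has no registered
// decoder, analogous to opaqueLeaf: nothing is lost, it just cannot
// be turned back into the concrete type yet.
type opaque struct{ enc encoded }

func (o *opaque) Error() string { return o.enc.message }

// decoders maps type names to decode functions; registering one later
// lets previously-opaque payloads decode fully.
var decoders = map[string]func(msg string) error{}

func decode(enc encoded) error {
	if d, ok := decoders[enc.typeName]; ok {
		return d(enc.message)
	}
	return &opaque{enc: enc}
}

// permError is a toy concrete error type.
type permError struct{ msg string }

func (e *permError) Error() string { return e.msg }

func main() {
	enc := encoded{typeName: "permError", message: "permission denied"}

	// No decoder registered yet: we get an opaque error, but the
	// payload is preserved for a later round trip.
	err := decode(enc)
	fmt.Printf("%T: %v\n", err, err)

	// After registering a decoder, the same payload decodes fully.
	decoders["permError"] = func(msg string) error { return &permError{msg: msg} }
	err = decode(enc)
	var pe *permError
	fmt.Println(errors.As(err, &pe))
}
```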
> Can you spell this out further?
When we send a batch command to the target replica via RPC, we end up wrapping the resulting error here (`roachpb.NewError` calls `EncodeError` internally):
Further up the stack the Protobuf `roachpb.Error` is converted back into a Go `error` (`GoError` calls `DecodeError` internally):
A caller higher up the stack then tries to check whether the `error` is a gRPC authentication error by checking its status code:
However, at this point the `error` is an `opaqueLeaf`, so `status.FromError()` is unable to convert it into a gRPC status, and `IsAuthError` returns false. I don't believe `opaqueLeaf` can preserve the `Status` Protobuf automatically, because it's a standard Go Protobuf which isn't recognized by the Gogo decoder, and thus it can't populate `opaqueLeaf.details.FullDetails`, which is a Protobuf `Any` field.
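The registry mismatch can be illustrated with a heavily simplified model (the two maps below are stand-ins for the standard Go Protobuf and gogoproto type registries; `unpackFromAny` and `grpcStatus` are hypothetical, and none of this is the real Protobuf API): a type registered only with one runtime cannot be resolved by an `Any` decoder that consults the other.

```go
package main

import "fmt"

// Two separate type registries stand in for the standard Go Protobuf
// registry and the gogoproto registry.
var standardRegistry = map[string]func() any{}
var gogoRegistry = map[string]func() any{}

// grpcStatus stands in for the standard-proto Status message.
type grpcStatus struct{ Code int }

// unpackFromAny mimics resolving a gogoproto Any field: only types
// known to the gogo registry can be constructed on decode.
func unpackFromAny(typeURL string) (any, error) {
	ctor, ok := gogoRegistry[typeURL]
	if !ok {
		return nil, fmt.Errorf("message type %q not found in gogo registry", typeURL)
	}
	return ctor(), nil
}

func main() {
	// The Status type is registered only with the standard runtime...
	standardRegistry["google.rpc.Status"] = func() any { return &grpcStatus{} }

	// ...so the gogo-based decoder cannot resolve it, and the
	// equivalent of opaqueLeaf.details.FullDetails stays empty.
	_, err := unpackFromAny("google.rpc.Status")
	fmt.Println(err)
}
```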
> I don't believe `opaqueLeaf` can preserve the `Status` Protobuf automatically, because it's a standard Go Protobuf which isn't recognized by the Gogo decoder
Ah! That's the bit I was missing. That is indeed unfortunate.
Gotcha @erikgrinaker -- yeah, if your service(s) are shallow enough that leaf error generation sites know they're in a gRPC service, then great.
> we may also want to make `withGrpcCode` implement the gRPC status interface
Perhaps, yeah. I would like to make sure that the `grpc/middleware` package still works transparently, as intended.
> I don't see any harm in preserving `Status` errors across encode/decode to solve the immediate problem at hand.
For sure, agreed.
This patch adds support for automatic en/decoding of gRPC `Status` errors, enabled by importing the `extgrpc` package. It supports statuses generated both by the standard `google.golang.org/grpc/status` package and the gogoproto-based `github.com/gogo/status` package.

Encoded statuses are always represented as `gogo/status`-based `Status` Protobufs, since the `EncodedError` type is a gogoproto type using an `Any` field for structured errors, and only gogoproto types can be placed there because standard Protobufs are not registered in the gogoproto type registry. We lock the `gogo/googleapis` package to 1.2.0 to use the gogoproto 1.2-compatible Protobufs required by CRDB; these have been verified to not be vulnerable to the "skippy pb" bug.

Note that to preserve error details, gogo-based `Status` objects with gogo-based details must be used; otherwise details may not be preserved. This is for similar reasons as outlined above, and is a limitation in the gRPC package -- see the code comments for further details. A CRDB linter rule forbidding use of `Status.WithDetails()` was submitted in cockroachdb/cockroach#61617.

Resolves #63, related to #64 and cockroachdb/cockroach#56208.