cockroachdb / errors

Go error library with error portability over the network
Apache License 2.0
2.04k stars 66 forks source link

support multi-cause errors in go 1.20 #112

Closed dhartunian closed 11 months ago

dhartunian commented 1 year ago

Go 1.20 introduces the idea of an error with multiple causes instead of a single chain. This commit updates the errors library to properly encode, decode, and format these error types.

For encoding and decoding we use the existing EncodedLeaf type and embellish it with a causes field. This is done in order to keep the encoding/decoding backwards compatible. EncodedLeaf types containing multiple causes when decided by earlier versions will simply see an opaque leaf with a message inside. The reason the EncodedWrapper is not used here is because the wrapper already contains a mandatory single cause field that we cannot fill with the multi-errors. A new type cannot be used because it would not be decodable by older versions of this library.

Note for reviewers: First commit is from https://github.com/cockroachdb/errors/pull/113


This change is Reviewable

cockroach-teamcity commented 1 year ago

CLA assistant check
All committers have signed the CLA.

knz commented 1 year ago

This makes me think that the protobuf encoding may have no choice but to stuff multierrors in the existing LeafEncodedError type somehow. Perhaps with additional protobuf fields.

Otherwise, when the DecodeError() function in previous versions receives a payload encoded in the new version, it would crash.

dhartunian commented 1 year ago

The main shortcoming I see in the current approach is that you strip the "shell" go type of a multierror when encoding. It should be preserved. This is examplified by the following additional unit tests (didn't touch your code, just added tests): https://github.com/dhartunian/errors/pull/1

@knz this is tricky and I want to check with you before going down some weird reflection path...the wrapErrors object cannot be reconstructed on the decode side as far as I can see because the msg field within it does not contain the format string, it contains the format string already interpolated with the stringified causes. Hence, I can't call fmt.Errorf on the decode side to reconstruct it, even if I could pull out the fields from the private struct during encoding.

Furthermore, we currently don't support the Is case on single-cause wrapError instances either, which have existed for a while.

knz commented 1 year ago

Furthermore, we currently don't support the Is case on single-cause wrapError instances either, which have existed for a while.

Could you explain a bit more this point?

dhartunian commented 1 year ago

If I modify your test example to look like this (single %w)

func TestStandardFmtMultirrorRemoteEquivalence2(t *testing.T) {
    tt := testutils.T{T: t}

    err1 := fmt.Errorf("hello %w", goErr.New("world"))
    err2 := fmt.Errorf("hello %w", goErr.New("world"))

    newErr1 := network(err1)

    tt.Check(markers.Is(err1, newErr1))
    tt.Check(markers.Is(newErr1, err1))
    tt.Check(!markers.Is(err2, newErr1))
    tt.Check(!markers.Is(newErr1, err2))
}

I still get an error on master

=== RUN   TestStandardFmtMultirrorRemoteEquivalence2
    markers_test.go:173: check failed
        context:

        >   tt.Check(markers.Is(err1, newErr1))
            tt.Check(markers.Is(newErr1, err1))

    markers_test.go:174: check failed
        context:
            tt.Check(markers.Is(err1, newErr1))
        >   tt.Check(markers.Is(newErr1, err1))
            tt.Check(!markers.Is(err2, newErr1))

--- FAIL: TestStandardFmtMultirrorRemoteEquivalence2 (0.00s)

That case has been supported in the go stdlib since before 1.20 and it constructs a single-cause wrapError (singular): https://github.com/golang/go/blob/4a0a2b33dfa3c99250efa222439f2c27d6780e4a/src/fmt/errors.go#L31-L34

knz commented 1 year ago

aw now i understand. that looks like a baseline bug (independent from the current project). It's unfortunate and I think we need to look at this too. I can probably help, but I'm also ok with us looking at it together.

dhartunian commented 1 year ago

I'm fine with including the fix as part of this work since it's certainly related.

Let me see if I can write a POC implementation of markers.Is using the type info encoded in the EncodedErrorDetails struct in opaqueWrapper. That feels more correct than trying to reconstruct the wrappers on the other side.

knz commented 1 year ago

if I can write a POC implementation of markers.Is using the type info encoded in the EncodedErrorDetails struct in opaqueWrapper

The part that surprises me is that it's already supposed to work. The code is there (errbase.GetTypeMark). I don't understand why it does not (appear to) work with errors coming from fmt.Errorf with %w.

dhartunian commented 1 year ago

It looks like an encoding problem. The code tries to extract a "prefix" from the wrapper and writes a blank one in this case which breaks the equality check on errMark.msg in the markers.Is implementation.

I wonder if the prefix should be set to the full interpolated string from the wrapper, and then the inner messages should be ignored in the case where the wrapper original type is wrapError. Let me try that out...

dhartunian commented 1 year ago

Just pushed a new commit on top of your tests @knz. Everything now passes! I think there are a few rough edges to clean up but I had missed a few things about the opaqueWrapper that were necessary to make the opaqueMultiWrapper work.

dhartunian commented 1 year ago

New PR for the bugfix and flag as discussed in your comment @knz https://github.com/cockroachdb/errors/pull/113

knz commented 11 months ago

One thing i would re-iterate is that i'd like us to have a variant of this PR (and #113) on a branch that's compatible with go 1.18, which we can then import in crdb 23.1.

dhartunian commented 11 months ago

One thing i would re-iterate is that i'd like us to have a variant of this PR (and https://github.com/cockroachdb/errors/pull/113) on a branch that's compatible with go 1.18, which we can then import in crdb 23.1.

Agreed. My thinking was that once we've merged the outstanding PRs for 1.20, I'll backport into a 1.18 PR a version that omits the code that requires 1.20, probably just tests that use the new APIs. Seemed like that would be easiest to manage.

knz commented 11 months ago

Yes, good thinking.

knz commented 11 months ago

I forgot to mention - we also need this patch to include the various new exported APIs from the std "errors" package.

This includes Join and there are perhaps others. I'd use a diff on the go doc in 1.19 and 1.20 to know.

knz commented 11 months ago

perhaps include PR #106 in here.

dhartunian commented 11 months ago

Added build-and-test jobs on go-1.20-upgrade branch here which use 1.20 and 1.21. This branch will not pass with earlier versions.