golang / protobuf

Go support for Google's protocol buffers
BSD 3-Clause "New" or "Revised" License
9.72k stars 1.58k forks source link

proto.Equal does not check for nested invalidity #1631

Open exi opened 1 month ago

exi commented 1 month ago

What version of protobuf and what language are you using? Version: go, 1.34.2

What did you do? It's possible to construct a proto with a nested nil pointer value as a message, leading to an invalid nested message. This causes proto.Equal to return false if one of the top-level message is nil but will return true if they contain invalid message in nested fields.

What did you expect to see? Since the proto.Equal contract states that no message can be equal to an invalid message, it should return false if nested fields are invalid.

What did you see instead? proto.Equal will return true even with nested invalid messages.

Anything else we should know about your project / environment?

Adding this to equalMessage in protoreflect/value_equal.go, will lead to test failures in proto:encode_test

if !mx.IsValid() || !my.IsValid() {
  return false
}
stapelberg commented 1 month ago

Via chat, exi also provided this reproducer:

--- i/proto/encode_test.go
+++ w/proto/encode_test.go
@@ -53,6 +53,11 @@ func TestEncode(t *testing.T) {
                if !proto.Equal(got, want) && got.ProtoReflect().IsValid() && want.ProtoReflect().IsValid() {
                    t.Errorf("Unmarshal returned unexpected result; got:\n%v\nwant:\n%v", prototext.Format(got), prototext.Format(want))
                }
+               pe := proto.Equal(got, want)
+               pre := protoreflect.ValueOf(got.ProtoReflect()).Equal(protoreflect.ValueOf(want.ProtoReflect()))
+               if pe != pre {
+                   t.Errorf("proto.Equal and protoreflect.Equal disagree; proto.Equal:\n%v\nprotoreflect.Equal:\n%v\ngot:\n%v\nwant:\n%v", pe, pre, prototext.Format(got), prototext.Format(want))
+               }
            })
        }
    }

…which makes TestEncode/nil_messages fail.

stapelberg commented 1 month ago

Did some digging. https://go.dev/cl/196618 introduced the behavior (“proto/equal: equate nil”), https://go.dev/cl/464275 documented it.

I tried to make protoreflect.Value.Equal do the same check that proto.Equal does:

--- i/reflect/protoreflect/value_equal.go
+++ w/reflect/protoreflect/value_equal.go
@@ -92,6 +92,10 @@ func equalMessage(mx, my Message) bool {
                return false
        }

+       if validx, validy := mx.IsValid(), my.IsValid(); !validx || !validy {
+               return !validx && !validy
+       }
+
        nx := 0
        equal := true
        mx.Range(func(fd FieldDescriptor, vx Value) bool {

However, this makes TestDecode/repeated_nil_messages fail (proto/decode_test.go):

    decode_test.go:50: Unmarshal returned unexpected result; got:
        repeated_nested_message:  {
          a:  1
        }
        repeated_nested_message:  {}
        repeated_nested_message:  {
          a:  2
        }

        want:
        repeated_nested_message:  {
          a:  1
        }
        repeated_nested_message:  {}
        repeated_nested_message:  {
          a:  2
        }

The printed messages are identical, but in delve, I can see the difference:

(dlv) next
> google.golang.org/protobuf/proto.Equal() ./equal.go:42 (PC: 0x8807bf)
    37: // of the concrete message type. For example, (*pb.M)(nil) is not equal
    38: // to &pb.M{}.
    39: // If two valid messages marshal to the same bytes under deterministic
    40: // serialization, then Equal is guaranteed to report true.
    41: func Equal(x, y Message) bool {
=>  42:     if x == nil || y == nil {
    43:         return x == nil && y == nil
    44:     }
    45:     if reflect.TypeOf(x).Kind() == reflect.Ptr && x == y {
    46:         // Avoid an expensive comparison if both inputs are identical pointers.
    47:         return true

(dlv) p x
google.golang.org/protobuf/reflect/protoreflect.ProtoMessage(*google.golang.org/protobuf/internal/testprotos/test.TestAllTypes) *{
[…]
    RepeatedNestedMessage: []*google.golang.org/protobuf/internal/testprotos/test.TestAllTypes_NestedMessage len: 3, cap: 4, [
        *(*"google.golang.org/protobuf/internal/testprotos/test.TestAllTypes_NestedMessage")(0xc000651800),
        *(*"google.golang.org/protobuf/internal/testprotos/test.TestAllTypes_NestedMessage")(0xc000651840),
        *(*"google.golang.org/protobuf/internal/testprotos/test.TestAllTypes_NestedMessage")(0xc000651880),
    ],
[…]

(dlv) p y
google.golang.org/protobuf/reflect/protoreflect.ProtoMessage(*google.golang.org/protobuf/internal/testprotos/test.TestAllTypes) *{
[…]
    RepeatedNestedMessage: []*google.golang.org/protobuf/internal/testprotos/test.TestAllTypes_NestedMessage len: 3, cap: 3, [
        *(*"google.golang.org/protobuf/internal/testprotos/test.TestAllTypes_NestedMessage")(0x11edbe0),
        *nil,
        *(*"google.golang.org/protobuf/internal/testprotos/test.TestAllTypes_NestedMessage")(0x11edc20),
    ],
[…]

So the tests currently rely on messages being equal despite their “nil-ness” being different.

@neild @dsnet Any intuition as to what we should do here? Thanks.

neild commented 1 month ago

If I recall correctly, we originally wanted to have proto.Equal(&M{}, (*M)(nil)) == true, but found that users were depending on this being false. As a general rule, we try to treat (*M)(nil) as exactly equivalent to &M{}.

The proto.Equal documentation about invalid messages was added relatively recently, in https://go.dev/cl/464275. I think the thing to do here is correct the documentation to accurately describe the implementation: proto.Equal(a, b) is false if a.IsValid() != b.IsValid(), but otherwise an empty message is an empty message.

puellanivis commented 1 month ago

Agree, I’ve been trying to follow the details of this, but have found it hard to understand. An empty message should be an empty message, whether that’s it being &M{} or if it’s (*M)(nil), so my brain just hasn’t been able to grasp what the concern here is.

dsnet commented 1 month ago

I regret not writing a more descriptive rationale for the reason for the change, but the timing of the commit aligns with @neild's hypothesis.