
Best practices for getting size of Google Protobuf Struct value #1580

Closed: jon-whit closed this issue 7 months ago

jon-whit commented 7 months ago

I have a gRPC service that defines a request message containing a google.protobuf.Struct value.

e.g.

syntax = "proto3";

import "google/protobuf/struct.proto";

service MyService {
    rpc Check(CheckRequest) returns (CheckResponse);
}

message CheckRequest {
    google.protobuf.Struct context = 1;
}
...

I have generated the Go proto sources from this.

In my application I'd like to enforce limits that bound the size of the context input, so a client can't supply arbitrarily large context and exhaust server resources at runtime. I'm currently calling proto.Marshal(request.GetContext()) and counting the length of the resulting byte slice, but this brought to my attention that marshaling the input has a cost of its own: marshaling an arbitrarily large context is itself a runtime risk.
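
Concretely, my current check looks something like this (a simplified sketch; maxContextBytes and the package name are placeholders):

package checks

import (
    "errors"

    "google.golang.org/protobuf/proto"
    "google.golang.org/protobuf/types/known/structpb"
)

const maxContextBytes = 64 << 10 // placeholder limit: 64 KiB

// validateContext marshals the Struct solely to measure its encoded size.
func validateContext(ctx *structpb.Struct) error {
    b, err := proto.Marshal(ctx)
    if err != nil {
        return err
    }
    if len(b) > maxContextBytes {
        return errors.New("context too large")
    }
    return nil
}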

My question is: for applications that want to bound free-form inputs like protobuf Structs, what is the recommended way of handling such inputs in the Go protobuf world? I'd love to learn more about what others have done.

neild commented 7 months ago

proto.Size will return the size of the wire-encoding of a message.
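
For example, the Marshal-then-len check above can become something like this (a sketch, keeping the same placeholder limit):

package checks

import (
    "errors"

    "google.golang.org/protobuf/proto"
    "google.golang.org/protobuf/types/known/structpb"
)

const maxContextBytes = 64 << 10 // placeholder limit: 64 KiB

// validateContext computes the wire size without allocating an encoded
// copy of the message.
func validateContext(ctx *structpb.Struct) error {
    if proto.Size(ctx) > maxContextBytes {
        return errors.New("context too large")
    }
    return nil
}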

In the case of an RPC service accepting size-limited requests, the right place to put the size check is before you've parsed the request. If you're calling proto.Size on a value you read from the network, it's already too late; you already spent the time and memory on decoding the value.

You should also be aware that the in-memory representation of a parsed message can be substantially larger than the on-wire representation of the message. Exactly how much more depends a lot on the structure of the message. If you're worried about malicious inputs, it's worth testing edge cases (for example: a message containing a deeply-nested set of map values) to ensure that your system behaves acceptably.
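
As a sketch of that kind of probe, using the structpb helpers (the depth and key name are arbitrary):

package main

import (
    "fmt"

    "google.golang.org/protobuf/proto"
    "google.golang.org/protobuf/types/known/structpb"
)

func main() {
    // Build a deeply nested Struct: {"k": {"k": {... "leaf" ...}}}.
    v := structpb.NewStringValue("leaf")
    for i := 0; i < 1000; i++ {
        v = structpb.NewStructValue(&structpb.Struct{
            Fields: map[string]*structpb.Value{"k": v},
        })
    }
    // The wire encoding stays compact, but the decoded form carries a
    // map, a Struct, and a Value allocation per level of nesting.
    fmt.Println("wire bytes:", proto.Size(v))
}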

jon-whit commented 7 months ago

@neild when you say

In the case of an RPC service accepting size-limited requests, the right place to put the size check is before you've parsed the request.

I'm assuming what you mean is that by the time the RPC handler (which implements the underlying gRPC service definition) receives the request struct, the message has already been fully decoded into memory. Is that correct? And if so, to my understanding, the only way to control that at the server layer is to limit the maximum message size the server will accept (e.g. https://pkg.go.dev/google.golang.org/grpc#MaxRecvMsgSize).
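
For reference, I believe wiring that option up looks roughly like this (a sketch; the 1 MiB limit is an arbitrary placeholder):

package main

import "google.golang.org/grpc"

func main() {
    // Reject any inbound message larger than 1 MiB before it ever
    // reaches an RPC handler.
    srv := grpc.NewServer(grpc.MaxRecvMsgSize(1 << 20))
    _ = srv // register services and call srv.Serve(lis) as usual
}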

So checking a size constraint on the context value within the RPC handler is a somewhat superfluous check: the decoding cost has already been paid and the message is already in memory. Any further size calculation only adds more load on the server.

jon-whit commented 7 months ago

I realize the conversation above is more specific to gRPC than to protobuf, but I was also wondering whether, at the proto layer, there is a more efficient way to check the size than proto.Marshal, and proto.Size is clearly the better way. I also benchmarked this and it is noticeably faster.
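
My benchmark was along these lines (simplified; the Struct contents and package name are arbitrary, and it lives in a _test.go file):

package checks

import (
    "testing"

    "google.golang.org/protobuf/proto"
    "google.golang.org/protobuf/types/known/structpb"
)

func benchStruct(b *testing.B) *structpb.Struct {
    s, err := structpb.NewStruct(map[string]interface{}{
        "user":       "anne",
        "attributes": map[string]interface{}{"tier": "pro", "count": 42},
    })
    if err != nil {
        b.Fatal(err)
    }
    return s
}

func BenchmarkMarshalLen(b *testing.B) {
    s := benchStruct(b)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        buf, err := proto.Marshal(s)
        if err != nil {
            b.Fatal(err)
        }
        _ = len(buf)
    }
}

func BenchmarkSize(b *testing.B) {
    s := benchStruct(b)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _ = proto.Size(s)
    }
}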

Thanks for entertaining the more gRPC related question(s) as well 👍

neild commented 7 months ago

I'm assuming what you mean is that by the time the RPC handler (which implements the underlying gRPC service definition) receives the request struct, the message has already been fully decoded into memory. Is that correct?

Yes, exactly. If you're trying to reject messages that are too big, you want to do it before you invest a lot of effort in parsing the message.

In particular, if your definition of "too big" is proto.Size, then you really should be applying that limit before decoding. You know how large the encoded message is (because you have the encoded message bytes right there), so decoding it only to then spend more effort computing the encoded size is a very long way to go to get information you already had.
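
In sketch form (maxRequestBytes is a placeholder; m can be any generated message type):

package checks

import (
    "fmt"

    "google.golang.org/protobuf/proto"
)

const maxRequestBytes = 64 << 10 // placeholder limit: 64 KiB

// unmarshalBounded rejects oversized payloads using information we
// already have (the length of the encoded bytes) before paying to
// decode them.
func unmarshalBounded(data []byte, m proto.Message) error {
    if len(data) > maxRequestBytes {
        return fmt.Errorf("message too large: %d bytes", len(data))
    }
    return proto.Unmarshal(data, m)
}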

stapelberg commented 7 months ago

This question looks answered, so I’ll close.