protoc-gen-go: support generating structs with non-pointer message fields

Is your feature request related to a problem? Please describe.

An important property of go is the ability to compose structs in order to reduce the number of allocation and improve access locality. This is at odds with languages like Java which (until project Valhalla) required that each instance live independently on the heap. Java's runtime specialization and JIT compilation helps it to mitigate that overhead. Protobuf fields (as of proto3, at least) can be omitted in serializations, a very important property for forwards compatibility. In order to support the optional nature of fields, protoc-gen-go generates fields as pointers. This (combined with storing unknown fields) allows a message to round-trip through deserialization back to serialization without losing information.

The language guide indicates specific zero values for types other than messages which permits the compiler to use non-pointer types for fields of those types.

For message fields, the field is not set. Its exact value is language-dependent. See the generated code guide for details.

Consider the following contrived example:

message Person {
  message Name {
    string first = 1;
    string last = 2;
  }

  Name name = 1;

  repeated Person friends = 2;
}

This will generate the below struct:

type Person struct {
        state         protoimpl.MessageState
        sizeCache     protoimpl.SizeCache
        unknownFields protoimpl.UnknownFields

        Name    *Person_Name `protobuf:"bytes,1,opt,name=name,proto3" json:"name,omitempty"`
        Friends []*Person    `protobuf:"bytes,2,rep,name=friends,proto3" json:"friends,omitempty"`
}

This issue considers two different cases in which the protoc-gen-go compiler generates struct pointer types as opposed to struct value types in generated structs: message fields, collection fields which utilize messages (repeated, map). The above generated struct is unfortunate for two reasons.

The first is that the Name field has a pointer. As discussed above, this, today, is important so that upon marshaling the library can know whether to serialize a zero-value message for the field or to not serialize the message at all. In the solutions section we'll discuss how we can store information as to whether this field is populated elsewhere in the Person struct.

The second is that the Friends field is a []*Person rather than []Person. The motivation for this decision is not clear to me. It is illegal for an entry in that slice to be nil for the purposes of serialization. Calling proto.Marshal with a nil entry will result in an error proto: repeated field Friends has nil element.

Describe the solution you'd like

The proposal here is for messages and fields to have an option, which for now we'll call (go.nullable) which can be set to false to indicate that pointers should not be generated. For repeated and map fields this should be trivial to implement and does not seem to break any semantics. For regular message fields, we'll need to have some mechanism to (a) determine at serialization time whether a field is zero-value but exists or should not be serialized at all (b) to mark a zero-valued field as not existing when creating a message in memory.

For (a) I propose that we augment the unexported state of messages to include a bitmask to indicate whether a field is missing. At serialization time, if the library detects that a message has a zero value, it will consult this bitmask. The previous sentence is intentionally imprecise. In terms of implementation details, in the new google.golang.org/protobuf world, we have some flexibility to control the implementation of protoreflect.Message. What is critical is that when a "root" message exposes its fields through the Range method, that it provide Value implementations which return the proper value for IsValid. This can be done by constructing the relevant Message values by telling them whether they are valid.

For (b) I propose generating a method for each message field which is generated as a non-pointer to set its validity. Note that the validity only applies if the value of the message is zero. This will mean that in this case, IsValid will be more expensive as it will need to determine whether all fields of a given message are zero before consulting its validity bit.

Describe alternatives you've considered

One discussion related to this has been to allow the protobuf library to pool allocations rather than to try to eliminate those allocations: https://github.com/golang/protobuf/issues/1192

Additional context

This topic has been discussed at least twice before. In https://github.com/golang/protobuf/issues/1142 the discussion hinged upon internals to the implementation rather than the more abstract problem. That discussion went nowhere. I re-raised this issue in the context of https://github.com/golang/protobuf/issues/52#issuecomment-685904437 which was the incorrect place to do so.

Another important thing to note is that the demand for this functionality is not theoretical. This is deemed as the most critical improvement of the now defunct https://github.com/gogo/protobuf/ library by engineers on https://github.com/cockroachdb/cockroach. Cockroach seems to not be the only project which is struggling to move off of that deprecated and unsupported fork (https://github.com/firecracker-microvm/firecracker-containerd/issues/452).

The second is that the Friends field is a []*Person rather than []Person. The motivation for this decision is not clear to me.

We consistently represent proto messages as a pointer to a Go struct. A *Person is a proto.Message, but a Person is not.

It is not, in general, safe to copy a proto message by value. Messages often contain a mutex, which cannot be copied. In the latest version of the code generator, messages embed a pragma.DoNotCopy to discourage shallow copies.

Given that messages shouldn't be shallow copied, there's no safe way to append a message to a []Person.

I understand the motivation for this request, but the choice to represent messages as pointers was made long ago and is not something that can be easily undone.

It is illegal for an entry in that slice to be nil for the purposes of serialization. Calling proto.Marshal with a nil entry will result in an error proto: repeated field Friends has nil element.

This is not true as of v1.4.0; a nil element of the repeated field slice is treated as an empty message.

I understand the motivation for this request, but the choice to represent messages as pointers was made long ago and is not something that can be easily undone.

I acknowledge that the path to a resolution here is not straightforward or obvious, but that doesn't mean we shouldn't try or that the decision was correct. This is a show-stopping design flaw in the ecosystem as far as I am concerned.

I hear your points and don't contest any of them. I understand your hesitation to drive forward what will undoubtedly be a large effort. I've created this issue in order to collaborate to move the community forward onto the decidedly superior library you and many other have worked hard to create. I am regularly very sad about the fact that I cannot leverage this great work.

However, the performance loss due to the allocations would be unacceptable. At this point the message the go community is hearing is that protobuf and grpc by extension does not care about supporting idiomatic performance sensitive applications. That seems somewhat farfetched given the common discussion about grpc's efficiency. I've been through the new APIs and think that from a usability perspective, roughly all of the gogoproto extensions can be worked around (with varying degrees of pain) other than this one. Composing structs to avoid allocation is a critical language feature.

Perhaps it will take years to resolve this. Perhaps we won't be satisfied with proposals without generics which is certainly going to be years away. I, for one, will be very sad if we just let this sit there and the serious go community continue to ignore the obviously superior APIs and continues to be left behind by the ecosystem which will continue to evolve around the new APIs.

Thanks for taking the time to reply. I understand the way in which challenging and unwarranted requests on open-source projects can feel taxing in light of all of the other things going on.

Coming back to technical details, your point regarding shallow copying is well taken. I would like the revisit the decision to embed mutexes inside of generated message structs. I'm not familiar with where that is done. Philosophically I find it surprising given that the messages represent just plain old data. What use case demanded synchronization below rather than above the message?

Your comment about the struct not implementing proto.Message seems to me like a non-issue. If users need a proto.Message from a repeated element, then they can get one by taking the address of the slice entry. Generated code can certainly deal with taking references where required. Obviously users will need to be conscious of the possibility that slice mutations may clobber data inside such a pointer, but that's true generally when dealing with slices of structs and taking addresses.

If these mutexes were not embedded inside of message implementations, what would prevent shallow copying or generally impede forward progress of a design in this direction?

A []Person is inherently already a pointer semantic, in that none of the Person structs contained in it would be actually embedded in the parent message, but rather are still allocated separately, and still referenced by pointer.

Discussions about permitting them to be allocated as a single chunk, rather than piecemeal can sometimes be made, but unfortunately, protobuf code cannot practicably figure out ahead of time just how many repeated message of a field will actually appear when decoding, especially given the extremely diverse set of possible values for that message’s length from a single byte, up to 1024 bytes or more. If we’re talking here about “performance sensitive applications” then we all already know that even a small slice growth can cause an extremely bad performance drop:

list := make([]Person, 0, 1024)
for i := 0; i < 1024; i++ {
  list = append(list, Person{})
}
list = append(list, Person{}) // Oh sir, it’s only wafer thin.

The use of mutexes in the messages is principally for record keeping used in reflection, and in fast-path encoding implementations. Extracting all of that functionality seems infeasible at this point, and would likely cause worse performance regressions than the purposed gains from using embedded objects rather than pointers.

Most proposals to generate message fields as simply M instead of *M disregard protobuf correctness which requires that presence be preserved for these field. I appreciate that your proposal doesn't throw correctness out the window and attempts to preserve that behavior.

Several thoughts:

Regarding the representation of repeated message fields: the performance of []M or []*M is highly dependent on at least 3 factors: the size of the message, the number of slice elements, and whether the length is known beforehand. In the cases where the message sizes are on the larger side and the number of elements is non-trivial, it increasingly becomes more performant to use []*M over []M. The main reason why []M may be more performant is the ability to batch allocate the messages, but you can also do that with []*M by allocating a separate []M and addressing each element into the former. See this benchmark for example; try varying the message size and the number of elements.
- Also, memory allocations are also not the only metric we care about. We also care about memory usage. Relying on []M means that the entire []M will stay alive in memory so long as there exist one reference to a *M element within the slice. On the other hand relying on []*M means that all other elements may potentially be garbage collected except the ones where there exist an explicit reference to it.
Regarding the representation of map fields with messages: it is generally going to be less performant to use map[K]M over map[K]*M. The values in a map are not addressable. In the implementation of protobufs themselves, and also the wider ecosystem that deals with protobuf messages, it is very common for a concrete message to go through a proto.Message interface. Since map values can't be addressed, the value will have to end up on the heap each time it goes through a proto.Message interface.
Regarding "solution b" where we "[generate] a method for each message field ... to set its validity". Let's suppose we generate HasField and ClearField methods for a field named Field with go.nullable=false. What happens when there is another field that happens to be exactly named HasField or ClearField? Do we simply not generate the methods? Do we automatically mangle the method name to be unique? Do we generate invalid code to indicate to the user that the combination of options and field names is not permitted? (this is not a hypothetical issue, but a real issue that is encountered at scale).

The use of mutexes in the messages is principally for record keeping used in reflection, and in fast-path encoding implementations. Extracting all of that functionality seems infeasible at this point, and would likely cause worse performance regressions than the purposed gains from using embedded objects rather than pointers.

In this is something very important: the motivation for this is the idea that the reason why many (most?) major go programs which use grpc/protobuf shun the official library is performance. Others in other forums have raised the point that regardless of the allocation concerns, the reflection-based decoding is dramatically less efficient than code-generated decoding. I’ve become convinced that we need some benchmarks on the topic. If indeed there’s more efficiency lost to the reflection based [de]serialization than the allocations, then that’s the conversation to be having first. On some level at the heart of this issue is a question as to whether the maintainers care that major projects deem this library to be unusable inefficient. Furthermore, the question is whether there is will to collaborate with the community to understand and rectify the problem in order to eliminate the widespread and growing fragmentation.

In response to @dsnet

Yes, you are correct that the shape of messages matters a lot. In the scenarios which concern me, the messages involve regularly hundreds but sometimes tens of thousands of elements and each of those messages may itself contain on the order of 10s of message fields (counting recursively). If all of those were pointers we might be talking about many thousands of additional allocations per rpc. For what it’s worth, I’m in no way suggesting that this become the default. Users should opt into it. I think this opt-in behavior puts the onus of memory lifetime tracking onto the user.
Your map point is totally valid and I’m not sure why I made it. Consider it retracted.
The point about the method name collision feels both valid but something we can work around. For one, it seems reasonable to stick this bit on the MessageState which gets embedded in the generated message struct. We could then interact with it through a new method on the protoreflect.Message or something like that. It feels, to me, like something we could overcome if we chose.

Others in other forums have raised the point that regardless of the allocation concerns, the reflection-based decoding is dramatically less efficient than code-generated decoding. I’ve become convinced that we need some benchmarks on the topic.

Several thoughts:

The current implementation only uses reflection initially to construct the (de)serialization tables up-front, and all subsequent operations use what we call the "table driven" implementation. It is inaccurate to say that it is "reflection-based decoding".
We are aware that full code generation can be more performant, but:
1. Full code generation leads to significant code bloat which leads to certain programs not being buildable since they have restrictions regarding how the binary is built or where it is deployed. It cannot be the default mode.
2. Full code generation unfairly favors micro-benchmarks more than realistic end-to-end benchmarks. Code generation highly benefits from the generated (de)serialization code being in the instruction cache. For a sufficiently large program, the benefits of full code generation are often not as great as seen on micro-benchmarks. In some rare cases, we have actually seen it have the opposite effect of slowing down the program since the instruction cache starts to thrash heavily.
Let's keep any further discussion about code generation on #280. It's not something we're opposed to adding, but it is a non-trivial amount of work and there's only so much we can do.
- That said, since the motivation for this proposal is performance, I think it's more practical if we invest effort into #280 first if we believe it will benefit users that want to opt-in to having it. This proposal is much harder to accept than #280 since it requires API-level changes, while #280 does not.

Huh. It just occurred to me something that I was always kind of already aware of, but had not really considered directly as a design principle involved here: protobufs kind of expect and direct you towards using flatter messages, rather than deeply nested messages. Just because of the overhead of embedding a small message in both time, allocations of memory, and compaction of binary output.

But a lot of people are coming from entirely different (but also equally valid) design principles that say that hierarchies are good, and add form and context to a data structure.

So, I think this might be one of those cases where someone is asking for a feature, when what they really want is efficient handling of their use case, and the feature would just be one way to accomplish that end goal.

I think approaching this situation as “some programs, and design principles are going to have highly nested protobufs, what can we do to not make that suck?” Might get us somewhere faster than focusing heavily on the intricate details and bogged down on specifics of “how can we get embedded structs working?” I mean, maybe there just isn’t a better answer than embedded structs. But being able to scope the use case properly might get us to an answer more quickly.

I think approaching this situation as “some programs, and design principles are going to have highly nested protobufs, what can we do to not make that suck?”

There's something nice in this sentiment but I'm not sure how much it applies. I certainly respect your point regarding not getting lost in the technical specifics but rather focusing on the problem at hand. However, my driving motivation here is desire to move https://github.com/cockroachdb/cockroach to the new v2 APIs and my perceived primary blocker is the efficiency loss. I don't think advising large projects to rework their messages is the answer.

Go, as a langauge, prefers being explicit, and, as I see it, exploiting composition to represent that explicit nature. Dealing with parallel slices in go is laborious at best. If I take your claim to the extreme, maybe what you're getting at is that the idea of using structures which directly represent the layout of the protocol buffer as a go struct is unadvisable and we should be talking about something even more extreme. Perhaps a proposal direction that might interest you would be some sort of declarative syntax for mapping protocol buffer messages to go structs which have a dramatically different layout. That feels far fetched.

I guess what I'd say is that while I appreciate the sentiments of the message on a high level, I don't know how to act on them or use them to guide this conversation towards a positive outcome.

I understand your intended motivation: move cockroachdb to the new v2 API. But the original V1 api, and the V2 api both do not support deeply nested messages well at all. (And neither does the C++ or Java or python(?) implementations) Which is probably what drove cockroachdb to use gogoprotobuf in the first place.

I am not proposing that cockroachdb rework all their messages to flatten them. That is obviously unreasonable, intractable, and hostile. But approaching a topic with, “I must have this feature” rather than “this is the goal I want to achieve” is often going to focus people on the why and how that specific feature can or cannot work, rather than focusing on your actual goal. You and I may not see a viable alternate solution here, but were not the entire set of all people open to this project. By framing this as the only solution acceptable, we have largely only ever been focusing on the narrow example use cases, where the actual issues at had do not manifest. Then proposed Person and Name messages simply do not demonstrate the performance concerns being faced. Two allocations rather than one… 🤷‍♀️ why is that bad?

So, we have up until now only ever been touching on the most absurdly basic surface level of the feature, rather than on the deeper issue: “when I work with my messages it has to allocate 100s, or 1000s of submessages”. So, instead of addressing the problem of the performance of highly nested messages, we’ve focused instead on why slices and maps really need to use pointer values, rather than anything that actually matters to you or others.

Now, I know a lot about Go and I know a lot about protobufs, but I’m not The One True Expert that can say definitely what needs to be done in order to solve your actual problem. What I am absolutely able to do is poke holes in the feature and the minimal repros demonstrating the feature and why they’re unnecessary at the scale suggested, because that what engineer do.

Being able to see the problem at its deeper levels: many projects are using highly nested messages, and switching to this package will cause unreasonable performance loss, is an entirely different argument than “protobuf and grpc by extension does not care about supporting idiomatic performance sensitive applications.” Google is either using this package already extensively internally, or using a copy of this code and yet they’re still writing performance critical applications. So when you say things like that, we’re at an absolute loss of how to take it, because… like… but… we’re doing that… so what’s the difference?

Without the key lynch pin of “many projects are using highly nested protobuf messages” many of us are not even going to understand what the problem is, because we simply are not ever getting anywhere near that situation.

So… all that said. Let’s look at a way forward for these projects with an open-mind to all possible solutions rather than laser focusing on one purposed solution that many of us may not have even ever seen the merit of, because we have never been anywhere near the problem issue in the first place.

I really don't see any possibility of us changing to or adding a representation that uses value semantics for messages. The design decision that messages have reference semantics is, so far as I know, pervasive across all protobuf implementations. It's certainly pervasive within the Go implementation.

Okay. I acknowledge that my approach in this issue has been myopically focused on the issue of value types in my more general attempt to move cockroach, and ideally other large, existing projects towards the V2 API. I view the current schism over implementations as a major blocker to adoption of the new API and thus to investment in its ecosystem by the sizable number of contributors working on these large projects.

Is there appetite for an issue to more generally discuss and focus on the v1 vs. v2 API schism and the properties of gogo/protobuf that block its adoption?

There was an alternative considered, which is to facilitate better object pooling, which I'll now investigate in more depth. #1192 does not directly talk about pooling and maybe I'm wrong to read in the idea that pooling is related there. I'll propose, however, that having a sync.Pool for each message type, at least each message type in use, would likely help with the allocation overhead at the cost of tracking lifecycles and walking messages to return their members to the pool.

I anticipate #280 is also going to be a part of allaying fears of moving to this library from gogo/protobuf.

golang / protobuf

protoc-gen-go: support generating structs with non-pointer message fields #1225