golang / protobuf

Go support for Google's protocol buffers
BSD 3-Clause "New" or "Revised" License
9.66k stars 1.58k forks source link

How to reuse []byte field when unmarshaling? #1495

Open ash2k opened 1 year ago

ash2k commented 1 year ago

I have a client/server app that uses gRPC to communicate. Profiler shows that up to 12.6% of memory is allocated in consumeBytesNoZero().

The code is unmarshaling a message that has a bytes (i.e. []byte) field a few levels deep. Then that data is processed and then the message object/struct could be reused to read the next message into if it was possible, but I don't see how. I mean, I can reuse it but looking at the code, a new byte slice will be allocated in any case. The bytes field is always up to N KB in size (client ensures that) so perfect for reuse on the server side. Is there any way to reuse the allocated memory that I don't see?

I see two ways:

Perhaps combining the two is the best way forward - ability to inject dynamic unmarshaling code for types via UnmarshalOptions?

Screen Shot 2022-10-05 at 10 49 47 am
znkr commented 1 year ago

Unfortunately, there is no support for this currently. Generally, the problem you describe is usually solved with areans, but we need to have Go language support for that: https://github.com/golang/go/issues/51317

ash2k commented 1 year ago

I don't understand how an arena allocator would help here, could you elaborate? I just want a way to re-use the allocated memory to avoid more allocations, not to free a bunch of allocations together to save on GC costs. I think it's a different task.

Regardless, hopefully protobuf library doesn't have to wait for a standard library change for this use case. I'm probably not the only user who might benefit from this. Cheers.

dsnet commented 1 year ago

The usage pattern with an arena would look something like:

a := arena.New()
for {
    var m foopb.MyMessage
    proto.UnmarshalOptions{Arena: a}.Unmarshal(&m)
    ... // make use of m
    a.Free()
}

When Unmarshal is called, all allocations will be obtained from the arena rather than the Go heap. The message m must no longer be use once a.Free is called.

The arena proposal is based on real-world data battle testing an implementation of arenas with protobufs. I don't have hard numbers, but I recall that it does dramatically improve performance in some use cases.

ash2k commented 1 year ago

Ok, I see. Thanks for the explanation. This looks like it would solve my problem indeed. However, this seems to be a bit overengineered if you ask me :) I just want to reuse the byte slice that is already in the message, it shouldn't be that hard to do. On the other hand, for more complex messages arenas are definitely a more comprehensive solution (all those small allocations e.g. for nested messages could also be removed).

So, we cannot do anything until there are arenas in Go?

znkr commented 1 year ago

You are right that it's probably not too hard to do, but it's going to be very hard to use correctly because one would need to ensure that the byte slice referenced in the existing proto message is not used anywhere else. It's not enough to make sure that the proto message is not used anywhere else, because some piece of code might retain just the byte slice. It's a dangerous API and that makes me very reluctant to consider it.

Unfortunately, I don't see a simple way to solve your problem with the current API. It's possible that I am missing something though.

puellanivis commented 1 year ago

I’ve had to work on over-allocation issues of byte slices in a separate project that also deals with marshalling and unmarshalling results. There are kind of a multitude of issues and gotchas and it’s really not as easy as one might assume. The point mentioned by znkr is but one of the issues you can run into.

Reading up on arenas, it definitely seems like the more powerful and correct solution, which will help this case here and in that other project. And hopefully that would avoid the accidental pitfalls and footguns surrounding “but I already have a byte-slice right here already.”

aktau commented 1 year ago

The (unstated) goal is (likely) to reduce CPU cycles spent. Allocation and garbage collection are often big contributors to CPU usage in Go programs. It makes sense to try to reduce allocations to reduce CPU usage. But:

I don't understand how an arena allocator would help here, could you elaborate? I just want a way to re-use the allocated memory to avoid more allocations, not to free a bunch of allocations together to save on GC costs. I think it's a different task.

Implies a different goal. Can you state the desired end result more clearly?

About possible solutions: to reduce GC-related CPU usage (and concomittant cache thrashing, write barrier latency et cetera.) arenas have proven quite effective in experiments. Hopefully arenas will become a more widely available thing in the future, but we can also brainstorm other solutions:

  1. Aliasing the input slice: add an unmarshalling option to alias the input slice. If this option is set, then functions like consumeBytes(NoZero) could re-use the bytes from the buffer that is being unmarshalled. This would impose the restriction that the input slice should not be modified until the message is no longer needed. I'm not very familiar with gRPC, but similar frameworks tend to have their own message buffer recycling. As in: (a) take buffer from freelist, (b) read bytes from network into buffer (c) unmarshal object from buffer (d) return buffer to freelist (e) pass object to user handler. In this model, aliasing the input slice wouldn't work, unless gRPC itself was modified to do things differently (only returning the buffer to the freelist when the user handler has returned).
  2. Allocate from user-controlled freelists: instead of allocating and copying a slice, the proto library would call a user-supplied "allocate" function, which could in principle return a []byte slice from some freelist. This is like a poor version of arenas, in some sense, though it also avoids other costs like actual allocation. It has the downside that I don't see how to easily generalize this to other types (like messages), though with some hackery strings might work (and introduce extra unsafety in the program).
  3. Re-use []byte fields from passed-in proto.Message: This is what was suggested in the original post. We cannot do this by default due to many users potentially relying on the old behaviour. But it's conceivable that we add an UnmarshalOption to do this. This would conflict with option 1 (aliasing the input slice), so we'd need to be pretty careful about which one we go for.

For the time being, I think it may be best to wait for arena support, but at any rate I'd be happy to hear your thoughts.

3052 commented 4 months ago

what about clip?

https://godocs.io/slices#Clip

I think that would be better than the current code, as I believe the current code always allocates, and clip would never allocate, at least not inside that function

znkr commented 4 months ago

I don't understand how slices.Clip is relevant here. It doesn't change the aliasing problem.

3052 commented 4 months ago

@znkr OP reported this line as the problem:

https://github.com/protocolbuffers/protobuf-go/blob/v1.28.1/internal/impl/codec_gen.go#L5410

likely because that line is copying a byte slice. in my view thats not needed, you can just use the byte slice directly. the only issue is that if the returned slice has extra capacity, then running append or similar on the result will modify bytes of the parent. Clip solves that by forcing a copy if the slice needs to extend by even one byte.

znkr commented 4 months ago

That's unfortunately not the only issue. The slice can be modified by other means, in fact it's fairly common to reuse a byte slice. Consider this function:

func Parse(in []byte) *MyType

If there return value references in, then code that reuses the underlying buffer is problematic:

var b bytes.Buffer
for range n {
  Receive(&b)
  v := Parse(b.Bytes())
  DoSomething(v)
}

If DoSomething keeps a reference to v (e.g. by starting a goroutine), this code will result in data corruption.

We can't change the default, because that would break existing code and we are reluctant to add a feature that has subtle behavior differences like this based on a configuration.

3052 commented 4 months ago

OK well you might as well close the issue then. you have three options:

  1. use the result of protowire.consumebytes as is
  2. use the clipped result of consumebytes
  3. copy the result of consumebytes

currently option 3 is used, which has obvious performance issues because you forcing a copy regardless of the situation.

aktau commented 4 months ago

As @znkr said, we're currently not considering this due to the potential safety issues. There are two dangerous operations on byte slices that alias the unmarshal input:

  1. Write to the byte slice (this is dangerous, the caller of Unmarshal* needs to be very sure that they do not use the input byte slice of anything, as it is modified).
  2. Append to the byte slice (solved by using slices.Clip, as this will force an allocation).

So issue #1 would still exist even with slices.Clip. Unless we get const byte slices, I don't see how we can fix #1. For strings, it would be less of a problem. As a result, any new option to alias the input (e.g.: AliasInput) would need to come with a fairly strong warning.

The Go proto implementation still has a number of safer optimizations that we could do first, and we're more likely to do those first, stay tuned.