Open ash2k opened 1 year ago
Unfortunately, there is no support for this currently. The problem you describe is usually solved with arenas, but we need Go language support for that: https://github.com/golang/go/issues/51317
I don't understand how an arena allocator would help here, could you elaborate? I just want a way to re-use the allocated memory to avoid more allocations, not to free a bunch of allocations together to save on GC costs. I think it's a different task.
Regardless, hopefully the protobuf library doesn't have to wait for a standard library change for this use case. I'm probably not the only user who might benefit from this. Cheers.
The usage pattern with an arena would look something like:

```go
a := arena.New()
for {
	var m foopb.MyMessage
	if err := (proto.UnmarshalOptions{Arena: a}).Unmarshal(data, &m); err != nil {
		// handle error
	}
	// ... make use of m
	a.Free()
}
```

When `Unmarshal` is called, all allocations are obtained from the arena rather than the Go heap. The message `m` must no longer be used once `a.Free` is called.
The arena proposal is based on real-world data from battle-testing an implementation of arenas with protobufs. I don't have hard numbers, but I recall that it does dramatically improve performance in some use cases.
Ok, I see. Thanks for the explanation. This looks like it would solve my problem indeed. However, this seems to be a bit overengineered if you ask me :) I just want to reuse the byte slice that is already in the message, it shouldn't be that hard to do. On the other hand, for more complex messages arenas are definitely a more comprehensive solution (all those small allocations e.g. for nested messages could also be removed).
So, we cannot do anything until there are arenas in Go?
You are right that it's probably not too hard to do, but it's going to be very hard to use correctly because one would need to ensure that the byte slice referenced in the existing proto message is not used anywhere else. It's not enough to make sure that the proto message is not used anywhere else, because some piece of code might retain just the byte slice. It's a dangerous API and that makes me very reluctant to consider it.
Unfortunately, I don't see a simple way to solve your problem with the current API. It's possible that I am missing something though.
I’ve had to work on over-allocation issues of byte slices in a separate project that also deals with marshalling and unmarshalling results. There is a multitude of issues and gotchas, and it’s really not as easy as one might assume. The point mentioned by znkr is but one of the issues you can run into.
Reading up on arenas, it definitely seems like the more powerful and correct solution, which will help this case and that other project. And hopefully it would avoid the accidental pitfalls and footguns surrounding “but I already have a byte slice right here.”
The (unstated) goal is (likely) to reduce CPU cycles spent. Allocation and garbage collection are often big contributors to CPU usage in Go programs. It makes sense to try to reduce allocations to reduce CPU usage. But:
I don't understand how an arena allocator would help here, could you elaborate? I just want a way to re-use the allocated memory to avoid more allocations, not to free a bunch of allocations together to save on GC costs. I think it's a different task.
Implies a different goal. Can you state the desired end result more clearly?
About possible solutions: to reduce GC-related CPU usage (and concomitant cache thrashing, write barrier latency, etc.), arenas have proven quite effective in experiments. Hopefully arenas will become a more widely available thing in the future, but we can also brainstorm other solutions:
A `consumeBytes(NoZero)` variant could re-use the bytes from the buffer that is being unmarshalled. This would impose the restriction that the input slice must not be modified until the message is no longer needed. I'm not very familiar with gRPC, but similar frameworks tend to have their own message buffer recycling. As in: (a) take a buffer from a freelist, (b) read bytes from the network into the buffer, (c) unmarshal an object from the buffer, (d) return the buffer to the freelist, (e) pass the object to the user handler. In this model, aliasing the input slice wouldn't work unless gRPC itself was modified to do things differently (only returning the buffer to the freelist when the user handler has returned). For the time being, I think it may be best to wait for arena support, but at any rate I'd be happy to hear your thoughts.
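The freelist pattern in steps (a)–(e) might look roughly like the sketch below, with `sync.Pool` standing in for the freelist. Note that `receive` and `parse` here are placeholder names, not gRPC or protobuf-go APIs, and this `parse` copies its input, which is exactly why the buffer can be recycled immediately in step (d):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool acts as the freelist: buffers are taken out before a read
// and returned once the decoded object no longer aliases them.
var bufPool = sync.Pool{
	New: func() any { return make([]byte, 0, 4096) },
}

// receive simulates step (b): reading bytes from the network into buf,
// reusing buf's existing capacity.
func receive(buf []byte, payload string) []byte {
	return append(buf[:0], payload...)
}

// parse simulates step (c): "unmarshaling" by copying, so the result
// does NOT alias buf and the buffer can be recycled right away.
func parse(buf []byte) string { return string(bytes.ToUpper(buf)) }

func main() {
	for _, msg := range []string{"hello", "world"} {
		buf := bufPool.Get().([]byte) // (a) take buffer from freelist
		buf = receive(buf, msg)       // (b) fill it from the "network"
		obj := parse(buf)             // (c) unmarshal (copies)
		bufPool.Put(buf[:0])          // (d) return buffer to freelist
		fmt.Println(obj)              // (e) hand object to user code
	}
}
```

If `parse` instead returned a value aliasing `buf`, step (d) would have to be deferred until the handler finished with the object, which is the modification to gRPC described above.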
what about clip?
I think that would be better than the current code: as I understand it, the current code always allocates, while clip would never allocate, at least not inside that function.
I don't understand how `slices.Clip` is relevant here. It doesn't change the aliasing problem.
@znkr OP reported this line as the problem:
https://github.com/protocolbuffers/protobuf-go/blob/v1.28.1/internal/impl/codec_gen.go#L5410
likely because that line is copying a byte slice. In my view that's not needed; you can just use the byte slice directly. The only issue is that if the returned slice has extra capacity, then running `append` or similar on the result will modify bytes of the parent. Clip solves that by forcing a copy if the slice needs to extend by even one byte.
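For readers unfamiliar with it: `slices.Clip` shrinks a slice's capacity to its length, so a later `append` must reallocate instead of writing into the parent's backing array. A minimal sketch (`appendBang` is just an illustrative helper):

```go
package main

import (
	"fmt"
	"slices"
)

// appendBang appends '!' to sub and reports what parent looks like
// afterwards, showing whether the append wrote into parent's array.
func appendBang(parent, sub []byte) string {
	_ = append(sub, '!')
	return string(parent)
}

func main() {
	parent := []byte("hello world")
	// sub has len 5, but its capacity extends over the rest of parent,
	// so append writes into parent's backing array.
	fmt.Println(appendBang(parent, parent[:5])) // hello!world

	parent = []byte("hello world")
	// slices.Clip sets cap == len, forcing append to reallocate,
	// so parent stays untouched.
	fmt.Println(appendBang(parent, slices.Clip(parent[:5]))) // hello world
}
```

This shows why Clip stops append-based corruption but, as noted above, does nothing about writes made directly through the original buffer.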
That's unfortunately not the only issue. The slice can be modified by other means, in fact it's fairly common to reuse a byte slice. Consider this function:
func Parse(in []byte) *MyType
If the return value references `in`, then code that reuses the underlying buffer is problematic:
```go
var b bytes.Buffer
for range n {
	Receive(&b)
	v := Parse(b.Bytes())
	DoSomething(v)
}
```
If `DoSomething` keeps a reference to `v` (e.g. by starting a goroutine), this code will result in data corruption.
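A minimal, runnable version of that hazard; this `parse` is a stand-in for a hypothetical zero-copy `Unmarshal`, not protobuf-go's actual code:

```go
package main

import "fmt"

// parse mimics a zero-copy Unmarshal: the returned value aliases in.
func parse(in []byte) []byte { return in[:5] }

func main() {
	buf := []byte("first message")
	v := parse(buf) // v aliases buf's backing array

	// Reusing buf for the "next message" silently rewrites v as well:
	// nothing in the type system warns about the shared memory.
	copy(buf, "SECOND msg!!!")
	fmt.Println(string(v)) // prints "SECON", not "first"
}
```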
We can't change the default, because that would break existing code and we are reluctant to add a feature that has subtle behavior differences like this based on a configuration.
OK, well, you might as well close the issue then. You have three options:
Currently option 3 is used, which has obvious performance issues because you're forcing a copy regardless of the situation.
As @znkr said, we're currently not considering this due to the potential safety issues. There are two dangerous operations on byte slices that alias the unmarshal input:

1. Writing to the input buffer (the caller of `Unmarshal*` needs to be very sure that they do not use the input byte slice for anything, as it is modified).
2. Appending to the aliased slice (this can be prevented with `slices.Clip`, as this will force an allocation).

So issue #1 would still exist even with `slices.Clip`. Unless we get `const` byte slices, I don't see how we can fix #1. For strings, it would be less of a problem. As a result, any new option to alias the input (e.g. `AliasInput`) would need to come with a fairly strong warning.
The Go proto implementation still has a number of safer optimizations that we could do first, and we're more likely to do those first, stay tuned.
I have a client/server app that uses gRPC to communicate. The profiler shows that up to 12.6% of memory is allocated in `consumeBytesNoZero()`. The code is unmarshaling a message that has a `bytes` (i.e. `[]byte`) field a few levels deep. That data is processed, and then the message object/struct could be reused to read the next message into, if that were possible, but I don't see how. I mean, I can reuse it, but looking at the code, a new byte slice will be allocated in any case. The `bytes` field is always up to N KB in size (the client ensures that), so it's perfect for reuse on the server side. Is there any way to reuse the allocated memory that I don't see? I see two ways:
1. A way to add `Unmarshal()` to the message so that I could add custom unmarshaling logic or something like that. Ideally not a fixed method, actually, but a way to dynamically inject custom logic for a particular type.
2. Something in `UnmarshalOptions` specially for that case.

Perhaps combining the two is the best way forward: the ability to inject dynamic unmarshaling code for types via `UnmarshalOptions`?
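For illustration, the capacity-reuse being asked for here could be sketched in plain Go as below. `reuseBytes` is a hypothetical helper, not part of protobuf-go; it only allocates when the destination field's existing capacity is too small:

```go
package main

import "fmt"

// reuseBytes copies src into dst's existing backing array when the
// capacity allows, allocating only when it doesn't. (Hypothetical
// helper illustrating the requested optimization.)
func reuseBytes(dst, src []byte) []byte {
	if cap(dst) >= len(src) {
		return append(dst[:0], src...) // reuses dst's array, no allocation
	}
	return append([]byte(nil), src...) // must allocate a fresh array
}

func main() {
	// field plays the role of the bytes field of a previously
	// decoded message, sized for the expected N KB payloads.
	field := make([]byte, 0, 64)
	field = reuseBytes(field, []byte("payload #1"))
	fmt.Println(string(field))
	field = reuseBytes(field, []byte("payload #2")) // second decode reuses the array
	fmt.Println(string(field))
}
```

Because the result may alias a previously returned slice, this carries exactly the retained-reference hazard discussed earlier in the thread, which is presumably why it would need to be opt-in via something like `UnmarshalOptions`.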