Closed dsnet closed 2 years ago
As noted in https://github.com/golang/go/issues/32424#issuecomment-636492432 an alternative could be to define reflect.AppendInplace
.
always allocates even if the underlying slice has sufficient capacity because it must allocate a slice header on the heap (since it returns a
reflect.Value
with the resized slice).
That seems like a defect in the escape analysis. Shouldn't the reflect.Value
itself be allocated on the stack, and thus able to refer to a stack-allocated slice header?
@bcmills: Maybe? The slice header allocation occurs in reflect.Value.Slice
and escapes to the returned output.
I imagine that in order for the compiler to optimize the allocation of x
away it would need to be able to analyze v.Set(reflect.Append(v, ...))
all together to prove that x
does not escape. I believe it can do this kind of analysis if reflect.Append
is inlineable where the compiler can see that x
ultimately does not escape. However, I don't see how we can make reflect.Append
inlineable as the logic is somewhat complex.
Why does reflect.Append
need to be inlineable in order for the compiler to see that x
does not escape?
AFAICT it just needs the escape analysis to be parametric: “x
does not escape iff the return value from Append
does not escape. (And then we need something analogous to the C++ “return value optimization” so that Append
itself can avoid allocating even if it is not inlined: basically, the compiler should move the allocation decision to the caller side of the ABI.)
I can see roughly how what you describe might work.
basically, the compiler should move the allocation decision to the caller side of the ABI.)
Is this a decision made at compile time or run time?
I'm assuming it would be made at compile time, although in principle it could be made at run time (at the cost of some branch-predictor resources).
Is there an issue for such an optimization? I recall #18822, which is somewhat related, but about escape analysis of input argument (rather than return arguments), and also seems to suggest a compile-based approach whether the function is compiled in two ways, one that assumes the argument escapes, and one where it doesn't.
In general, an optimization of this nature sounds complex, and historically (and understandably) such changes take a long time to land (if ever). I'm certain that it will provide wide benefit to most Go code. On the other hand, an API change as suggested here can provide benefit within a single Go release at the cost of newly added API surface.
I just found out about https://blog.filippo.io/efficient-go-apis-with-the-inliner/, it might be worth seeing if we can use that technique in this situation.
Writing x.Append(y) would be strange, since append(x, y) does not do what you would hope. We already have reflect.Append. I'm skeptical about designing redundant APIs just to remove allocations.
I'm skeptical about designing redundant APIs just to remove allocations.
I'm a little confused; that sentiment seems to go against the recently accepted MapIter.Reset
, MapIter.SetKey
, and MapIter.SetValue
APIs.
What about:
// v must be a slice, and v.CanSet() must return true
func (v Value) MakeSlice(len, cap int)
I believe this would address the remaining issues that prevent writing extra-allocation-free (de)serialization code.
Code that knows the length could allocate and decode the slice in-place (for example, to a struct field), instead of boxing the header with reflect.MakeSlice()
.
Code that doesn't know the length could append to a buffer of reflect.Value
s, then make a slice of the appropriate length and set the indices from the values. The buffer of reflect.Value
s could potentially be re-used by the decoder to amortize the cost.
@matt0xFF: Interesting idea. It's not clear to me whether MakeSlice
would copy all the existing elements. An alternative would be to add a Grow
method:
// Grow extends the capacity of the current slice by at least n elements.
// The resulting capacity may exceed the current capacity plus n.
// The value must be a settable slice.
func (v Value) Grow(n int)
@dsnet: The next step is presumably setting the values via Index(n).Set(v)
, so I think it would be better for the new method to increase the length by default rather than just the cap.
I don't know if Grow()
has a connotation either way, if it usually refers to the cap there's always Resize()
.
Other than that I like it better, it solves the same issue without being as redundant with the current API.
It looks like Grow()
is used to extend the cap in the new generic slices package (#45955). Maybe it could be called Resize()
then?:
// Resize sets the slice length to len. If len is greater than the capacity, the existing
// elements will be copied to a new underlying array. Any new trailing elements will be
// set to the zero value of the element type. It panics if v is not a settable slice.
func (v Value) Resize(len int)
In terms of the slice capacity, it could behave like this (of course without all the extra allocations):
if needed := len-v.Len(); needed > 0 {
emptySlice := reflect.MakeSlice(v.Type(), needed, needed)
newSlice := reflect.Append(v, emptySlice)
v.Set(newSlice)
}
Looking at the current implementation, reflect.Append()
does not allocate extra capacity when the first argument is a nil
slice. If that remained the same, then reflect-based decoders that know the length could replace:
newSlice := reflect.MakeSlice(sliceV.Type(), count, count)
sliceV.Set(newSlice)
with
sliceV.Resize(count)
Which would be equivalent and avoid boxing the slice header.
For decoders that don't know the length, this code:
for moreObjects {
newValue := decoder.decodeValue()
appendResult := reflect.Append(sliceV, newValue)
sliceV.Set(appendResult)
}
could be replaced with:
for count := 1; moreObjects; count++ {
sliceV.Resize(count)
newElem := sliceV.Index(count-1)
decoder.decodeToValue(newElem)
}
which would avoid boxing (the slice header and value each time through the loop) in that path too.
I'm not sure I understand, my proposed Value.Grow
method is identical to the slices.Grow
function.
@dsnet Yes, I was just following up on my earlier comment. Since decoders will need to use Index() to access the new value, it seems like it would be better for the new function to increase the length unlike Grow()
, otherwise they'll always need to follow it with SetLen()
.
@dsnet
I'm not sure I understand, my proposed Value.Grow method is identical to the slices.Grow function.
I don't see .Grow in this issue except in that comment. Did I miss something? Is the proposal Append or Grow?
Is the proposal Append or Grow?
Either. I'm primarily interested in an allocation-free way to do the equivalent of:
v.Set(reflect.Append(v, e))
With reflect.Value.Append
, the above would trivially become:
v.Append(e)
With reflect.Value.Grow
, the above would become:
n := v.Len()
if n == v.Cap() {
v.Grow(1)
}
v.SetLen(n+1)
v.Index(n).Set(e)
I have preference for reflect.Value.Append
over reflect.Value.Grow
.
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
The problem with Append()
is it requires boxing (to a reflect.Value
) and copying to add non-pointer types to a slice. Grow()
doesn't have this problem. For example when decoding this type:
type MyStruct struct {
samples []float32
}
With Append()
, every float32
in the slice will be individually allocated (to box to a reflect.Value
), then copied to the destination slice. With a Grow()
or Resize()
method decoding can be done with this pattern:
for count := 1; moreObjects; count++ {
sliceV.Resize(count)
newElem := sliceV.Index(count-1)
decoder.decodeToValue(newElem)
}
Which could decode non-pointer types in-place, without extra allocating or copying. That should be true across numbers/structs/arrays.
A more minor issue with Grow()
is that using Index()
next requires the slice length be extended. For a new reflect API it would be more convenient and efficient to extend the length at the same time (perhaps renaming the new method to Extend()
, since the semantics would now be different from slices.Grow()
).
All these API variations seem simple enough, so I think it would make sense to eliminate as many performance issues as possible so there's no need for additional methods in the future.
It sounds like there are some good reasons to do Grow instead of Append. Are there objections to that?
What about this signature maybe?
// Extend increases v's length by n and returns the first new element.
// The elements are copied to a new backing array if the capacity would be exceeded.
func (v Value) Extend(n int) Value
Then the Grow() example would go from:
n := v.Len()
if n == v.Cap() {
v.Grow(1)
}
v.SetLen(n+1)
v.Index(n).Set(e)
to
v.Extend(1).Set(e)
Which is not much more verbose than the original Append()
, while not having an issue on the other use-cases.
It is less verbose, but returning the first element is strange. For example seeing:
v.Extend(10).Set(e)
does not immediately look sensible.
v.Extend(10).Set(e) does not look immediately look sensible.
That's true, I imagine the return value would only be used with Extend(1)
. Theoretically you could have separate methods to extend by 1 or N (with the latter having no return value), but in this case perhaps it's fine to just have one method that can be used two ways, since it's easy enough to ignore the return.
Maybe a small doc snippet showing the intended usage might be helpful, something like:
// Equivalent to s = append(s, v)
s.Extend(1).Set(v)
We expect there to be a standard slices package with a slices.Grow(n) at some point in the future, probably Go 1.19. It doesn't make sense to deviate from that signature now, and the chaining of s.Extend is a bit unusual for the reflect package anyway.
It seems like Grow is the right thing to add. It's too late for Go 1.18, so this would wait for Go 1.19 anyway.
Does anyone object to Grow?
Just to mention a specific example before it gets baked in: the only two usages of reflect.Append()
across the json
/xml
/gob
packages are of the form:
l := v.Len()
v.Set(reflect.Append(v, reflect.Zero(v.Type().Elem())))
decodeTo(v.Index(l))
This could be converted to:
decodeTo(v.Extend(1))
or
l := v.Len()
v.Resize(l+1)
decodeTo(v.Index(l))
or
l := v.Len()
if l+1 > v.Cap() {
v.Grow(1)
}
v.SetLen(l+1)
decodeTo(v.Index(l))
For its intended usage, Grow()
seems more error-prone (and slightly less efficient) than several alternatives. If having a similar signature to a function from slices
outweighs that, I don't have any objection. I just wanted to point out (after recently writing a lot of reflect-based code) that slices.Grow()
doesn't end up being an ideal API here since reflect
is used differently.
It seems like you would write:
n := v.Len()
v.Grow(1)
v.SetLen(n+1)
decodeTo(v.Index(n))
This seems OK, and it will match slices.Grow eventually.
Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group
I think I understand now, the Grow()
function and example code mentioned earlier in the thread was subtly different from the one proposed for slices
(which doesn't always extend the cap, only conditionally based on free space). Using the one from slices just requires an extra SetLen()
which is reasonable.
The only remaining issue I can think of is if Grow(0)
on a nil slice should remain nil or set it to a non-nil empty slice - I think there is a place in the json
package that needs to create those.
The only remaining issue I can think of is if Grow(0) on a nil slice should remain nil or set it to a non-nil empty slice
Given that append([]T(nil))
returns a nil slice, I would expect the same for Grow(0)
.
@dsnet I agree that not modifying the slice probably meets user expectations better (and it's already possible to avoid that allocation with a small amount of unsafe code). In any event it should probably copy whatever slices.Grow()
does.
While unsubmitted, the draft implementation of slices.Grow
does have that behavior. See https://go-review.googlesource.com/c/go/+/355253/5/src/slices/slices.go#202
We definitely want to make slices.Grow and this Grow match as much as possible. We can work that out in review.
No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group
Change https://go.dev/cl/389635 mentions this issue: reflect: add Value.Grow
The current
reflect.Append
function unfortunately always allocates even if the underlying slice has sufficient capacity because it must allocate a slice header on the heap (since it returns areflect.Value
with the resized slice).I propose the addition a new method on
reflect.Value
:that is semantically equivalent to:
Such a method can avoid the allocation for a slice header because it never gets exposed to the application code. The implementation can update the underlying slice header in place.
Performance benefits aside, there is a readability benefit as well:
For consistency, we would also add
reflect.Value.AppendSlice
.Prevalence
Using the public module proxy, there are ~12k usages of
reflect.Append
orreflect.AppendSlice
:v.Set(reflect.Append(v, ...))
. All of these can be switched tov.Append(...)
.v = reflect.Append(v, ...)
. It is unclear whether v is addressable or whether the use case cared about the originalv
being updated. Manual inspection showed that some of these could switch tov.Append(...)
.Thus, up to 82% of all
reflect.Append
usages could benefit fromreflect.Value.Append
.