Open martisch opened 5 years ago
Change https://golang.org/cl/188499 mentions this issue: runtime/pprof: add interface to reduce number of allocation when setting labels
Just to summarize here to avoid needing to load external links, right now the API is:
type LabelSet struct {
// Has unexported fields.
}
func Labels(args ...string) LabelSet
func WithLabels(ctx context.Context, labels LabelSet) context.Context
This requires making a slice to pass to Labels, and that slice escapes (and is variable sized) so it must be heap allocated.
The proposed interface in the CL is to add WithLabelsFromMapper(context.Context, LabelMapper), where LabelMapper is:
// LabelMapper is a set of Len() number of labels that can apply a function to all of the contained labels.
type LabelMapper interface {
Len() int
Map(f func(key, value string))
}
So you have to make a LabelMapper and then the context package calls its Len and Map methods to retrieve the labels. But if you already have the key-value pairs in your own data structure, you avoid allocating the converted slice.
They still get copied into the context in some form, though, so you've cut the allocations by at most 50%, not 100%.
Is there a simpler or cleaner API? Is Len really necessary?
/cc @hyangah @pjweinbgo @matloob
They still get copied into the context in some form, though, so you've cut the allocations by at most 50%, not 100%.
I do not think the proposal anywhere claims to lower down the usage to no allocations. I understand that saying there is opportunity to potentially save two allocations when this does only one of them in the worst case can lead to infer that this was misleading. That was not my intention. Note that there are usually at least 3 allocations involved: 1) from internal census representation (e.g. a map or slice of structs) to a []string to pass to pprof.Labels 2) another in pprof.Labels to construct a new slice for the LabelsSet 3) in WithLabels to create a map to hold the labels
The usual case I have seen is therefore should saving at least 2 of 3 allocations since most uses do not use []string as their native representation of key value pairs that can be passed as is to pprof.Labels.
Is there a simpler or cleaner API? Is Len really necessary?
The Len allows to pre-size the internally created map to usually hold all items from parent and child context with the initial allocation as another optimisation. Profiling shows that some map growth is made in this function which seems to account for ~30% of the time in WithLabels.
for _, label := range labels.list {
childLabels[label.key] = label.value
}
ping to experts. Would be nice if this could be resolved before go1.14 enters the freeze period. /cc @hyangah @pjweinbgo @matloob
I think @matloob knows the history of this API design and the plan to optimize LabelsSet and WithLabels better.
My impression was - it was originally designed to support census and it was uncommon to see a large number of new tags (labels) added at once. (usually one or two additional tags added per Do call except when a server is creating the new context based on the tags from wire). Maybe the trend is changed now?
The runtime/pprof.Do
is what the package recommends for users to use, so this needs a variance of Do that works with LabelMapper.
Ping @pjweinbgo @matloob for any thoughts. Needing to make a Do does make this more complex.
I don't think that we need to have a variant of Do. WithLabels is meant to be a lower level interface than Do, and the WithLabelsFromMapper looks like just a more efficient replacement that an alternative to Do should use. I think it would be valid for OpenCensus to have its own variant of Do that calls WithLabelsFromMapper instead. As long as the OpenCensus variant of Do sets and unsets labels the same way Do does, code that uses pprof's Do and code that uses OpenCensus's Do should be compatible.
The way I thought about Do originally was that we'd make Do available for users who weren't using census or something similar, and other libraries would provide their own Dos that set the labels on the context and called WithLabels at the same time. It looks like this change provides another more efficient alternative to WtihLabels, so it seems like it fits in fine.
This sense of "map" - meaning apply a function to a data structure - is not one we've used much in Go to date (strings.Map is the exception). I'm also bothered by needing both Len and Map. Would it really be so bad if there was only Map? Then the argument could be a plain function instead of an interface. Right now the CL uses it as a hint to allocate a map, in which case it really doesn't matter, but if a different implementation used it to allocate a slice and then blindly filled in increasing indexes, that would be a problem.
I'm also a little confused about the avoidance of LabelSet. Should we instead be looking at an alternate LabelSet constructor, like LabelSetFromFunc (or a better name)? Then the result could be passed to both Do and WithLabels.
Any comments about trying to use an alternate LabelSet constructor instead of all new API?
I have not gotten around to profiling or testing that yet. Putting better naming aside a possible approach is:
type LabelSet struct {
mapper LabelMapper
}
func LabelsFromMapper(lm LabelMapper) LabelSet {
return LabelSet{mapper: lm}
}
func Labels(args ...string) LabelSet {
return LabelSet{mapper: listMapper(args)}
}
type listMapper []string
func (l listMapper) Len() int { return len(l) }
func (l listMapper) Map(f func(key, value string)) {
for i := 0; i+1 < len(l); i += 2 {
f(l[i], l[i+1])
}
}
to keep LabelSet
struct backwards compatible. The inner workings of WithLabels
can then be replaced with that of WithLabelsFromMapper
from https://golang.org/cl/188499 using the mapper
field of LabelSet
.
ping @martisch - any profiling or testing of this alternate approach?
The approach of adding a new helper LabelsFromAdder
(name to be decided), using WithLabels with a new internal implementation and changing LabelSet struct seems equally performant.
In addition I removed the mapper closure in the new approach as that caused an allocation which would regress the List case in the number of allocations but has the downside of exposing the internal map[string]string structure. As that is not ideal I would need to first investigate if we can stack allocate and keep that implementation detail hidden with a closure again if that is important.
code:
type LabelSet struct {
adder LabelAdder
}
type LabelAdder interface {
AddLabels(map[string]string) // should have a better name
}
type listLabels []string
func (l listLabels) AddLabels(m map[string]string) {
for i := 0; i+1 < len(l); i += 2 {
m[l[i]] = l[i+1]
}
}
func WithLabels(ctx context.Context, labelset LabelSet) context.Context {
parentLabels := labelValue(ctx)
childLabels := make(labelMap)
for k, v := range parentLabels {
childLabels[k] = v
}
labelset.adder.AddLabels(childLabels)
return context.WithValue(ctx, labelContextKey{}, &childLabels)
}
func LabelsFromAdder(labels LabelAdder) LabelSet {
return LabelSet{adder: labels}
}
func Labels(args ...string) LabelSet {
return LabelSet{adder: listLabels(args)}
}
name time/op
ContextLabels/WithLabels/Labels/List 297ns ± 2%
ContextLabels/WithLabels/Labels/Map 366ns ± 3%
ContextLabels/WithLabels/LabelsFromAdder/List 208ns ± 2%
ContextLabels/WithLabels/LabelsFromAdder/Map 264ns ± 2%
name alloc/op
ContextLabels/WithLabels/Labels/List 520B ± 0%
ContextLabels/WithLabels/Labels/Map 520B ± 0%
ContextLabels/WithLabels/LabelsFromAdder/List 392B ± 0%
ContextLabels/WithLabels/LabelsFromAdder/Map 392B ± 0%
name allocs/op
ContextLabels/WithLabels/Labels/List 6.00 ± 0%
ContextLabels/WithLabels/Labels/Map 6.00 ± 0%
ContextLabels/WithLabels/LabelsFromAdder/List 4.00 ± 0%
ContextLabels/WithLabels/LabelsFromAdder/Map 4.00 ± 0%
Updated cl/188499 with new code.
@martisch, thanks for confirming that we can use an alternate LabelSet constructor instead of having to change other parts of the API.
It would still be good to find a way to avoid exposing the map[string]string. In the long run I expect that internal detail might change too. Unless it is OK for the LabelSet constructor to be passed a dummy map and copy those values out.
It would still be good to find a way to avoid exposing the map[string]string. I agree.
Since we are in freeze for go1.14 I have 6months to spend some time figuring out an interface (and potential generic compiler optimization if needed) that does not cause an additional allocation and just assumes a string type for key and value but no other implementation details.
Marking this on hold until @martisch's update. @martisch, please remove the hold label once you have an update. No hurry. Thanks.
The OpenTelemetry Go SDK has taken a position on this topic: https://github.com/open-telemetry/opentelemetry-go/pull/651
The Metrics API needs a LabelSet type that is both inexpensive, as stated in this issue, but also supports an Equal
method, to support combining metric events by distinct label set. We landed on an interface{}
containing a variable-size array of key-value structs.
It would be great if the runtime/pprof
labels were somehow able to re-use structures computed in this way for OpenTelemetry. OpenTelemetry's trace.WithSpan()
function would then be able to automatically enter profiler labels from its span attributes, and cheaply, because it would simply enter a structure it has computed for other reasons.
In the above library, @martisch, we take the position that a stable sort, followed by de-duplication, is cheaper than using a map in this situation.
Referring to @rsc's API proposal above (https://github.com/golang/go/issues/33701#issuecomment-531009349), I believe that it's slightly better to support an interface like in the OTel label
package above, which is:
type LabelSet interface {
Len() int
Get(int) (KeyValue, bool)
}
This my preference because often to pass a func()
argument implies an allocation, whereas the simpler Get()
API does not.
Thanks for the suggestions.
I initially chose the Map approach since some census frameworks use maps for census labels. Since maps have no native operations to get the i th element and no stable iteration order the Get approach for maps needs them to extract all keys+values and buffer them. Maybe that easy enough to do with a wrapper (but it will cause allocation(s)). If that is as good as the map approach this might be workable and as good in performance if we teach the compiler to potentially optimize extracting all keys and values in a loop.
Ideally (unless there are very good reasons) we need to keep LabelSet a struct to not break backwards compatibility. We can add a new field that is an interface.
P.S. I dont see a proposal of Len and Get in the comments (but tried it in an initial prototype).
I was referring to Len()
and Get()
in the OTel API linked in the comment above (https://github.com/golang/go/issues/33701#issuecomment-618606266).
If the type has to be a struct, I'd make it a struct { LabelInterface }
so that we can support more than one input type that implements LabelInterface
.
runtime/pprof.Labels
is used in conjunction withruntime/pprof.WithLabels
to set pprof labels in a context for performance profiling. https://github.com/golang/go/blob/c485506b0aae298652448e80fca35036bfa755ac/src/runtime/pprof/label.go#L59Adding information for fine grained on demand profiling of already running binaries should idealy be very efficient so it can always stay enabled with minimal overhead. The current API could be made more efficient by requiring fewer heap allocations. Pprof information sourced from contexts added by census frameworks is used in large go deployments on every RPC request and thereby small performance gains add up to a larger resource saving across many servers.
The current
runtime/pprof
API requires census frameworks such as OpenCensus to first convert their internal representation of key and value tag pairs (in a slice or map) to a slice of strings for input toruntime/pprof.Labels
.https://github.com/census-instrumentation/opencensus-go/blob/df6e2001952312404b06f5f6f03fcb4aec1648e5/tag/profile_19.go#L24
This requires at least one heap allocation for a variable amount of labels. Then internaly the
Labels
functions constructs aLabelSet
data structure which requires another allocation (the case where this uses more than one allocation will be improved with cl/181517 ). All in all this makes two heap allocations per context creation with pprof labels which can potentially be avoided.I propose to extend
runtime/pprof
to have an API that takes e.g. a mapping/iteration interface such that census frameworks can implement that interface on their internal tag representations (e.g. maps and slices with custom types) andruntime/pprof
can then source the labels to be set in a newruntime/pprof.WithLabels*
function without first requiring conversion between multiple internal and external data structures.cl/188499 is a quick prototype as an example how this could look like. Different other ways of making an interface that can be used are possible to reduce allocations. Note that the LabelSet struct cant be changed to an interface itself (which seems the cleaner approach) due being not API backwards compatible.
/cc @aclements @randall77 @matloob