Open mattbostock opened 6 years ago
Interesting, thanks!
Here we can see that the cap is 32: https://play.golang.org/p/PDAI5olA4YG
But uncomment the following line and the cap changes to 0: https://play.golang.org/p/BA2ivc0sTY6
Definitely something messed up in the compiler.
/cc @randall77 @mdempsky
Actually, does the language guarantee the size of the cap on string->[]rune conversions?
https://golang.org/ref/spec#Conversions_to_and_from_a_string_type says only:
Converting a value of a string type to a slice of runes type yields a slice containing the individual Unicode code points of the string
But doesn't say anything about the slice's capacity.
/cc @griesemer
Here's another example using len()
that might help clarify how confusing this could be:
https://play.golang.org/p/YLwjpbY7_nX
@mattbostock Thanks for the examples. I think this example summarizes/demonstrates the root issue you're noticing: https://play.golang.org/p/Er5s4t9rO7R
(Also, note that the upper bound for slice operations is cap(s)
, not len(s)
; i.e., it's possible to grow a slice, as long as there's capacity.)
Like @bradfitz points out, I think the real question is whether string->[]byte or string->[]rune conversions guarantee a particular slice capacity. It seems the spec doesn't guarantee that.
So seems everything is working fine, but the language spec could possibly be more specific here.
Moving to NeedsDecision to see whether the language gets changed.
/cc @ianlancetaylor too.
I think that for these conversions the result should be a slice with cap == len. I see no benefit to leaving it unspecified.
I see no benefit to leaving it unspecified.
I think there's a small performance benefit. Suppose something like:
var s string = f()
return append([]byte(s), "suffix"...)
For the conversion, the runtime will need to allocate memory. Also, it will pad the allocation up to the next allocation bucket size. If we expose this capacity to the user via the resulting slice's capacity, then the append might be able to fit into the existing space without requiring a second allocation.
As bad as it sounds, there may be existing code that relies on this feature...
Perhaps the spec just needs to say that it doesn't say anything about the capacity in these cases.
As bad as it sounds, there may be existing code that relies on this feature...
Considering that escape analysis changes somewhat regularly, I kinda doubt anybody's relying on this. We've never heard anything about it.
For conversions from string
to both []byte
and []rune
, we allocate extra capacity. This happens for both stack-allocated (round up to the 32-byte stack buffer) and heap-allocated (round up to the next size class) results:
func main() {
foo := "foo"
bar := []rune(foo)
fmt.Printf("%d %d\n", len(bar), cap(bar))
baz := []rune(foo)
sink = baz
fmt.Printf("%d %d\n", len(baz), cap(baz))
}
var sink []rune
Prints:
3 32
3 4
So escape analysis doesn't really figure into this discussion, it just affects the constants.
IMO when Go does an allocation and the user didn't specify a capacity, the runtime should be free to allocate capacity somewhat beyond the requested size. Calling make
gives you exactly the capacity requested*. Calling append
or doing a cast need not.
*Although you could argue that only the 3-arg version of make requests an explicit capacity, the 2-arg version could allow cap>len.
So escape analysis doesn't really figure into this discussion, it just affects the constants.
I think @bradfitz's point was that because escape analysis changes, the constants particular user code sees sometimes changes too. Thus it's arguably safe if we want to intentionally change the constants.
*Although you could argue that only the 3-arg version of make requests an explicit capacity, the 2-arg version could allow cap>len.
I'd argue even for the 3-arg version, that the explicit cap argument should just be a lower bound on what the runtime returns. But unfortunately these seems out of scope for Go 1.
@mdempsky Wow, you're even more of a rebel than I am.
Although you could argue that only the 3-arg version of make requests an explicit capacity, the 2-arg version could allow cap>len.
To be clear, the spec does currently appear to forbid this.
I'd argue even for the 3-arg version, that the explicit cap argument should just be a lower bound on what the runtime returns. But unfortunately these seems out of scope for Go 1.
There's an analogy with making maps, where the len arg is just a hint. On the other hand, we definitely wouldn't want the len arg to buffered channels to be just a hint, since they are used to strictly control concurrency.
Another option is to add a "round up" function to package runtime that accepts a size (obtained presumably by unsafe.Sizeof) and a quantity and returns a recommended quantity. Then in cases in which it really matters, people would have the tools to do it--at the cost of using both package unsafe and package runtime. Combine with generics and you could even build append as a library function. :)
(Note that if one wanted to go really nuts and build this yourself without package runtime support, you could, by writing an init function that makes byte slices of various sizes and calls append on them and records the cap you get back, and inferring malloc slab sizes from them.)
I think the discussions about make
are interesting, but increasingly off topic. I've opened #24204 to discuss them further. :)
What did you do?
Converted a string to a slice of runes then back to a string, and referenced non-existent elements on the slice of runes:
https://play.golang.org/p/MmJNLu6h6FT
What did you expect to see?
A runtime panic with a 'slice bounds out of range' error.
What did you see instead?
System details