golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.03k stars 17.54k forks source link

spec: clarify that conversions to slices don't guarantee slice capacity? #24163

Open mattbostock opened 6 years ago

mattbostock commented 6 years ago

What did you do?

Converted a string to a slice of runes then back to a string, and referenced non-existent elements on the slice of runes:

https://play.golang.org/p/MmJNLu6h6FT

What did you expect to see?

A runtime panic with a 'slice bounds out of range' error.

What did you see instead?

"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"

System details

go version go1.10 darwin/amd64
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/matt/Library/Caches/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/matt/go"
GORACE=""
GOROOT="/opt/boxen/homebrew/Cellar/go/1.10/libexec"
GOTMPDIR=""
GOTOOLDIR="/opt/boxen/homebrew/Cellar/go/1.10/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/k4/_18vs9g119b0q2_p950t0jl00000gn/T/go-build038036646=/tmp/go-build -gno-record-gcc-switches -fno-common"
GOROOT/bin/go version: go version go1.10 darwin/amd64
GOROOT/bin/go tool compile -V: compile version go1.10
uname -v: Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64
ProductName:    Mac OS X
ProductVersion: 10.13.3
BuildVersion:   17D47
lldb --version: lldb-900.0.64
  Swift-4.0
bradfitz commented 6 years ago

Interesting, thanks!

Here we can see that the cap is 32: https://play.golang.org/p/PDAI5olA4YG

But uncomment the following line and the cap changes to 0: https://play.golang.org/p/BA2ivc0sTY6

Definitely something messed up in the compiler.

/cc @randall77 @mdempsky

bradfitz commented 6 years ago

Actually, does the language guarantee the size of the cap on string->[]rune conversions?

https://golang.org/ref/spec#Conversions_to_and_from_a_string_type says only:

Converting a value of a string type to a slice of runes type yields a slice containing the individual Unicode code points of the string

But doesn't say anything about the slice's capacity.

/cc @griesemer

mattbostock commented 6 years ago

Here's another example using len() that might help clarify how confusing this could be: https://play.golang.org/p/YLwjpbY7_nX

mdempsky commented 6 years ago

@mattbostock Thanks for the examples. I think this example summarizes/demonstrates the root issue you're noticing: https://play.golang.org/p/Er5s4t9rO7R

(Also, note that the upper bound for slice operations is cap(s), not len(s); i.e., it's possible to grow a slice, as long as there's capacity.)

Like @bradfitz points out, I think the real question is whether string->[]byte or string->[]rune conversions guarantee a particular slice capacity. It seems the spec doesn't guarantee that.

bradfitz commented 6 years ago

So seems everything is working fine, but the language spec could possibly be more specific here.

Moving to NeedsDecision to see whether the language gets changed.

/cc @ianlancetaylor too.

ianlancetaylor commented 6 years ago

I think that for these conversions the result should be a slice with cap == len. I see no benefit to leaving it unspecified.

mdempsky commented 6 years ago

I see no benefit to leaving it unspecified.

I think there's a small performance benefit. Suppose something like:

var s string = f()
return append([]byte(s), "suffix"...)

For the conversion, the runtime will need to allocate memory. Also, it will pad the allocation up to the next allocation bucket size. If we expose this capacity to the user via the resulting slice's capacity, then the append might be able to fit into the existing space without requiring a second allocation.

griesemer commented 6 years ago

As bad as it sounds, there may be existing code that relies on this feature...

Perhaps the spec just needs to say that it doesn't say anything about the capacity in these cases.

dominikh commented 6 years ago

Related: https://github.com/golang/go/issues/14232 and https://github.com/golang/go/issues/14235

bradfitz commented 6 years ago

As bad as it sounds, there may be existing code that relies on this feature...

Considering that escape analysis changes somewhat regularly, I kinda doubt anybody's relying on this. We've never heard anything about it.

randall77 commented 6 years ago

For conversions from string to both []byte and []rune, we allocate extra capacity. This happens for both stack-allocated (round up to the 32-byte stack buffer) and heap-allocated (round up to the next size class) results:

func main() {
    foo := "foo"
    bar := []rune(foo)
    fmt.Printf("%d %d\n", len(bar), cap(bar))

    baz := []rune(foo)
    sink = baz
    fmt.Printf("%d %d\n", len(baz), cap(baz))
}

var sink []rune

Prints:

3 32
3 4

So escape analysis doesn't really figure into this discussion, it just affects the constants.

IMO when Go does an allocation and the user didn't specify a capacity, the runtime should be free to allocate capacity somewhat beyond the requested size. Calling make gives you exactly the capacity requested*. Calling append or doing a cast need not.

*Although you could argue that only the 3-arg version of make requests an explicit capacity, the 2-arg version could allow cap>len.

mdempsky commented 6 years ago

So escape analysis doesn't really figure into this discussion, it just affects the constants.

I think @bradfitz's point was that because escape analysis changes, the constants particular user code sees sometimes changes too. Thus it's arguably safe if we want to intentionally change the constants.

*Although you could argue that only the 3-arg version of make requests an explicit capacity, the 2-arg version could allow cap>len.

I'd argue even for the 3-arg version, that the explicit cap argument should just be a lower bound on what the runtime returns. But unfortunately these seems out of scope for Go 1.

randall77 commented 6 years ago

@mdempsky Wow, you're even more of a rebel than I am.

josharian commented 6 years ago

Although you could argue that only the 3-arg version of make requests an explicit capacity, the 2-arg version could allow cap>len.

To be clear, the spec does currently appear to forbid this.

I'd argue even for the 3-arg version, that the explicit cap argument should just be a lower bound on what the runtime returns. But unfortunately these seems out of scope for Go 1.

There's an analogy with making maps, where the len arg is just a hint. On the other hand, we definitely wouldn't want the len arg to buffered channels to be just a hint, since they are used to strictly control concurrency.

Another option is to add a "round up" function to package runtime that accepts a size (obtained presumably by unsafe.Sizeof) and a quantity and returns a recommended quantity. Then in cases in which it really matters, people would have the tools to do it--at the cost of using both package unsafe and package runtime. Combine with generics and you could even build append as a library function. :)

(Note that if one wanted to go really nuts and build this yourself without package runtime support, you could, by writing an init function that makes byte slices of various sizes and calls append on them and records the cap you get back, and inferring malloc slab sizes from them.)

mdempsky commented 6 years ago

I think the discussions about make are interesting, but increasingly off topic. I've opened #24204 to discuss them further. :)