golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.73k stars 17.63k forks source link

bytes: TrimXXX makes it easy to unintentionally overwrite slice #43341

Open shmsr opened 3 years ago

shmsr commented 3 years ago

What version of Go are you using (go version)?

$ go version

go version go1.15.5 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/subhamsarkar/Library/Caches/go-build"
GOENV="/Users/subhamsarkar/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/subhamsarkar/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/subhamsarkar/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.15.5/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.15.5/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/n2/w00w99g93cj26xl42msl6lc80000gp/T/go-build468451499=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Trim and friends (TrimXXX) in the bytes package seem to unintentionally overwrite slices. Well, this could be working as designed but do refer:

  1. https://github.com/golang/go/issues/21149
  2. https://github.com/golang/go/issues/30169

So, based on CLs that fixed the issues listed above, I think it'd be appropriate to match the capacity to the length so that it could be avoided and match the behaviour of other functions.

package main

import (
    "bytes"
    "fmt"
)

// func FixTrimSuffix(s, suffix []byte) []byte {
//  if bytes.HasSuffix(s, suffix) {
//      // Match capacity and length. Fixed bytes.TrimSuffix
//      return s[: len(s)-len(suffix) : len(s)-len(suffix)]
//  }
//  return s
// }

func main() {
    b := []byte{'h', 'e', 'l', 'l', 'o'}
    t := []byte{'l', 'o'}

    fmt.Printf("b: %s\nt: %s\n", b, t)

    fmt.Printf("\n~ TrimSuffix t from b\n")
    bytesTrim := bytes.TrimSuffix(b, t)
    fmt.Printf("b: %s\nbytesTrim: %s\n", b, bytesTrim)

    fmt.Printf("\n~ Append xy to bytesTrim\n")
    bytesTrim = append(bytesTrim, 'x', 'y')
    fmt.Printf("b: %s\nbytesTrim: %s\n", b, bytesTrim)
}

Link to the playground: https://play.golang.org/p/yGJxI_Jnkk6

What did you expect to see?

b: hello
t: lo

~ TrimSuffix t from b
b: hello
bytesTrim: hel

~ Append xy to bytesTrim
b: hello
bytesTrim: helxy

What did you see instead?

b: hello
t: lo

~ TrimSuffix t from b
b: hello
bytesTrim: hel

~ Append xy to bytesTrim
b: helxy
bytesTrim: helxy
shmsr commented 3 years ago

If this issue needs to be fixed then I'd like to send a CL for this if it's fine.

seankhliao commented 3 years ago

This is how slices work: they share a backing array. See https://tour.golang.org/moretypes/8

shmsr commented 3 years ago

@seankhliao I know how slices work. Look at CL for https://github.com/golang/go/issues/30169!

seankhliao commented 3 years ago

I see, after reading the linked issues, :+1:

ianlancetaylor commented 3 years ago

It would help to see the kind of analysis that was done in https://github.com/golang/go/issues/21149#issuecomment-317840266. My intuition is that TrimSuffix in particular is a case where people might want to write things like b = append(bytes.TrimSuffix(b, ".go"), ".o"...) in which case forcing a new allocation would be undesirable.

shmsr commented 3 years ago

Yeah, right. Okay, I'll do some research on this!

mvdan commented 3 years ago

Just to give an extra data point - I've used Trim funcs many times with the knowledge that they do not allocate. Changing that would be surprising to me, and would make perfectly valid code suddenly allocate more. That could very well mean regressions if the call is in performance-sensitive code.

zigo101 commented 3 years ago

If the syntax s[::] is supported, this can be easily written as

bytesTrim = append(bytesTrim[::], 'x', 'y')

b = append(bytes.TrimSuffix(b, ".go")[::], ".o"...)

to always make an allocation.

shmsr commented 3 years ago

Sorry for keep you guys waiting. So I'm skimmed through a lot of projects which were using popular packages (mostly vendored) and I think it should be safe to do the change that I was suggesting earlier. Although there's some disagreement and that's totally valid. So here I'm attaching some links where you can see the usage of bytes's Trim and friends.

In my opinion, the performance aspect of limiting the capacity might affect a small number of projects but most of the projects shouldn't see any difference. Also, it'd match the behaviour done in #21149 #30169

Here are some links:

vendor: (github.com/go-git/go-git/v5/plumbing/object/commit.go) https://github.com/GoogleContainerTools/kaniko/blob/master/vendor/github.com/go-git/go-git/v5/plumbing/object/commit.go#L198

vendor: (github.com/go-git/go-git/v5/plumbing/object/commit.go) https://github.com/GoogleContainerTools/kaniko/blob/ece215c18113020f9151fb25e69fc4ecc157c395/vendor/github.com/go-git/go-git/v5/plumbing/object/commit.go#L207

vendor: (github.com/moby/buildkit/frontend/dockerfile/parser/parser.go) https://github.com/GoogleContainerTools/kaniko/blob/master/vendor/github.com/moby/buildkit/frontend/dockerfile/parser/parser.go#L228

vendor: (golang.org/x/crypto/openpgp/armor/armor.go) https://github.com/GoogleContainerTools/skaffold/blob/master/hack/tools/vendor/golang.org/x/crypto/openpgp/armor/armor.go#L175

vendor: (golang.org/x/crypto/openpgp/armor/armor.go) https://github.com/GoogleContainerTools/skaffold/blob/master/hack/tools/vendor/golang.org/x/crypto/openpgp/armor/armor.go#L199

https://github.com/GoogleContainerTools/skaffold/blob/master/pkg/skaffold/build/tag/git_commit.go#L166

https://github.com/Jeffail/benthos/blob/master/lib/processor/text.go#L130

vendor: (golang.org/x/crypto/ssh/keys.go) https://github.com/PursuanceProject/pursuance/blob/develop/vendor/golang.org/x/crypto/ssh/keys.go#L67

vendor: (golang.org/x/crypto/ssh/keys.go) https://github.com/PursuanceProject/pursuance/blob/develop/vendor/golang.org/x/crypto/ssh/keys.go#L85

vendor: (github.com/russross/blackfriday/v2/block.go) https://github.com/docker/cli/blob/master/vendor/github.com/russross/blackfriday/v2/block.go#L310

https://github.com/jcmvbkbc/gcc-xtensa/blob/call0-4.8.2/libgo/go/encoding/pem/pem.go#L48

Do let me know if you need more examples.

dsnet commented 3 years ago

I'm not sure if isolated examples are helpful (it's fairly easy to find examples that support either semantic). What would be needed is a thorough analysis that answers something like:

Currently, based on the 👍 on comments above, it seems that people generally prefer the current semantic.