golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.06k stars 17.68k forks source link

strings: copying but not using a Builder after first use leads to runtime crash (generates heap -> stack pointer) #47276

Open momotaro98 opened 3 years ago

momotaro98 commented 3 years ago

What version of Go are you using (go version)?

$ go version
go version go1.16.6

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

What did you do?

https://play.golang.org/p/RRI3-srzVrR

package main

import (
    "fmt"
    "strings"
)

func f(myList []string) error {
    const (
        Threshold = 1000
    )

    var (
        builderList = []strings.Builder{}
    )
    var (
        bullder = strings.Builder{}
    )
    var (
        count uint64 = 0
    )
    for _, _ = range myList {
        if count > Threshold {
            builderList = append(builderList, bullder)
            bullder = strings.Builder{}
            count = 0
        }
        bullder.WriteString("123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890")
        count++
    }
    return nil
}

func main() {
    myList := make([]string, 1000000)
    err := f(myList)
    if err != nil {
        panic(err)
    }
    fmt.Println("success")
}

What did you expect to see?

I don't see any error message

What did you see instead?


--
runtime: pointer 0xc035a7bc78 to unused region of span span.base()=0xc0001de000 span.limit=0xc0001dff80 span.state=1
  | 2021-07-19T12:43:00.012+09:00 | runtime: found in object at *(0xc0026a6000+0x0)
  | 2021-07-19T12:43:00.012+09:00 | object=0xc0026a6000 s.base()=0xc0026a6000 s.limit=0xc0026a8000 s.spanclass=36 s.elemsize=256 s.state=mSpanInUse
  | 2021-07-19T12:43:00.012+09:00 | *(object+0) = 0xc035a7bc78 <==
  | 2021-07-19T12:43:00.012+09:00 | *(object+8) = 0xc002700000
  | 2021-07-19T12:43:00.012+09:00 | *(object+16) = 0x113a6c
  | 2021-07-19T12:43:00.012+09:00 | *(object+24) = 0x128000
  | 2021-07-19T12:43:00.012+09:00 | *(object+32) = 0xc035a7bc78
  | 2021-07-19T12:43:00.012+09:00 | *(object+40) = 0xc0017ce000
  | 2021-07-19T12:43:00.012+09:00 | *(object+48) = 0x1100aa
  | 2021-07-19T12:43:00.012+09:00 | *(object+56) = 0x128000
  | 2021-07-19T12:43:00.012+09:00 | *(object+64) = 0xc00008fc78
  | 2021-07-19T12:43:00.012+09:00 | *(object+72) = 0xc002300000
  | 2021-07-19T12:43:00.012+09:00 | *(object+80) = 0x115fcc
  | 2021-07-19T12:43:00.012+09:00 | *(object+88) = 0x128000
  | 2021-07-19T12:43:00.012+09:00 | *(object+96) = 0xc00008fc78
  | 2021-07-19T12:43:00.012+09:00 | *(object+104) = 0xc0018f6000
  | 2021-07-19T12:43:00.012+09:00 | *(object+112) = 0x115de0
  | 2021-07-19T12:43:00.012+09:00 | *(object+120) = 0x128000
  | 2021-07-19T12:43:00.012+09:00 | *(object+128) = 0xc00008fc78
  | 2021-07-19T12:43:00.012+09:00 | *(object+136) = 0xc002992000
  | 2021-07-19T12:43:00.013+09:00 | *(object+144) = 0x114359
  | 2021-07-19T12:43:00.013+09:00 | *(object+152) = 0x128000
  | 2021-07-19T12:43:00.013+09:00 | *(object+160) = 0xc00008fc78
  | 2021-07-19T12:43:00.013+09:00 | *(object+168) = 0xc001a1e000
  | 2021-07-19T12:43:00.013+09:00 | *(object+176) = 0x11aa0e
  | 2021-07-19T12:43:00.013+09:00 | *(object+184) = 0x128000
  | 2021-07-19T12:43:00.013+09:00 | *(object+192) = 0xc00008fc78
  | 2021-07-19T12:43:00.013+09:00 | *(object+200) = 0xc001f80000
  | 2021-07-19T12:43:00.013+09:00 | *(object+208) = 0x113b9b
  | 2021-07-19T12:43:00.013+09:00 | *(object+216) = 0x128000
  | 2021-07-19T12:43:00.013+09:00 | *(object+224) = 0x0
  | 2021-07-19T12:43:00.013+09:00 | *(object+232) = 0x0
  | 2021-07-19T12:43:00.013+09:00 | *(object+240) = 0x0
  | 2021-07-19T12:43:00.013+09:00 | *(object+248) = 0x0
  | 2021-07-19T12:43:00.013+09:00 | fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)
  | 2021-07-19T12:43:00.016+09:00 | runtime stack:
  | 2021-07-19T12:43:00.016+09:00 | runtime.throw(0x98583e, 0x3e)
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/panic.go:1117 +0x72 fp=0xc000049e28 sp=0xc000049df8 pc=0x435db2
  | 2021-07-19T12:43:00.016+09:00 | runtime.badPointer(0x7f0421f06050, 0xc035a7bc78, 0xc0026a6000, 0x0)
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/mbitmap.go:351 +0x235 fp=0xc000049e70 sp=0xc000049e28 pc=0x415335
  | 2021-07-19T12:43:00.016+09:00 | runtime.findObject(0xc035a7bc78, 0xc0026a6000, 0x0, 0x0, 0x0, 0x0)
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/mbitmap.go:387 +0x9b fp=0xc000049ea8 sp=0xc000049e70 pc=0x4153fb
  | 2021-07-19T12:43:00.016+09:00 | runtime.scanobject(0xc0026a6000, 0xc000027698)
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/mgcmark.go:1286 +0x24a fp=0xc000049f38 sp=0xc000049ea8 pc=0x42138a
  | 2021-07-19T12:43:00.016+09:00 | runtime.gcDrain(0xc000027698, 0x3)
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/mgcmark.go:1048 +0x22b fp=0xc000049f90 sp=0xc000049f38 pc=0x420b0b
  | 2021-07-19T12:43:00.016+09:00 | runtime.gcBgMarkWorker.func2()
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/mgc.go:1980 +0x88 fp=0xc000049fd0 sp=0xc000049f90 pc=0x461f28
  | 2021-07-19T12:43:00.016+09:00 | runtime.systemstack(0x0)
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/asm_amd64.s:379 +0x66 fp=0xc000049fd8 sp=0xc000049fd0 pc=0x4693e6
  | 2021-07-19T12:43:00.016+09:00 | runtime.mstart()
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/proc.go:1246 fp=0xc000049fe0 sp=0xc000049fd8 pc=0x43b080
  | 2021-07-19T12:43:00.016+09:00 | goroutine 34 [GC worker (idle)]:
  | 2021-07-19T12:43:00.016+09:00 | runtime.systemstack_switch()
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/asm_amd64.s:339 fp=0xc000030f60 sp=0xc000030f58 pc=0x469360
  | 2021-07-19T12:43:00.016+09:00 | runtime.gcBgMarkWorker()
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/mgc.go:1967 +0x1c7 fp=0xc000030fe0 sp=0xc000030f60 pc=0x41d627
  | 2021-07-19T12:43:00.016+09:00 | runtime.goexit()
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc000030fe8 sp=0xc000030fe0 pc=0x46b061
  | 2021-07-19T12:43:00.016+09:00 | created by runtime.gcBgMarkStartWorkers
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/mgc.go:1835 +0x37
  | 2021-07-19T12:43:00.016+09:00 | goroutine 1 [runnable]:
  | 2021-07-19T12:43:00.016+09:00 | runtime.Gosched(...)
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/proc.go:292
  | 2021-07-19T12:43:00.016+09:00 | runtime.gcAssistAlloc(0xc000000180)
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/mgcmark.go:469 +0x1bc fp=0xc00008faa0 sp=0xc00008fa40 pc=0x41f55c
  | 2021-07-19T12:43:00.016+09:00 | runtime.mallocgc(0xbc000, 0x0, 0xc0032c7000, 0x0)
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/malloc.go:961 +0x996 fp=0xc00008fb28 sp=0xc00008faa0 pc=0x40ddb6
  | 2021-07-19T12:43:00.016+09:00 | runtime.growslice(0x8b9ec0, 0xc003100000, 0x95dcc, 0x96000, 0x96199, 0xc0032cd000, 0x3cd, 0x400)
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/runtime/slice.go:224 +0x154 fp=0xc00008fb90 sp=0xc00008fb28 pc=0x44d6d4
  | 2021-07-19T12:43:00.016+09:00 | strings.(*Builder).WriteString(...)
  | 2021-07-19T12:43:00.016+09:00 | /usr/local/go/src/strings/builder.go:123
  | 2021-07-19T12:43:00.016+09:00 | git.rarejob.com/rarejob-platform/event-search-polling-batch/src/repositories/eselasticsearch.(*elasticSearch).changeTutor(0xc00000e0c0, 0xc034e46000, 0x9727, 0x9727, 0x9798ab, 0x17, 0x0, 0x0)
  | 2021-07-19T12:43:00.016+09:00 | /builds/rarejob-platform/event-search-polling-batch/src/repositories/eselasticsearch/tutors.go:71 +0x392 fp=0xc00008fcc8 sp=0xc00008fb90 pc=0x8666b2
  | 2021-07-19T12:43:00.016+09:00 | git.rarejob.com/rarejob-platform/event-search-polling-batch/src/repositories/eselasticsearch.(*elasticSearch).ChangeAllTutor(0xc00000e0c0, 0xc034e46000, 0x9727, 0x9727, 0x0, 0x0)
  | 2021-07-19T12:43:00.016+09:00 | /builds/rarejob-platform/event-search-polling-batch/src/repositories/eselasticsearch/tutors.go:19 +0x5e fp=0xc00008fd18 sp=0xc00008fcc8 pc=0x86623e
  | 2021-07-19T12:43:00.016+09:00 | git.rarejob.com/rarejob-platform/event-search-polling-batch/src/services.(*service).SetEsElasticsearch(0xc000024360, 0xa0aef0, 0xc00001a040, 0xc00006ba00, 0x0, 0x0)
  | 2021-07-19T12:43:00.016+09:00 | /builds/rarejob-platform/event-search-polling-batch/src/services/es_elasticsearch.go:40 +0x439 fp=0xc00008fe18 sp=0xc00008fd18 pc=0x871739
 ```
mpx commented 3 years ago

The example:

The strack trace was generated from different code (it references git.rarejob.com/rarejob-platform/event-search-polling-batch/ and Elasticsearch). Does the example code fail for you?

I'd recommend running your code with go build -race to try to detect races or unsafe pointer use. See the Go1.14 release notes for more details on checkptr, also go doc unsafe.Pointer.

ALTree commented 3 years ago

I can reproduce the crash with the snippet above on 1.16.5 if I set an higher GOMAXPROCS (like 4 or 8).

I also think the crash is caused by the fact that you're copying around non-empty strings.Builders, which the doc explicitly says it's not allowed. It's also true that the failure mode is a little obscure, but I'm not sure there's an easy way to report a better error.

momotaro98 commented 3 years ago

I'm sorry the stack trace was generated from different code. I tried to show the logic of the code.

Thank you for your kind feedback.

mpx commented 3 years ago

@ALTree - can you please attach the panic to this issue? What platform?

I haven't been able to reproduce on Linux/amd64. The original report is for Darwin.

@momotaro98 - it looks like this might be a real problem since someone has been able to reproduce a crash with the example code.

This issue probably should be reopened.

mpx commented 3 years ago

Cc @mknyszek

ALTree commented 3 years ago

it looks like this mighty be a real problem since someone has been able to reproduce a crash with the example code.

But the example code copies around strings.Builders values, which I think it's not allowed.

mpx commented 3 years ago

The runtime still shouldn't throw and crash. That indicates either the runtime, compiler, or strings.Builder might need to be fixed.

@ALTree are you able to provide a failed stack trace from the example code? Which platform (or go env)?

ALTree commented 3 years ago
$ go version
go version go1.16.5 windows/amd64

$ GOMAXPROCS=8 go run test.go
runtime: marked free object in span 0x207387e40b0, elemsize=192 freeindex=30 (bad use of unsafe.Pointer? try -d=checkptr)
0xc001f30000 alloc unmarked
0xc001f300c0 alloc unmarked
0xc001f30180 alloc unmarked
0xc001f30240 alloc unmarked
0xc001f30300 alloc unmarked
0xc001f303c0 alloc unmarked
0xc001f30480 alloc unmarked
0xc001f30540 alloc unmarked
0xc001f30600 alloc unmarked
0xc001f306c0 alloc unmarked
0xc001f30780 alloc unmarked
0xc001f30840 alloc unmarked
0xc001f30900 alloc unmarked
0xc001f309c0 alloc unmarked
0xc001f30a80 alloc unmarked
0xc001f30b40 alloc unmarked
0xc001f30c00 alloc unmarked
0xc001f30cc0 alloc unmarked
0xc001f30d80 alloc unmarked
0xc001f30e40 alloc unmarked
0xc001f30f00 alloc unmarked
0xc001f30fc0 alloc unmarked
0xc001f31080 alloc unmarked
0xc001f31140 alloc unmarked
0xc001f31200 alloc unmarked
0xc001f312c0 alloc unmarked
0xc001f31380 alloc unmarked
0xc001f31440 alloc unmarked
0xc001f31500 alloc unmarked
0xc001f315c0 alloc marked
0xc001f31680 free  unmarked
0xc001f31740 free  unmarked
0xc001f31800 free  unmarked
0xc001f318c0 free  unmarked
0xc001f31980 free  unmarked
0xc001f31a40 free  unmarked
0xc001f31b00 free  unmarked
0xc001f31bc0 free  unmarked
0xc001f31c80 free  unmarked
0xc001f31d40 free  unmarked
0xc001f31e00 free  unmarked
0xc001f31ec0 free  marked   zombie
000000c001f31ec0:  0000000000000000  0000000000000000
000000c001f31ed0:  0000000000000000  0000000000000000
000000c001f31ee0:  0000000000000000  0000000000000000
000000c001f31ef0:  0000000000000000  0000000000000000
000000c001f31f00:  0000000000000000  0000000000000000
000000c001f31f10:  0000000000000000  0000000000000000
000000c001f31f20:  0000000000000000  0000000000000000
000000c001f31f30:  0000000000000000  0000000000000000
000000c001f31f40:  0000000000000000  0000000000000000
000000c001f31f50:  0000000000000000  0000000000000000
000000c001f31f60:  0000000000000000  0000000000000000
000000c001f31f70:  0000000000000000  0000000000000000
fatal error: found pointer to free object

runtime stack:
runtime.throw(0x1341d26, 0x1c)
        XXXX/other/go/src/runtime/panic.go:1117 +0x79
runtime.(*mspan).reportZombies(0x207387e40b0)
        XXXX/other/go/src/runtime/mgcsweep.go:614 +0x385
runtime.(*mspan).sweep(0x207387e40b0, 0xffffff00, 0x0)
        XXXX/other/go/src/runtime/mgcsweep.go:447 +0x473
runtime.(*mcentral).uncacheSpan(0x13ff350, 0x207387e40b0)
        XXXX/other/go/src/runtime/mcentral.go:214 +0xcc
runtime.(*mcache).releaseAll(0x20712ea0108)
        XXXX/other/go/src/runtime/mcache.go:276 +0x14b
runtime.(*mcache).prepareForSweep(0x20712ea0108)
        XXXX/other/go/src/runtime/mcache.go:310 +0x4d
runtime.acquirep(0xc000024000)
        XXXX/other/go/src/runtime/proc.go:4967 +0x45
runtime.stopm()
        XXXX/other/go/src/runtime/proc.go:2302 +0xbe
runtime.gcstopm()
        XXXX/other/go/src/runtime/proc.go:2551 +0xca
runtime.schedule()
        XXXX/other/go/src/runtime/proc.go:3118 +0x47d
runtime.goschedImpl(0xc000038000)
        XXXX/other/go/src/runtime/proc.go:3333 +0xf5
runtime.gopreempt_m(0xc000038000)
        XXXX/other/go/src/runtime/proc.go:3361 +0x3b
runtime.newstack()
        XXXX/other/go/src/runtime/stack.go:1045 +0x1cf
runtime.morestack()
        XXXX/other/go/src/runtime/asm_amd64.s:458 +0x97

goroutine 1 [runnable]:
strings.(*Builder).WriteString(...)
        XXXX/other/go/src/strings/builder.go:123
main.f(0xc000180000, 0xf4240, 0xf4240, 0xc000180000, 0xc00001c0b8)
        XXXX/test.go:28 +0x212
main.main()
        XXXX/test.go:36 +0x6a
exit status 2
mknyszek commented 3 years ago

I think I have a hunch why this happens. The alleged bad pointer in the original post is supposed to be a builder pointing to itself (note that the GC was currently scanning builderList). That extra self pointer is how it detects if it's been copied. Thing is, the copy check never fires because there's never a builder method executed after it is copied (otherwise it would fail).

So the question then is why the GC has any problems with this. I believe the answer is that bullder (sic) is stack-allocated, and the self pointer points into a stack location, however that pointer is copied to the heap. The GC has an invariant that heap objects can't point back into a stack, hence the failure.

The reason why this happens is the tricky noescape used in the strings.Builder self pointer assignment (the copyCheck method). Normally the compiler would force the strings.Builder to escape, so the invariant would be maintained.

As others have said, this is a valid failure mode for copying a strings.Builder. It turns out even if you don't use it, it's never safe to copy a non-zero strings.Builder. But, as @mpx points out, there's a UX issue here (and arguably a bug in strings.Builder), but I'm not sure how to fix it.

One possibility is to make the self pointer a uintptr. I think with care, it would actually make this API safe to copy but not use. It would still be a useful marker for the API, but the GC wouldn't ever try to look at that pointer. I'm almost positive that was already disregarded for a number of reasons back when this was implemented, one issue being the "safe to copy but not use the copy" restriction, another being that uintptr is even less safe, and yet another being that switching GC implementations could break this.

FTR, this is not something can be easily fixed in the runtime: we currently do not carry type information around for any object. Thus GC doesn't have access to type information, and even if it did, checking for strings.Builder and the invariant would add slow special cases to the scan path, the hottest path in the runtime.

I'm going to reopen this and update the title to indicate the UX issue here. I'll dig around too and see if I can find a duplicate.

EDIT: Technically the error message is accurate in the sense of "incorrect use of unsafe." :) The strings.Builder API is saying that it's the user's fault for not following the API, but the API lets you get into this situation. Who's at fault?

mknyszek commented 3 years ago

CC @griesemer @aclements

I think fixing the UX issue here would be nice, but I'm not sure how. Open to suggestions. I couldn't find a similar issue filed anywhere.

thanm commented 3 years ago

I wonder if it would be possible to write a vet check to warn against copying a "live" strings.Builder? We already warn when copying sync.Mutex as things stand.

randall77 commented 3 years ago

The other option is to fix #7921. Then we can remove the noescape annotation.

aclements commented 3 years ago

@mdempsky , any thoughts on fixing #7921 in the context of this issue?

mdempsky commented 3 years ago

I think fixing #7921 is doable for Go 1.18, but I wouldn't feel comfortable backporting it to 1.16/1.17.

bcmills commented 3 years ago

Another option would be to only perform the copyCheck in the first place in race mode, and drop the noescape hack from it. That would cause things to escape (and allocate) in race mode, but at least wouldn't produce invalid pointers in non-race mode.

That wouldn't detect memory corruption as readily in non-race mode, but if users observe possible memory corruption they should be testing in race mode anyway.

bcmills commented 3 years ago

Yet another option might be to use buf itself for the copy check, and make the check opportunistic rather than trying to flag exactly every bad write. That might look like:

type Builder struct {
    buf []byte
}

func (b *Builder) copyCheck() {
    i := len(buf)
    if i == cap(buf) {
        return  // Can't check for copying if the buffer is exactly full.
    }
    if b := buf[:cap(buf)][i]; b != 0 {
        // This builder has not written to buf[i] yet,
        // so if it is nonzero it must have been written through a (disallowed) copy.
        panic("strings: illegal use of non-zero Builder copied by value")
    }
}
bcmills commented 3 years ago

Or, we could even combine the above two approaches: use an opportunistic check via buf all the time, and a safe, fully-precise check (that incidentally causes the *Builder to escape) only in race mode.

guodongli-google commented 3 years ago

I did a preliminary experiment of extending the "copylock" checker to check for the copying of "strings.Builder", e.g.

func lockPath(tpkg *types.Package, typ types.Type) typePath {
    ...
    if named, ok := typ.(*types.Named); ok &&
        named.Obj().Name() == "Builder" &&
        named.Obj().Pkg().Path() == "strings" {
        return []types.Type{typ}
    }

Running this extension of a few open-source cloud projects doesn't generate any finding. This may indicate that its frequency is too low to be a vet checker.

BTW, here are the unit tests:

func BadCopy() {
    var x *strings.Builder
    p := x
    var y strings.Builder
    _ = y  // want `assignment copies value to _: strings.Builder`
    p = &y
    *p = *x  // want `assignment copies value to \*p: *strings.Builder`

    w := struct{ L strings.Builder }{
        L: *x, // want `literal copies value from \*x: strings.Builder`
    }
    print(w)  // want `call of print copies value: struct{L strings.Builder} contains strings.Builder`

    builderList := []strings.Builder{}
    bullder := strings.Builder{}
    builderList = append(builderList, bullder)  // want `call of append copies value: strings.Builder`
}
bcmills commented 8 months ago

See previously:

ianlancetaylor commented 3 months ago

Adding to the 1.24 milestone because this is a pretty bad failure mode. If nothing else we could at least get "go vet" to complain about this, much as it does for copying a sync.Mutex.