Open momotaro98 opened 3 years ago
The example:
The stack trace was generated from different code (it references git.rarejob.com/rarejob-platform/event-search-polling-batch/ and Elasticsearch). Does the example code fail for you?
I'd recommend running your code with go build -race to try to detect races or unsafe pointer use. See the Go 1.14 release notes for more details on checkptr, and also go doc unsafe.Pointer.
I can reproduce the crash with the snippet above on 1.16.5 if I set a higher GOMAXPROCS (like 4 or 8).
I also think the crash is caused by the fact that you're copying non-empty strings.Builder values around, which the documentation explicitly says is not allowed. The failure mode is admittedly a little obscure, but I'm not sure there's an easy way to report a better error.
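For comparison, this is what the documented restriction looks like when the copy check does get a chance to run. The snippet is a minimal sketch, not the reporter's program: the helper name writeToCopy is made up for illustration, and it only shows that calling a method on a copy of a non-zero strings.Builder panics.

```go
package main

import (
	"fmt"
	"strings"
)

// writeToCopy (hypothetical helper) writes to a Builder, copies it by value,
// and then writes to the copy. It returns the recovered panic message, or ""
// if nothing panicked.
func writeToCopy() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()

	var b strings.Builder
	b.WriteString("hello") // first use: b is now a non-zero Builder

	c := b               // copying a non-zero Builder is disallowed
	c.WriteString("!!!") // the copy check runs here and panics
	return ""
}

func main() {
	fmt.Println(writeToCopy())
}
```

The crash discussed in this issue is different: there the copy is made but no method is ever called on it, so this panic never gets a chance to fire.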
I'm sorry, the stack trace was generated from different code. I was trying to show the logic of the code.
Thank you for your kind feedback.
@ALTree - can you please attach the panic to this issue? What platform?
I haven't been able to reproduce on Linux/amd64. The original report is for Darwin.
@momotaro98 - it looks like this might be a real problem since someone has been able to reproduce a crash with the example code.
This issue probably should be reopened.
Cc @mknyszek
it looks like this might be a real problem since someone has been able to reproduce a crash with the example code.
But the example code copies strings.Builder values around, which I think is not allowed.
The runtime still shouldn't throw and crash. That indicates that the runtime, the compiler, or strings.Builder might need to be fixed.
@ALTree are you able to provide a failed stack trace from the example code? Which platform (or go env)?
$ go version
go version go1.16.5 windows/amd64
$ GOMAXPROCS=8 go run test.go
runtime: marked free object in span 0x207387e40b0, elemsize=192 freeindex=30 (bad use of unsafe.Pointer? try -d=checkptr)
0xc001f30000 alloc unmarked
0xc001f300c0 alloc unmarked
0xc001f30180 alloc unmarked
0xc001f30240 alloc unmarked
0xc001f30300 alloc unmarked
0xc001f303c0 alloc unmarked
0xc001f30480 alloc unmarked
0xc001f30540 alloc unmarked
0xc001f30600 alloc unmarked
0xc001f306c0 alloc unmarked
0xc001f30780 alloc unmarked
0xc001f30840 alloc unmarked
0xc001f30900 alloc unmarked
0xc001f309c0 alloc unmarked
0xc001f30a80 alloc unmarked
0xc001f30b40 alloc unmarked
0xc001f30c00 alloc unmarked
0xc001f30cc0 alloc unmarked
0xc001f30d80 alloc unmarked
0xc001f30e40 alloc unmarked
0xc001f30f00 alloc unmarked
0xc001f30fc0 alloc unmarked
0xc001f31080 alloc unmarked
0xc001f31140 alloc unmarked
0xc001f31200 alloc unmarked
0xc001f312c0 alloc unmarked
0xc001f31380 alloc unmarked
0xc001f31440 alloc unmarked
0xc001f31500 alloc unmarked
0xc001f315c0 alloc marked
0xc001f31680 free unmarked
0xc001f31740 free unmarked
0xc001f31800 free unmarked
0xc001f318c0 free unmarked
0xc001f31980 free unmarked
0xc001f31a40 free unmarked
0xc001f31b00 free unmarked
0xc001f31bc0 free unmarked
0xc001f31c80 free unmarked
0xc001f31d40 free unmarked
0xc001f31e00 free unmarked
0xc001f31ec0 free marked zombie
000000c001f31ec0: 0000000000000000 0000000000000000
000000c001f31ed0: 0000000000000000 0000000000000000
000000c001f31ee0: 0000000000000000 0000000000000000
000000c001f31ef0: 0000000000000000 0000000000000000
000000c001f31f00: 0000000000000000 0000000000000000
000000c001f31f10: 0000000000000000 0000000000000000
000000c001f31f20: 0000000000000000 0000000000000000
000000c001f31f30: 0000000000000000 0000000000000000
000000c001f31f40: 0000000000000000 0000000000000000
000000c001f31f50: 0000000000000000 0000000000000000
000000c001f31f60: 0000000000000000 0000000000000000
000000c001f31f70: 0000000000000000 0000000000000000
fatal error: found pointer to free object
runtime stack:
runtime.throw(0x1341d26, 0x1c)
XXXX/other/go/src/runtime/panic.go:1117 +0x79
runtime.(*mspan).reportZombies(0x207387e40b0)
XXXX/other/go/src/runtime/mgcsweep.go:614 +0x385
runtime.(*mspan).sweep(0x207387e40b0, 0xffffff00, 0x0)
XXXX/other/go/src/runtime/mgcsweep.go:447 +0x473
runtime.(*mcentral).uncacheSpan(0x13ff350, 0x207387e40b0)
XXXX/other/go/src/runtime/mcentral.go:214 +0xcc
runtime.(*mcache).releaseAll(0x20712ea0108)
XXXX/other/go/src/runtime/mcache.go:276 +0x14b
runtime.(*mcache).prepareForSweep(0x20712ea0108)
XXXX/other/go/src/runtime/mcache.go:310 +0x4d
runtime.acquirep(0xc000024000)
XXXX/other/go/src/runtime/proc.go:4967 +0x45
runtime.stopm()
XXXX/other/go/src/runtime/proc.go:2302 +0xbe
runtime.gcstopm()
XXXX/other/go/src/runtime/proc.go:2551 +0xca
runtime.schedule()
XXXX/other/go/src/runtime/proc.go:3118 +0x47d
runtime.goschedImpl(0xc000038000)
XXXX/other/go/src/runtime/proc.go:3333 +0xf5
runtime.gopreempt_m(0xc000038000)
XXXX/other/go/src/runtime/proc.go:3361 +0x3b
runtime.newstack()
XXXX/other/go/src/runtime/stack.go:1045 +0x1cf
runtime.morestack()
XXXX/other/go/src/runtime/asm_amd64.s:458 +0x97
goroutine 1 [runnable]:
strings.(*Builder).WriteString(...)
XXXX/other/go/src/strings/builder.go:123
main.f(0xc000180000, 0xf4240, 0xf4240, 0xc000180000, 0xc00001c0b8)
XXXX/test.go:28 +0x212
main.main()
XXXX/test.go:36 +0x6a
exit status 2
I think I have a hunch why this happens. The alleged bad pointer in the original post is supposed to be a builder pointing to itself (note that the GC was scanning builderList at the time). That extra self pointer is how the builder detects that it has been copied. The thing is, the copy check never fires because no builder method is ever executed on the copy (otherwise it would fail).
So the question then is why the GC has any problem with this. I believe the answer is that bullder (sic) is stack-allocated, so the self pointer points into a stack location, but that pointer is then copied to the heap. The GC has an invariant that heap objects can't point back into a stack, hence the failure.
The reason this happens is the tricky noescape used in the strings.Builder self pointer assignment (the copyCheck method). Normally the compiler would force the strings.Builder to escape, so the invariant would be maintained.
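The self pointer mechanism described above can be sketched as follows. This is a simplified illustration, not the standard library source: the type tracked and the helper copyKeepsOldSelfPointer are hypothetical, and the noescape trick is deliberately left out, so here the compiler is free to move the value to the heap and the GC invariant holds.

```go
package main

import "fmt"

// tracked is a hypothetical stand-in for strings.Builder. It remembers its
// own address on first use so that later calls can detect a copy.
type tracked struct {
	addr *tracked // address of the receiver, recorded on first use
	buf  []byte
}

// copyCheck records the receiver's address on first use and panics if a
// later call arrives through a different address (that is, through a copy).
func (t *tracked) copyCheck() {
	if t.addr == nil {
		// No noescape here (unlike the real Builder), so t escapes normally.
		t.addr = t
	} else if t.addr != t {
		panic("tracked: illegal use of non-zero value copied by value")
	}
}

func (t *tracked) WriteString(s string) {
	t.copyCheck()
	t.buf = append(t.buf, s...)
}

// copyKeepsOldSelfPointer copies a used value into a slice without calling
// any method on the copy, then reports whether the copy still points at the
// original. No check ever runs on the copy, so nothing panics.
func copyKeepsOldSelfPointer() bool {
	var orig tracked
	orig.WriteString("a")

	list := []tracked{}
	list = append(list, orig) // the copy carries orig's self pointer along

	return list[0].addr == &orig
}

func main() {
	fmt.Println(copyKeepsOldSelfPointer())
}
```

With the real Builder, the same pattern leaves a heap-allocated slice element holding a pointer to a stack-allocated builder, which is the situation the GC reports.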
As others have said, this is a valid failure mode for copying a strings.Builder. It turns out that even if you don't use the copy, it's never safe to copy a non-zero strings.Builder. But, as @mpx points out, there's a UX issue here (and arguably a bug in strings.Builder), and I'm not sure how to fix it.
One possibility is to make the self pointer a uintptr. I think with care, it would actually make this API safe to copy but not use. It would still be a useful marker for the API, but the GC wouldn't ever try to look at that pointer. I'm almost positive that was already disregarded for a number of reasons back when this was implemented, one issue being the "safe to copy but not use the copy" restriction, another being that uintptr is even less safe, and yet another being that switching GC implementations could break this.
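A rough sketch of the uintptr idea, under stated assumptions: the type uTracked, the sink variable, and the helper demo are hypothetical, and this is not a proposal for the actual strings.Builder. The value is forced onto the heap through a package-level variable, because a stack-allocated value can move when its stack grows and a stored uintptr would not be updated, which is one way this approach is "even less safe".

```go
package main

import (
	"fmt"
	"unsafe"
)

// uTracked is a hypothetical type that stores its own address as a uintptr.
// The GC does not treat a uintptr as a pointer, so copying the value never
// creates a heap-to-stack pointer.
type uTracked struct {
	addr uintptr // address of the receiver at first use; ignored by the GC
}

// copied records the receiver's address on first use and afterwards reports
// whether the call arrives through a different address.
func (u *uTracked) copied() bool {
	self := uintptr(unsafe.Pointer(u))
	if u.addr == 0 {
		u.addr = self
		return false
	}
	return u.addr != self
}

// sink keeps the original reachable from a global so that it is
// heap-allocated and does not move.
var sink *uTracked

// demo returns the check results for the first use, a second use of the
// original, and a use of a by-value copy.
func demo() (first, again, onCopy bool) {
	sink = new(uTracked)
	first = sink.copied()
	again = sink.copied()

	c := *sink // by-value copy; harmless to the GC until it is used
	onCopy = c.copied()
	return
}

func main() {
	fmt.Println(demo())
}
```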
FTR, this is not something that can be easily fixed in the runtime: we currently do not carry type information around for any object. Thus the GC doesn't have access to type information, and even if it did, checking for strings.Builder and the invariant would add slow special cases to the scan path, the hottest path in the runtime.
I'm going to reopen this and update the title to indicate the UX issue here. I'll dig around too and see if I can find a duplicate.
EDIT: Technically the error message is accurate in the sense of "incorrect use of unsafe." :) The strings.Builder API is saying that it's the user's fault for not following the API, but the API lets you get into this situation. Who's at fault?
CC @griesemer @aclements
I think fixing the UX issue here would be nice, but I'm not sure how. Open to suggestions. I couldn't find a similar issue filed anywhere.
I wonder if it would be possible to write a vet check to warn against copying a "live" strings.Builder? We already warn when copying sync.Mutex as things stand.
The other option is to fix #7921. Then we can remove the noescape annotation.
@mdempsky, any thoughts on fixing #7921 in the context of this issue?
I think fixing #7921 is doable for Go 1.18, but I wouldn't feel comfortable backporting it to 1.16/1.17.
Another option would be to only perform the copyCheck in the first place in race mode, and drop the noescape hack from it. That would cause things to escape (and allocate) in race mode, but at least wouldn't produce invalid pointers in non-race mode.
That wouldn't detect memory corruption as readily in non-race mode, but if users observe possible memory corruption they should be testing in race mode anyway.
Yet another option might be to use buf itself for the copy check, and make the check opportunistic rather than trying to flag exactly every bad write. That might look like:
type Builder struct {
	buf []byte
}

// copyCheck opportunistically detects a Builder that was copied by value.
func (b *Builder) copyCheck() {
	i := len(b.buf)
	if i == cap(b.buf) {
		return // Can't check for copying if the buffer is exactly full.
	}
	if c := b.buf[:cap(b.buf)][i]; c != 0 {
		// This builder has not written to buf[i] yet,
		// so if it is nonzero it must have been written through a (disallowed) copy.
		panic("strings: illegal use of non-zero Builder copied by value")
	}
}
Or, we could even combine the above two approaches: use an opportunistic check via buf all the time, and a safe, fully-precise check (that incidentally causes the *Builder to escape) only in race mode.
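To see how the opportunistic buf-based check would behave, here is a small self-contained sketch. The names opBuilder, newOpBuilder, and detectsSharedWrite are hypothetical, and the buffer is preallocated with spare capacity so that both the original and the copy share one backing array. It is an illustration of the idea, not the strings package implementation.

```go
package main

import "fmt"

// opBuilder is a hypothetical builder that uses the spare capacity of buf
// for an opportunistic copy check.
type opBuilder struct {
	buf []byte
}

// newOpBuilder returns a builder with zeroed spare capacity.
func newOpBuilder(capacity int) opBuilder {
	return opBuilder{buf: make([]byte, 0, capacity)}
}

// copyCheck panics if the first unused byte of the backing array is nonzero,
// which means someone else wrote past this builder's length.
func (b *opBuilder) copyCheck() {
	i := len(b.buf)
	if i == cap(b.buf) {
		return // buffer exactly full: nothing to inspect
	}
	if c := b.buf[:cap(b.buf)][i]; c != 0 {
		panic("opBuilder: illegal use of non-zero builder copied by value")
	}
}

func (b *opBuilder) WriteString(s string) {
	b.copyCheck()
	b.buf = append(b.buf, s...)
}

// detectsSharedWrite writes through a copy and then through the original,
// and reports whether the original's check caught the interference.
func detectsSharedWrite() (caught bool) {
	defer func() {
		if recover() != nil {
			caught = true
		}
	}()

	orig := newOpBuilder(16)
	orig.WriteString("ab")

	dup := orig           // disallowed copy that shares the backing array
	dup.WriteString("X")  // check passes (buf[2] is still zero), then buf[2] = 'X'
	orig.WriteString("Y") // orig still has length 2 and finds 'X' at buf[2]
	return false
}

func main() {
	fmt.Println(detectsSharedWrite())
}
```

Note that the write through the copy itself goes unnoticed, and writes of zero bytes would never be detected, which is what makes the check opportunistic rather than precise.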
I did a preliminary experiment of extending the "copylock" checker to check for the copying of "strings.Builder", e.g.
func lockPath(tpkg *types.Package, typ types.Type) typePath {
	...
	if named, ok := typ.(*types.Named); ok &&
		named.Obj().Name() == "Builder" &&
		named.Obj().Pkg().Path() == "strings" {
		return []types.Type{typ}
	}
	...
}
Running this extension on a few open-source cloud projects did not produce any findings. This may indicate that the pattern is too rare to justify a vet check.
BTW, here are the unit tests:
func BadCopy() {
	var x *strings.Builder
	p := x
	var y strings.Builder
	_ = y // want `assignment copies value to _: strings.Builder`
	p = &y
	*p = *x // want `assignment copies value to \*p: *strings.Builder`
	w := struct{ L strings.Builder }{
		L: *x, // want `literal copies value from \*x: strings.Builder`
	}
	print(w) // want `call of print copies value: struct{L strings.Builder} contains strings.Builder`
	builderList := []strings.Builder{}
	bullder := strings.Builder{}
	builderList = append(builderList, bullder) // want `call of append copies value: strings.Builder`
}
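The type test from the lockPath excerpt above can be isolated into a small helper and tried out with go/types alone. This is a sketch under assumptions: isStringsBuilder and fakeNamed are hypothetical names, the nil-package guard is an addition for safety, and the named types are built by hand rather than by type-checking real source.

```go
package main

import (
	"fmt"
	"go/token"
	"go/types"
)

// isStringsBuilder (hypothetical helper) reports whether typ is the named
// type Builder declared in package "strings".
func isStringsBuilder(typ types.Type) bool {
	named, ok := typ.(*types.Named)
	if !ok {
		return false
	}
	obj := named.Obj()
	// Types from the universe scope (such as error) have no package.
	return obj.Pkg() != nil &&
		obj.Pkg().Path() == "strings" &&
		obj.Name() == "Builder"
}

// fakeNamed builds a stand-in named struct type for the given package path
// and type name, for demonstration only.
func fakeNamed(pkgPath, name string) *types.Named {
	pkg := types.NewPackage(pkgPath, pkgPath)
	obj := types.NewTypeName(token.NoPos, pkg, name, nil)
	return types.NewNamed(obj, types.NewStruct(nil, nil), nil)
}

func main() {
	fmt.Println(isStringsBuilder(fakeNamed("strings", "Builder")))
	fmt.Println(isStringsBuilder(fakeNamed("bytes", "Buffer")))
	fmt.Println(isStringsBuilder(types.Typ[types.Int]))
}
```

Matching on the package path rather than the package name keeps a user-defined package that happens to be called strings from being flagged, provided its import path differs.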
See previously:
Adding to the 1.24 milestone because this is a pretty bad failure mode. If nothing else we could at least get "go vet" to complain about this, much as it does for copying a sync.Mutex.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
What did you do?
https://play.golang.org/p/RRI3-srzVrR
What did you expect to see?
I don't see any error message
What did you see instead?