Open object88 opened 6 years ago
/cc @dvyukov
I guess we could use the standard printing routine for the first stack. Race runtime would need to call back to ask for the whole current stack trace.
I can't speak to the breadth of use cases, of course, but for the circumstance that I noticed this, the parameters would only be helpful if provided for both stacks in the conflict. In my particular case, I wanted to know if the calls into my processComplete
were invoked with different parameters; it would tell me if the problem was that I was mismanaging the timing around the calls to processComplete
, or whether I was mismanaging the shared objects involved in that call.
Aside, I am curious about how the race detector is implemented, but I didn't find the text "DATA RACE" in github.com/golang/go. Is there another repository that this lives in?
@object88 thank you for the report and for asking about the runtime race detector. You might find this useful https://github.com/golang/go/blob/00587e898802bbceb5e86453245f232488ad2356/src/runtime/race/README#L1-L7
and as the doc there says, please take a look at https://github.com/golang/build/tree/master/cmd/racebuild aka package golang.org/x/build/cmd/racebuild
Providing arguments for both stacks is infeasible. Race detector memorizes stacks for all memory accesses ever happened in the execution (not possible to know ahead of time if any of them will lead to a race in future or not). Currently race detector spends 8 bytes per stack, and that's still lots of memory overall. Memorizing several hundreds of bytes per memory access is infeasible unfortunately.
Out of curiosity, is that 8 bytes per stack, or 8 bytes per stack frame?
Regardless, I hear that there's a very large memory cost involved. Rather than add to the cost of maintaining the parameters, etc., is there a feasible way (generally speaking, not necessarily using what currently exists) to read the stack from another goroutine?
Out of curiosity, is that 8 bytes per stack, or 8 bytes per stack frame?
8 bytes per full stack trace.
is there a feasible way (generally speaking, not necessarily using what currently exists) to read the stack from another goroutine?
runtime.GoroutineProfile does this. But note that in our case the other goroutine is already in a completely different stack that is unrelated to the race. So I don't see how it can help here.
But note that in our case the other goroutine is already in a completely different stack that is unrelated to the race.
Do you mean that the race detector does not cause the program to come to a complete pause across all goroutines, so that even if we were to inspect the other goroutine, it's in all likelihood have changed state to such a degree that we may not get any useful information, or at least not with any consistency?
Generally there is no point in time when both racing goroutines are at these stacks at the same time, so it does not matter if it pauses or not.
I'm a little confused; I wouldn't expect the racing goroutines to have identical stacks. What I have observed is that while they have the same small number of initial stack frames, the execution has diverged at the time of the race. I didn't include the whole data race dump initially for brevity's sake, so perhaps I haven't been clear. Here is a full example:
==================
WARNING: DATA RACE
Write at 0x00c441f6d200 by goroutine 145:
runtime.mapassign_faststr()
/usr/local/Cellar/go/1.9.3/libexec/src/runtime/hashmap_fast.go:774 +0x0
go/types.(*Scope).Insert()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/scope.go:106 +0x136
go/types.(*Checker).declare()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/decl.go:28 +0x98
go/types.(*Checker).declarePkgObj()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/resolver.go:114 +0x144
go/types.(*Checker).collectObjects()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/resolver.go:383 +0x9e2
go/types.(*Checker).checkFiles()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/check.go:237 +0xae
go/types.(*Checker).Files()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/check.go:230 +0x56
github.com/object88/langd.(*Loader).processComplete()
/Users/bropa18/work/src/github.com/object88/langd/loader.go:452 +0x813
github.com/object88/langd.(*Loader).processStateChange()
/Users/bropa18/work/src/github.com/object88/langd/loader.go:360 +0xa7a
Previous read at 0x00c441f6d200 by goroutine 46:
runtime.mapaccess1_faststr()
/usr/local/Cellar/go/1.9.3/libexec/src/runtime/hashmap_fast.go:208 +0x0
go/types.(*Checker).selector()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/scope.go:71 +0x1af1
go/types.(*Checker).typExprInternal()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/typexpr.go:234 +0xed4
go/types.(*Checker).typExpr()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/typexpr.go:132 +0x96
go/types.(*Checker).typ()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/typexpr.go:140 +0x74
go/types.(*Checker).typExprInternal()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/typexpr.go:276 +0xd1e
go/types.(*Checker).typExpr()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/typexpr.go:132 +0x96
go/types.(*Checker).typ()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/typexpr.go:140 +0x74
go/types.(*Checker).varDecl()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/decl.go:143 +0x51f
go/types.(*Checker).declStmt()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/decl.go:416 +0xcaf
go/types.(*Checker).stmt()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/stmt.go:307 +0x5c83
go/types.(*Checker).stmtList()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/stmt.go:110 +0x95
go/types.(*Checker).funcBody()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/stmt.go:42 +0x2f4
go/types.(*Checker).functionBodies()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/resolver.go:476 +0x122
go/types.(*Checker).checkFiles()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/check.go:241 +0xd8
go/types.(*Checker).Files()
/usr/local/Cellar/go/1.9.3/libexec/src/go/types/check.go:230 +0x56
github.com/object88/langd.(*Loader).processComplete()
/Users/bropa18/work/src/github.com/object88/langd/loader.go:452 +0x813
github.com/object88/langd.(*Loader).processStateChange()
/Users/bropa18/work/src/github.com/object88/langd/loader.go:360 +0xa7a
Goroutine 145 (running) created at:
github.com/object88/langd.(*Loader).Start.func1()
/Users/bropa18/work/src/github.com/object88/langd/loader.go:242 +0x197
Goroutine 46 (running) created at:
github.com/object88/langd.(*Loader).Start.func1()
/Users/bropa18/work/src/github.com/object88/langd/loader.go:242 +0x197
==================
@dvyukov, is there a way to run the program so it crashes at the point where the race is detected? Then at least someone can run under a debugger and get a stack trace for the live half of the race.
GORACE="abort_on_error=1 halt_on_error=1"
halts execution immediately. However, runtime seems to fail to unwind race runtime and does not show the racy stack:
==================
WARNING: DATA RACE
Read at 0x00c42014e018 by goroutine 7:
runtime.slicecopy()
src/runtime/slice.go:192 +0x0
command-line-arguments_test.TestRaceSliceStruct.func1()
src/runtime/race/testdata/mop_test.go:1749 +0xae
Previous write at 0x00c42014e018 by goroutine 6:
command-line-arguments_test.TestRaceSliceStruct()
src/runtime/race/testdata/mop_test.go:1752 +0xda
testing.tRunner()
src/testing/testing.go:777 +0x16d
Goroutine 7 (running) created at:
command-line-arguments_test.TestRaceSliceStruct()
src/runtime/race/testdata/mop_test.go:1747 +0xbd
testing.tRunner()
src/testing/testing.go:777 +0x16d
Goroutine 6 (running) created at:
testing.(*T).Run()
src/testing/testing.go:824 +0x564
testing.runTests.func1()
src/testing/testing.go:1063 +0xa4
testing.tRunner()
src/testing/testing.go:777 +0x16d
testing.runTests()
src/testing/testing.go:1061 +0x4e1
testing.(*M).Run()
src/testing/testing.go:978 +0x2cd
main.main()
_testmain.go:298 +0x22a
==================
SIGABRT: abort
PC=0x7f7bb8e8ac37 m=0 sigcode=18446744073709551610
goroutine 36 [running]:
runtime: unknown pc 0x7f7bb8e8ac37
stack: frame={sp:0x7fff901b1118, fp:0x0} stack=[0xc42010a800,0xc42010b000)
runtime: unknown pc 0x7f7bb8e8ac37
stack: frame={sp:0x7fff901b1118, fp:0x0} stack=[0xc42010a800,0xc42010b000)
created by command-line-arguments_test.TestRaceSliceStruct
src/runtime/race/testdata/mop_test.go:1747 +0xbe
goroutine 1 [chan receive]:
testing.(*T).Run(0xc42014a000, 0x5de902, 0x13, 0x5e5b08, 0xc4200bdc01)
src/testing/testing.go:825 +0x597
testing.runTests.func1(0xc42014a000)
src/testing/testing.go:1063 +0xa5
testing.tRunner(0xc42014a000, 0xc4200bddb0)
src/testing/testing.go:777 +0x16e
testing.runTests(0xc420110040, 0x6d22e0, 0x81, 0x81, 0xc420132000)
src/testing/testing.go:1061 +0x4e2
testing.(*M).Run(0xc420132000, 0x0)
src/testing/testing.go:978 +0x2ce
main.main()
_testmain.go:298 +0x22b
goroutine 35 [chan receive]:
command-line-arguments_test.TestRaceSliceStruct(0xc42014a0f0)
src/runtime/race/testdata/mop_test.go:1753 +0xff
testing.tRunner(0xc42014a0f0, 0x5e5b08)
src/testing/testing.go:777 +0x16e
created by testing.(*T).Run
src/testing/testing.go:824 +0x565
I wouldn't expect the racing goroutines to have identical stacks.
They don't. Did I say otherwise somewhere?
No, I misunderstood, my apologies.
It seems like we should:
GORACE="abort_on_error=1 halt_on_error=1"
.I would like to participate in this work, even if it is just documentation; what's a good way to go about contributing?
The docs for -race at https://golang.org/cmd/go point to https://golang.org/doc/articles/race_detector.html, which does in fact document halt_on_error
but does not document abort_on_error
. We should document abort_on_error
in doc/articles/race_detector.html in the Go repo.
We should also make sure that it reports the correct stack trace; above, @dvyukov says that it does not. I don't know what's involved there.
For general contribution instructions, see https://golang.org/doc/contribute.html.
Thanks.
We should also make sure that it reports the correct stack trace; above, @dvyukov says that it does not. I don't know what's involved there.
Race runtime raises SIGABRT inside of itself on partially switched Go stack (see comment on runtime·racecallbackthunk). It maybe hard to unwind from there. The simplest way to achieve this would probably be to call back into Go runtime from race runtime, and then Go runtime can print perfectly valid stack and terminate. However, updating race runtime involves rebuilding all syso files, but last time I looked racebuild utility was broken (something with gomote). I wrote something about this somewhere, but did not hear back as far as I remember. I don't know what's the state of racebuild now.
@dvyukov Aside from the fact that the tooling is currently broken, are there any fixes in the upstream race detector that we would want to import or is it relatively stable?
Not urgent, but it would be nice to pull in the fix for #20139.
Nothing particularly important. I forgot about #20139, but, yes, would be nice to pull for some workloads. But generally I used to update race runtime once per release to keep things under control, there are usually some issues accumulate over time. There is now also #23731 and it would be nice to separate general race runtime update from ppc work.
This is not so much a bug, as an ask.
When using the
-race
option forgo test
, I would like to see the arguments that are provided at the call site, down the stack, when a data race is detected. Currently, I only see the methods themselves. If I omit the-race
option, I can see the data race as reported by a map concurrency problem, and this does include call parameters.In my particular example, I am having trouble with a map that the
go/types#Checker
method uses. The concurrency issue is happening deep in the Go code, rather than something that I have immediate control over. (Note: I do not think that this is a problem in the Go type checker library at this point; no reason to. I'm sure I'm doing something wrong on my side.) I have many goroutines running a type checker at the same time. If I had parameters on the call stack, I could tell a lot more about the situation that causes the issue.What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
I have not tried with the 1.10 beta.
What operating system and processor architecture are you using (
go env
)?What did you do?
Unfortunately, I don't have a small working example of this scenario. I found this while debugging a race condition in my... um, complex program. I can try to create a more minimal example at a later date, if the team thinks it's helpful.
What did you expect to see?
I would like to see the parameters in the call stack. This example from the map's concurrency checker is a great example:
In particular, I can see the parameters to my
processComplete
method:By seeing the parameters, I can see whether or not my
processComplete
method is getting called with the same parameters simultaneously.What did you see instead?
I can observe that
processComplete
was invoked, but it's missing context:Without the parameter information, I don't know whether or not the same data is being passed into the method.