golang / go


runtime: "fatal: systemstack called from unexpected goroutine" on Android #51001

Closed bcmills closed 1 year ago

bcmills commented 2 years ago
#!watchflakes
post <- builder ~ `android` && `systemstack called from unexpected goroutine`

greplogs --dashboard -md -l -e '^fatal: systemstack called from unexpected goroutine' --since=2021-01-01

2022-02-02T21:12:39-53d6a72/android-amd64-emu

##### GOMAXPROCS=2 runtime -cpu=1,2,4 -quick
fatal: systemstack called from unexpected goroutineTrap 
exitcode=133
FAIL    runtime 59.879s

2021-10-08T16:26:20-59d4e92-99c1b24/android-amd64-emu

fatal: systemstack called from unexpected goroutineSegmentation fault 
exitcode=139FAIL    golang.org/x/net/publicsuffix   3.419s

I'll also note that badsystemstackMsg seems to be missing a final newline as of CL 93659 (CC @aclements @randall77). 😅
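The fused `goroutineTrap` and `goroutineSegmentation fault` text in the logs above is consistent with a message written without a trailing newline. A minimal sketch of the effect follows; `combine` and the `"Trap\n"` shell report are hypothetical stand-ins for illustration, not the runtime's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// fatalMsg is the message text as it appears in the logs; note the
// absence of a trailing "\n".
const fatalMsg = "fatal: systemstack called from unexpected goroutine"

// combine mimics two writers appending to the same stream back to back.
// (Hypothetical helper, for illustration only.)
func combine(parts ...string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	// Assumed shell report that follows the runtime's message.
	shellReport := "Trap\n"

	// Without the newline, the two messages run together on one line.
	fmt.Print(combine(fatalMsg, shellReport))

	// With the newline, each message gets its own line.
	fmt.Print(combine(fatalMsg+"\n", shellReport))
}
```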

bcmills commented 2 years ago

This happened in a TryBot in https://storage.googleapis.com/go-build-log/f1e11825/android-amd64-emu_262486a5.log:

##### GOMAXPROCS=2 runtime -cpu=1,2,4 -quick
fatal: systemstack called from unexpected goroutineTrap 
exitcode=133
FAIL    runtime 16.177s
FAIL
2022/05/03 14:18:09 Failed: exit status 1
go tool dist: FAILED

Marking as release-blocker because this affects TryBot runs. Since android/amd64 is not a first-class port, either the underlying bug can be diagnosed and fixed, or the builder can be removed from the default TryBot set. (I'll leave that choice up to @golang/runtime to decide and implement.)

bcmills commented 2 years ago

This may or may not be OS-specific. There is another failure in the builder logs since February, but on plan9 rather than android; it isn't obvious to me whether that is an independent bug.

greplogs -l -e 'fatal: systemstack called from unexpected goroutine' --since=2022-02-03

2022-03-05T21:20:16-e155b03-45f4544/plan9-amd64-0intro

bcmills commented 2 years ago

greplogs -l -e 'fatal: systemstack called from unexpected goroutine' --since=2022-03-06

2022-05-03T19:48:07-bccce90/android-arm64-corellium

mknyszek commented 2 years ago

@golang/runtime This is a second-class port, but because it's a TryBot, this is a release blocker. Should we consider removing it as a TryBot? Is it bringing us enough value?

gopherbot commented 2 years ago

Change https://go.dev/cl/407615 mentions this issue: dashboard: remove android-amd64-emu from main go repo's TryBot set

dmitshur commented 2 years ago

I've mailed CL 407615 that makes android-amd64-emu a post-submit builder only (in the main repo) while investigation of this issue is underway. If submitted, this issue can be unmarked as a release-blocker for Go 1.19.

bcmills commented 2 years ago

Curiously, this does not appear to be arch-specific: we've seen these failures on both amd64 and arm64.

greplogs -l -e 'fatal: systemstack called from unexpected goroutine' --since=2022-05-04

2022-05-20T22:30:37-2b0e457/android-arm64-corellium

prattmic commented 2 years ago

The first failure shows exitcode=133. This is likely bash parlance for exiting with signal 5 (SIGTRAP). From man bash: "The return value of a simple command is its exit status, or 128+n if the command is terminated by signal n."

If I recall correctly, Android applies a seccomp syscall filter to (all?) processes. I wonder if we are violating this filter on the throw path, resulting in truncation of the stack trace. seccomp with mode SECCOMP_RET_TRAP sends a SIGTRAP on violation.
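The 128+n convention quoted from man bash can be checked against both exit codes seen in this issue. The sketch below is illustrative only: `signalFromExitCode` and `signalNames` are hypothetical names, and the signal numbers assume Linux numbering on x86 and ARM.

```go
package main

import "fmt"

// signalNames lists the two signals relevant to this issue, using
// Linux numbering on x86 and ARM (an assumption for this sketch).
var signalNames = map[int]string{
	5:  "SIGTRAP",
	11: "SIGSEGV",
}

// signalFromExitCode applies the shell convention that a status of
// 128+n means the command was terminated by signal n. It returns 0
// when the code does not fall in that range.
func signalFromExitCode(code int) int {
	if code > 128 && code < 256 {
		return code - 128
	}
	return 0
}

func main() {
	for _, code := range []int{133, 139} {
		n := signalFromExitCode(code)
		fmt.Printf("exitcode=%d -> signal %d (%s)\n", code, n, signalNames[n])
	}
}
```

Under these assumptions, exitcode=133 maps to signal 5 (SIGTRAP) and exitcode=139 maps to signal 11 (SIGSEGV), matching the "Trap" and "Segmentation fault" text in the logs.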

prattmic commented 2 years ago

@golang/android do you know if the Android seccomp filters apply to processes on our builders, and if so which one?

prattmic commented 2 years ago

No repros of this on 25 gomotes all weekend. I did find #53250, plus several no-context SIGSEGVs in the runtime test, like:

##### GOMAXPROCS=2 runtime -cpu=1,2,4 -quick
Segmentation fault 
exitcode=139
FAIL    runtime 19.914s
FAIL
2022/06/05 22:34:10 Failed: exit status 1

(Some were in the standard runtime test rather than the -cpu variant.)

aclements commented 2 years ago

This isn't a first-class port, so dropping release-blocker.

bcmills commented 2 years ago

> This isn't a first-class port, so dropping release-blocker.

This port is still run as a default TryBot until/unless CL 407615 is merged. IMO known failures on TryBots should still block releases, since they still add testing noise for anyone who uses TryBots on a pending change.

bcmills commented 2 years ago

In the interest of decoupling this issue from the Android TryBots in general, I've filed #53377 (as a release-blocker) to decide whether to remove the TryBots or fix their known failure modes.

bcmills commented 2 years ago

Summarizing the known failures with this pattern on Android:

greplogs -l -e '(?ms)\Aandroid-.*^fatal: systemstack called from unexpected goroutine'

2022-05-20T22:30:37-2b0e457/android-arm64-corellium
2022-05-03T19:48:07-bccce90/android-arm64-corellium
2022-02-02T21:12:39-53d6a72/android-amd64-emu
2021-10-08T16:26:20-59d4e92-99c1b24/android-amd64-emu

So it looks like this bug was probably introduced sometime in 2021..? (Or else, maybe the check itself was introduced then? 😅)
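For readers unfamiliar with the pattern in the greplogs command above: `(?ms)` enables multi-line and dot-matches-newline modes, `\Aandroid-` anchors at the very start of the log, and `^fatal:` then matches at the start of any later line. The sketch below assumes Go regexp syntax and a log that begins with the builder name; the sample log contents and `matchesAndroidFatal` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// androidFatal is the same pattern as in the greplogs invocation.
var androidFatal = regexp.MustCompile(
	`(?ms)\Aandroid-.*^fatal: systemstack called from unexpected goroutine`)

// matchesAndroidFatal reports whether a log starts with an android
// builder name and later contains the fatal message at the start of a line.
func matchesAndroidFatal(log string) bool {
	return androidFatal.MatchString(log)
}

func main() {
	// Hypothetical log contents; real build logs will differ.
	androidLog := "android-arm64-corellium\n\n" +
		"##### GOMAXPROCS=2 runtime -cpu=1,2,4 -quick\n" +
		"fatal: systemstack called from unexpected goroutineTrap\n"
	plan9Log := "plan9-amd64-0intro\n\n" +
		"fatal: systemstack called from unexpected goroutine\n"

	fmt.Println("android:", matchesAndroidFatal(androidLog))
	fmt.Println("plan9:", matchesAndroidFatal(plan9Log))
}
```

With these sample inputs the android log matches and the plan9 log does not, which is how the summary excludes the plan9 failure mentioned earlier.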

gopherbot commented 2 years ago

Change https://go.dev/cl/412174 mentions this issue: dashboard: add known issues for android-*-emu

ianlancetaylor commented 2 years ago

Rolling forward to 1.20.

heschi commented 2 years ago

2022-08-25T19:17:14-f64f12f/android-arm64-corellium
2022-08-22T14:48:53-6bdca82/android-arm64-corellium

gopherbot commented 2 years ago

Found new dashboard test flakes for:

#!watchflakes
post <- builder ~ `android` && `systemstack called from unexpected goroutine`
2022-08-22 14:48 android-arm64-corellium go@6bdca820 runtime (log) fatal: systemstack called from unexpected goroutine
2022-08-25 19:17 android-arm64-corellium go@f64f12f0 runtime (log) fatal: systemstack called from unexpected goroutine
2022-09-27 18:26 android-arm64-corellium go@17078f58 runtime (log) fatal: systemstack called from unexpected goroutine

— watchflakes

gopherbot commented 2 years ago

Found new dashboard test flakes for:

#!watchflakes
post <- builder ~ `android` && `systemstack called from unexpected goroutine`
2022-10-06 02:38 android-arm64-corellium go@2e054128 runtime (log) fatal: systemstack called from unexpected goroutine

— watchflakes

cherrymui commented 1 year ago

There seem to have been no new failures for some time.

bcmills commented 1 year ago

Note that the rate of testing is much lower now because of the freeze. (6 months is a good window size for checking failure rates.)

bcmills commented 1 year ago

Still none after the tree reopened. Maybe fixed?

gopherbot commented 1 year ago

Change https://go.dev/cl/465156 mentions this issue: dashboard: unmark known-issues with low failure rates

gopherbot commented 1 year ago

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)