golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.14k stars 17.69k forks source link

gccgo, runtime: panic with checkdead: find g 19 in status 1 #11406

Closed laboger closed 9 years ago

laboger commented 9 years ago

Using gccgo from latest gcc5 branch. Building Docker from their master branch on ppc64le trying to provide continuous integration. This had been working without issue up until sometime in the past 2 weeks, where there is a failure during the Docker build. This same failure occurs when running one of the Docker unit tests after we've built Docker this way.

Failure occurs when building on Ubuntu 14.04 and 14.10 ppc64le.

Here is and example of the stack information for the Docker build failure:

runtime: checkdead: find g 19 in status 1 fatal error: checkdead: runnable g

runtime stack: runtime_dopanic ../../../src/libgo/runtime/panic.c:135 runtime_throw ../../../src/libgo/runtime/panic.c:193 checkdead ../../../src/libgo/runtime/proc.c:2866 mput ../../../src/libgo/runtime/proc.c:3113 stopm ../../../src/libgo/runtime/proc.c:1456 findrunnable ../../../src/libgo/runtime/proc.c:1764 schedule ../../../src/libgo/runtime/proc.c:1847 park0 ../../../src/libgo/runtime/proc.c:1912 runtime_mstart ../../../src/libgo/runtime/proc.c:1076

goroutine 16 [chan receive]: main.mainDaemon /go/src/github.com/docker/docker/docker/daemon.go:186 main.main /go/src/github.com/docker/docker/docker/docker.go:99 created by main ../../../src/libgo/runtime/go-main.c:48

goroutine 18 [finalizer wait]: created by runtime_createfing ../../../src/libgo/runtime/mgc0.c:2572

goroutine 19 [runnable]: syscall.Exitsyscall ../../../src/libgo/runtime/proc.c:2122 os_signal.signal_recv ../../../src/libgo/runtime/sigqueue.goc:110 signal.loop ../../../src/libgo/go/os/signal/signal_unix.go:21 created by os_signal..import ../../../src/libgo/go/os/signal/signal_unix.go:25

goroutine 25 [chan receive]: github_com_docker_docker_api_server.ServeApi.pN42_github_com_docker_docker_api_server.Server /go/src/github.com/docker/docker/api/server/server.go:110 main.$nested2 /go/src/github.com/docker/docker/docker/daemon.go:136 created by main.mainDaemon /go/src/github.com/docker/docker/docker/daemon.go:135

goroutine 26 [chan receive]: daemon.$nested20 /go/src/github.com/docker/docker/daemon/debugtrap.go:17 created by daemon.setupSigusr1Trap /go/src/github.com/docker/docker/daemon/debugtrap.go:16

goroutine 63 [chan receive]: database_sql.connectionOpener.pN15_database_sql.DB ../../../src/libgo/go/database/sql/sql.go:589

We can also generate a failure in one of the unit tests with the same message, and the stack trace is not quite the same:

runtime stack: runtime_dopanic ../../../src/libgo/runtime/panic.c:135 runtime_throw ../../../src/libgo/runtime/panic.c:193 checkdead ../../../src/libgo/runtime/proc.c:2866 mput ../../../src/libgo/runtime/proc.c:3113 stopm ../../../src/libgo/runtime/proc.c:1456 exitsyscall0 ../../../src/libgo/runtime/proc.c:2205 runtime_mstart ../../../src/libgo/runtime/proc.c:1076

goroutine 16 [chan receive]: testing.RunTests ../../../src/libgo/go/testing/testing.go:556 testing.Run.pN9_testing.M ../../../src/libgo/go/testing/testing.go:485 main.main /tmp/go-build271522865/github.com/docker/docker/pkg/archive/_test/_testmain.go:180 created by main ../../../src/libgo/runtime/go-main.c:48

goroutine 18 [finalizer wait]: created by runtime_createfing ../../../src/libgo/runtime/mgc0.c:2572

goroutine 73 [runnable]: syscall.Exitsyscall ../../../src/libgo/runtime/proc.c:2122 syscall.Getdents ../../../src/libgo/go/syscall/libcall_linux.go:236 syscall.ReadDirent ../../../src/libgo/go/syscall/libcall_linux.go:250 archive.readdirnames /go/src/github.com/docker/docker/pkg/archive/changes_linux.go:236 github_com_docker_docker_pkg_archive.walk.pN43_github_com_docker_docker_pkg_archive.walker /go/src/github.com/docker/docker/pkg/archive/changes_linux.go:120 archive.collectFileInfoForChanges /go/src/github.com/docker/docker/pkg/archive/changes_linux.go:53 github_com_docker_docker_pkg_archive.ChangesDirs /go/src/github.com/docker/docker/pkg/archive/changes.go:309 archive.checkNoChanges /go/src/github.com/docker/docker/pkg/archive/archive_test.go:553 github_com_docker_docker_pkg_archive.TestTarFiles /go/src/github.com/docker/docker/pkg/archive/archive_test.go:521 testing.tRunner ../../../src/libgo/go/testing/testing.go:447 created by testing.RunTests ../../../src/libgo/go/testing/testing.go:555

goroutine 74 [runnable]: sync.runtime_Syncsemacquire ../../../src/libgo/runtime/sema.goc:262 sync.Wait.pN9_sync.Cond ../../../src/libgo/go/sync/cond.go:62 io.write.pN7_io.pipe ../../../src/libgo/go/io/pipe.go:94 io.Write.pN13_io.PipeWriter ../../../src/libgo/go/io/pipe.go:161 bufio.flush.pN12_bufio.Writer ../../../src/libgo/go/bufio/bufio.go:530 bufio.Flush.pN12_bufio.Writer ../../../src/libgo/go/bufio/bufio.go:519 pools.$nested3 /go/src/github.com/docker/docker/pkg/pools/pools.go:102 github_com_docker_docker_pkg_ioutils.Close.pN55_github_com_docker_docker_pkg_ioutils.writeCloserWrapper /go/src/github.com/docker/docker/pkg/ioutils/writers.go:31 archive.$nested0 /go/src/github.com/docker/docker/pkg/archive/archive.go:476 created by github_com_docker_docker_pkg_archive.TarWithOptions /go/src/github.com/docker/docker/pkg/archive/archive.go:393

Tests failed: ./pkg/archive

Interesting that in both cases there is a stack marked as runnable with syscall.Exitsyscall at the top.

I have been looking at the code in checkdead where the message is coming from. In order for this to correctly detect a deadlock, the counts found in the runtime_sched structure must be in sync (i.e. must reflect the same state) with the entries runtime_allg list and I don't see how that is enforced. Seems like the run count could be computed but before the runtime_allg list is processed, a new goroutine could be added, resulting in a new run count, identifying a deadlock when there isn't one.

   run = runtime_sched.mcount - runtime_sched.nmidle - runtime_sched.nmidlelocked - 1 - countextra();
    if(run > 0)
            return;
    // If we are dying because of a signal caught on an already idle thread,
    // freezetheworld will cause all running threads to block.
    // And runtime will essentially enter into deadlock state,
    // except that there is a thread that will call runtime_exit soon.
    if(runtime_panicking > 0)
            return;
    if(run < 0) {
            runtime_printf("runtime: checkdead: nmidle=%d nmidlelocked=%d mcount=%d\n",
                    runtime_sched.nmidle, runtime_sched.nmidlelocked, runtime_sched.mcount);
            runtime_throw("checkdead: inconsistent counts");
    }
    grunning = 0;
    runtime_lock(&allglock);
    for(i = 0; i < runtime_allglen; i++) {
            gp = runtime_allg[i];
            if(gp->isbackground)
                    continue;
            s = gp->status;
            if(s == Gwaiting)
                    grunning++;
            else if(s == Grunnable || s == Grunning || s == Gsyscall) {
                    runtime_unlock(&allglock);
                    runtime_printf("runtime: checkdead: find g %D in status %d\n", gp->goid, s);
                    runtime_throw("checkdead: runnable g");
            }
    }
    runtime_unlock(&allglock);

We will continue to try to reproduce this problem in a smaller testcase and do some more debugging.

laboger commented 9 years ago

We have a testcase that reproduces the problem, which shows that it is due to the Entersyscall and Exitsyscall within libgo/go/syscall/libcall_linux.go function Getdents. It should not be in Getdents, since Entersyscall and Exitsyscall are called within the call to Syscall(). Calling Exitsyscall on this path is what causes the panic in checkdead.

I have a patch to fix this here: https://go-review.googlesource.com/#/c/11761/

I built did a bootstrap build on ppc64le and tested it.

Testcase to reproduce the problem with gccgo from gcc5 branch on both ppc64le and x86 is below. You must do mkdir temp to cause the panic to occur:

package main

import (
        "fmt"
        "os"
        "sort"
        "syscall"
        "unsafe"
)

type nameIno struct {
        name string
        ino  uint64
}

type nameInoSlice []nameIno

func (s nameInoSlice) Len() int           { return len(s) }
func (s nameInoSlice) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
func (s nameInoSlice) Less(i, j int) bool { return s[i].name < s[j].name }

func readdirnames(dirname string) (names []nameIno, err error) {
        var (
                size = 100
                buf  = make([]byte, 4096)
                nbuf int
                bufp int
                nb   int
        )

        f, err := os.Open(dirname)
        if err != nil {
                return nil, err
        }
        defer f.Close()

        names = make([]nameIno, 0, size) // Empty with room to grow.
        for {
                // Refill the buffer if necessary
                if bufp >= nbuf {
                        bufp = 0
                        nbuf, err = syscall.ReadDirent(int(f.Fd()), buf) // getdents on linux
                        if nbuf < 0 {
                                nbuf = 0
                        }
                        if err != nil {
                                return nil, os.NewSyscallError("readdirent", err)
                        }
                        if nbuf <= 0 {
                                break // EOF
                        }
                }

                // Drain the buffer
                nb, names = parseDirent(buf[bufp:nbuf], names)
                bufp += nb
        }

        sl := nameInoSlice(names)
        sort.Sort(sl)
        return sl, nil
}

// parseDirent is a minor modification of syscall.ParseDirent (linux version)
// which returns {name,inode} pairs instead of just names.
func parseDirent(buf []byte, names []nameIno) (consumed int, newnames []nameIno) {
        origlen := len(buf)
        for len(buf) > 0 {
                dirent := (*syscall.Dirent)(unsafe.Pointer(&buf[0]))
                buf = buf[dirent.Reclen:]
                if dirent.Ino == 0 { // File absent in directory.
                        continue
                }
                bytes := (*[10000]byte)(unsafe.Pointer(&dirent.Name[0]))
                var name = string(bytes[0:clen(bytes[:])])
                if name == "." || name == ".." { // Useless names
                        continue
                }
                names = append(names, nameIno{name, dirent.Ino})
        }
        return origlen - len(buf), names
}

func clen(n []byte) int {
        for i := 0; i < len(n); i++ {
                if n[i] == 0 {
                        return i
                }
        }
        return len(n)
}

func main() {
        names, _ := readdirnames("temp/")
        fmt.Println("list %s\n", names)
} 
laboger commented 9 years ago

I did not intend to close it, sorry.

gopherbot commented 9 years ago

CL https://golang.org/cl/11761 mentions this issue.

ianlancetaylor commented 9 years ago

Thanks for sorting this out. Your patch has been committed to GCC mainline and GCC 4.9 branch. Waiting for RM approval for GCC 5 branch.

ianlancetaylor commented 9 years ago

Also committed to GCC 5 branch.