golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.41k stars 17.7k forks source link

x/tools/go/packages: Load is 2-3 orders of magnitude slower than go/build.Import #31087

Closed thockin closed 2 years ago

thockin commented 5 years ago

As per @heschik in #29452

Converting a tool from go/build to go/packages incurs a huge slowdown. I tried to min-repro it here. Note that we have a full vendor/ dir, no un-vendored deps, and modules are NOT turned on yet.

What version of Go are you using (go version)?

1.12.1

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

linux, amd64

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/usr/local/google/home/thockin/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/usr/local/google/home/thockin/src/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/lib/google-golang"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/google-golang/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build445939239=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Build and run the two programs below and time them.

What did you expect to see?

Roughly the same timing

What did you see instead?

0.008s vs 1.272s

go-build.go:

package main

import (
        "fmt"
        "go/build"
        "log"
        "os"

        "github.com/davecgh/go-spew/spew"
)

func main() {
        if err := visit("."); err != nil {
                fmt.Println(err)
                os.Exit(1)
        }
}

func visit(pkgName string) error {
        pkg, err := findPackage(pkgName)
        if err != nil {
                return err
        }
        spew := spew.ConfigState{DisableMethods: true, Indent: "  "}
        spew.Dump(pkg)
        return nil
}

func findPackage(pkgName string) (*build.Package, error) {
        log.Println("find", pkgName)
        pkg, err := build.Import(pkgName, getwd(), build.ImportComment)
        if err != nil {
                return nil, err
        }
        return pkg, nil
}

func getwd() string {
        pwd, err := os.Getwd()
        if err != nil {
                panic(fmt.Sprintf("can't get working directory: %v", err))
        }
        return pwd
}

go-packages.go:

package main

import (
    "fmt"
    "log"
    "os"

    "github.com/davecgh/go-spew/spew"

    "golang.org/x/tools/go/packages"
)

func main() {
    if err := visit("."); err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
}

func visit(pkgName string) error {
    pkgs, err := findPackage(pkgName)
    if err != nil {
        return err
    }
    spew := spew.ConfigState{DisableMethods: true, Indent: "  "}
    spew.Dump(pkgs[0])
    return nil
}

func findPackage(pkgName string) ([]*packages.Package, error) {
    log.Println("find", pkgName)
    pkgs, err := packages.Load(&packages.Config{Mode: packages.LoadFiles}, pkgName)
    if err != nil {
        return nil, err
    }
    return pkgs, nil
}

Testing:

$ ls
go-build.go  go-packages.go

$ go build ./go-build.go 

$ go build ./go-packages.go 

$ time ./go-build >/dev/null
2019/03/27 11:51:10 find .

real    0m0.008s
user    0m0.008s
sys 0m0.000s

$ time ./go-packages >/dev/null
2019/03/27 11:51:17 find .

real    0m1.272s
user    0m0.094s
sys 0m0.109s
matloob commented 5 years ago

This was discussed offline: The slowness was due to the Google-specific gopackagesdriver binary.

Unfortunately go/packages will always be slower than go/build because we need to execute go list as the true authority of package information, but this kind of slowness is unexpected and really bad. We'll get a google-specific fix in soon, but for those of you facing this problem, you can set GOPACKAGESDRIVER=off in the short term to bypass that driver.

thockin commented 5 years ago

Thanks. I am still concerned at "go/packages will always be slower than go/build" even when we have vendored EVERYTHING. I will hang tight for the followup here, and try a proper rewrite of my real tool against go/packages (rather than a trivial adaptation as I did)

heschi commented 5 years ago

re: always slower, it's true but I doubt it'll be a big problem for Kubernetes' build tools.

There's some amount of overhead involved in running go list, then marshalling and unmarshalling the results. In my experience that's usually small, say 50-100ms or so. That can be a problem for highly interactive tools like goimports and godef, but for a build tool like go2make I think it'd be okay.

After that the major issue is that packages.Load starts from scratch every time, so it's important to keep the number of calls low to avoid loading the same packages (e.g. the stdlib) over and over and over. That can be really easy, or require major refactorings. But for most people I don't think it's been a big problem.

thockin commented 5 years ago

ACK the last part. A simple adaptation of go2make would call it many many times. I will overhaul it to optimize for that.

On Wed, Mar 27, 2019, 4:27 PM Heschi Kreinick notifications@github.com wrote:

re: always slower, it's true but I doubt it'll be a big problem for Kubernetes' build tools.

There's some amount of overhead involved in running go list, then marshalling and unmarshalling the results. In my experience that's usually small, say 50-100ms or so. That can be a problem for highly interactive tools like goimports and godef, but for a build tool like go2make I think it'd be okay.

After that the major issue is that packages.Load starts from scratch every time, so it's important to keep the number of calls low to avoid loading the same packages (e.g. the stdlib) over and over and over. That can be really easy, or require major refactorings. But for most people I don't think it's been a big problem.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/31087#issuecomment-477384842, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVgVIBwf1G1Y71UpgIgDHjizLF9qf-oks5va_5NgaJpZM4cOi1N .

josharian commented 5 years ago

Note that issues like https://github.com/golang/go/issues/30826 and https://github.com/golang/go/issues/30828 make it harder to avoid calling packages.Load multiple times. go-fuzz-build calls packages.Load three times, and properly dealing with the referenced issues (which I am currently hacking around in a way that will work until it doesn't) will require a fourth and possibly a fifth call.

gopherbot commented 5 years ago

Change https://golang.org/cl/170863 mentions this issue: cmd/gopherbot: CC triaged issues to owners

thockin commented 3 years ago

I'm baaaaack!

This bug started with me trying to convert from go/build to go/packages. Trivial conversion was not going to work, so I back-burnered it. But now even go/build is slow (not sure when it happened). I noticed that we still set GO111MODULE=off in some paths and thought I would nix that. Uh oh.

The below program runs very fast with GO111MODULE=off and very slow with it on. Same binary, same inputs, 100x slower.

$ time GO111MODULE=off /tmp/m/k/slow -d ./pkg/proxy/iptables/ | wc -l
3357

real    0m0.286s
user    0m0.166s
sys 0m0.112s

$ time GO111MODULE=on /tmp/m/k/slow -d ./pkg/proxy/iptables/ | wc -l
3357

real    0m37.591s
user    0m37.205s
sys 0m11.689s

From before:

re: always slower, it's true but I doubt it'll be a big problem for Kubernetes' build tools.

It looks like there's an internal switch that happened at some point and without GO111MODULE=off, it uses the slow path. But HOLY COW is that slow.

The problem is that this tool runs on every Kubernetes build. In the legacy mode it takes 3-5 seconds for the whole repo, which is acceptable. Adding 5+ minutes to each run (even just a small incremental run) is obviously not viable.

So I guess the question I have is - why is this OK? Am I just so far outside the norm that 100x slowdown is acceptable in exchange for some (presumably) internal cleanup?

Can I rely on GO111MODULE=off to be fast forever? Or should I plan to totally overhaul how Kubernetes builds work?

Is there a better answer?

Program:

package main

import (
    "flag"
    "fmt"
    "go/build"
    "os"
    "path"
    "sort"
)

var flDebug = flag.Bool("d", false, "enable debugging output")

func main() {
    flag.Parse()

    debug("PWD", getwd())

    build.Default.BuildTags = []string{"ignore_autogenerated"}
    build.Default.UseAllFiles = false

    if len(flag.Args()) == 0 {
        os.Exit(1)
    }
    // Recurse.
    pkgName := flag.Args()[0]
    debug("starting", pkgName)
    if err := WalkPkg(pkgName, visitPkg); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}

func debug(items ...interface{}) {
    if *flDebug {
        x := []interface{}{"DBG:"}
        x = append(x, items...)
        fmt.Println(x...)
    }
}

func visitPkg(importPath, absPath string) error {
    debug("visit", importPath)
    return saveImport(importPath)
}

// cache keeps track of which packages we have already loaded.
var cache = map[string]*build.Package{}

func saveImport(pkgName string) error {
    if cache[pkgName] != nil {
        return nil
    }
    pkg, err := loadPackage(pkgName)
    if err != nil {
        return err
    }
    debug("save", pkgName)
    cache[pkgName] = pkg

    debug("recurse", pkgName)
    defer func() { debug("done   ", pkgName) }()
    if !pkg.Goroot && (len(pkg.GoFiles)+len(pkg.Imports) > 0) {
        // Process deps of this package before the package itself.
        for _, impName := range pkg.Imports {
            if impName == "C" {
                continue
            }
            debug("depends on", impName)
            saveImport(impName)
        }
    }
    return nil
}

func loadPackage(pkgName string) (*build.Package, error) {
    debug("load", pkgName)
    pkg, err := build.Import(pkgName, getwd(), 0)
    if err != nil {
        // We can ignore NoGoError.  Anything else is real.
        if _, ok := err.(*build.NoGoError); !ok {
            return nil, err
        }
    }
    return pkg, nil
}

func getwd() string {
    pwd, err := os.Getwd()
    if err != nil {
        panic(fmt.Sprintf("can't get working directory: %v", err))
    }
    return pwd
}

// VisitFunc is a function called by WalkPkg to examine a single package.
type VisitFunc func(importPath string, absPath string) error

// WalkPkg recursively visits all packages under pkgName.  This is similar
// to filepath.Walk, except that it follows symlinks.  A package is always
// visited before the children of that package.
func WalkPkg(pkgName string, visit VisitFunc) error {
    // Visit the package itself.
    pkg, err := findPackage(pkgName)
    if err != nil {
        return err
    }
    if err := visit(pkg.ImportPath, pkg.Dir); err != nil {
        return err
    }

    // Read all of the child dirents and find sub-packages.
    infos, err := readDirInfos(pkg.Dir)
    if err != nil {
        return err
    }
    for _, info := range infos {
        if !info.IsDir() {
            continue
        }
        name := info.Name()
        if name[0] == '_' || (len(name) > 1 && name[0] == '.') || name == "testdata" {
            continue
        }
        // Don't use path.Join() because it drops leading `./` via path.Clean().
        err := WalkPkg(pkgName+"/"+name, visit)
        if err != nil {
            return err
        }
    }
    return nil
}

// findPackage finds a Go package.
func findPackage(pkgName string) (*build.Package, error) {
    debug("find", pkgName)
    pkg, err := build.Import(pkgName, getwd(), build.FindOnly)
    if err != nil {
        return nil, err
    }
    return pkg, nil
}

// readDirInfos returns a list of os.FileInfo structures for the dirents under
// dirPath.  The result list is sorted by name.  This is very similar to
// ioutil.ReadDir, except that it follows symlinks.
func readDirInfos(dirPath string) ([]os.FileInfo, error) {
    names, err := readDirNames(dirPath)
    if err != nil {
        return nil, err
    }
    sort.Strings(names)

    infos := make([]os.FileInfo, 0, len(names))
    for _, n := range names {
        info, err := os.Stat(path.Join(dirPath, n))
        if err != nil {
            return nil, err
        }
        infos = append(infos, info)
    }
    return infos, nil
}

// readDirNames returns a list of all dirents in dirPath.  The result list is
// not sorted or filtered.
func readDirNames(dirPath string) ([]string, error) {
    d, err := os.Open(dirPath)
    if err != nil {
        return nil, err
    }
    defer d.Close()

    names, err := d.Readdirnames(-1)
    if err != nil {
        return nil, err
    }
    return names, nil
}
mvdan commented 3 years ago

@thockin Just from a quick look, it seems like the go/build usage is like:

  1. walk a directory tree
  2. for every directory, find the package found in it
  3. "visit" the package, by loading it and its dependencies, with a memoizing cache

I don't quite understand what the program intends to accomplish, other than doing the loading and printing debug logs. I assume it's just for the sake of illustrating the slowness.

From a tooling point of view, the strategy above is really slow. To support modules, both go/packages and go/build execute the go list -json tool to load packages. So, at least, you are running go list twice per package. Even with the help of the module and build caches, it's not surprising the work takes tens of seconds; you'd be running go list close to a thousand times:

$ # how many packages does ./pkg/proxy/iptables depend on?
$ go list -deps ./pkg/proxy/iptables | wc -l
405

I think it would be more helpful if you explained what exactly your program is trying to do. I'm pretty convinced you could load all the packages you need in one single underlying go list call, similar to how I was able to use go list -deps above to list all the transitive dependencies in about 200ms.

mvdan commented 3 years ago

Also, I'm happy to take this somewhere else if you would like to discuss your program in more detail. GitHub is not a very good place to provide help of this kind :)

thockin commented 3 years ago

I don't quite understand what the program intends to accomplish, other than doing the loading and printing debug logs. I assume it's just for the sake of illustrating the slowness.

Precisely. The real program looks at all loaded packages and emits some Makefile snippets for them, but that did not seem relevant to demonstrating the problem.

From a tooling point of view, the strategy above is really slow. To support modules, both go/packages and go/build execute the go list -json tool to load packages. So, at least, you are running go list twice per package. Even with the help of the module and build caches, it's not surprising the work takes tens of seconds; you'd be running go list close to a thousand times

It was not slow when it was written. It's not slow with GO111MODULE=off. Modules totally borked it.

I think it would be more helpful if you explained what exactly your program is trying to do. I'm pretty convinced you could load all the packages you need in one single underlying go list call,

This is for Kubernetes. I have a really slow codegen tool that I want to run ONLY if the packages it consumes have ACTUALLY changed (mtime is a good enough proxy). To make it harder, I don't know a priori which packages need codegen. I want to scan the file tree, find the files that need codegen, check their deps, and only then run the codegen (actually we have several codegens in the same mold).

A single go list would be ideal except that it barfs on module edges and OF COURSE kubernetes has multiple modules. I modelled the k8s codebase at https://github.com/thockin/go-list-modules. What happens is:

$ DIRS_NEED_CODEGEN=$(\
    find . -type f -name \*.go \
        | xargs grep -l "^// *NEEDS_CODEGEN" \
        | xargs -n 1 dirname \
        | sort \
        | uniq \
        | paste -sd' '
)

$ go list -f '{{.GoFiles}}' $DIRS_NEED_CODEGEN
main module (example.com/m) does not contain package example.com/m/staging/src/example.com/other1/unused
main module (example.com/m) does not contain package example.com/m/staging/src/example.com/other1/used
main module (example.com/m) does not contain package example.com/m/staging/src/example.com/other2/unused
main module (example.com/m) does not contain package example.com/m/staging/src/example.com/other2/used
main module (example.com/m) does not contain package example.com/m/submod/unused
main module (example.com/m) does not contain package example.com/m/submod/used

I started writing a "smarter" tool to only call packages.Load() one time, but that won't load across modules ("main module does not contain", requires CWD to be set to the module dir), so I can't do more than one at a time.

I have a thread going on golang-nuts on this, and @jayconrod opened https://github.com/golang/go/issues/43733 as part of that. I admit that I am not 100% sure the fix there would fix me, but it seems like it would. There's terminology and philosophy around modules that I don't quite understand, so it's hard for me to say what is expected and what is a bug. What I know is that I keep running into dead-ends. Things I would expect to work, don't.

mvdan commented 3 years ago

Modules totally borked it.

When I say this is slow, I mean in the context of modules. Some tools that got away with being fast in the GOPATH world will need to be re-thought in the world of modules to remain fast, because modules are more powerful but also more complex. I know that can be frustrating, which is why I'm trying to help.

A single go list would be ideal except that it barfs on module edges and OF COURSE kubernetes has multiple modules.

Right, I was not aware the single repository contained many modules. It seems like you could use one go list call (aka one go/packages.Load) per module, using the right directory for each module.

I have a really slow codegen tool that I want to run ONLY if the packages it consumes have ACTUALLY changed (mtime is a good enough proxy) [...] I started writing a "smarter" tool to only call packages.Load() one time, but that won't load across modules

If you can link to that work-in-progress software, as well as the existing program that currently works, I can take a look and try to come up with a working version that's fast enough with modules. Your summary above is helpful, but ultimately I can't properly help until I see the code.

thockin commented 3 years ago

It seems like you could use one go list call (aka one go/packages.Load) per module, using the right directory for each module.

I tried that - it takes forever. There are a lot of cross-references and some very large modules end up being processed repeatedly. What used to take 2-4 seconds now takes 45. :(

If you can link to that work-in-progress software, as well as the existing program that currently works, I can take a look

I appreciate the effort!

Existing: https://github.com/kubernetes/kubernetes/tree/master/hack/make-rules/helpers/go2make

New one is very cobbled together right now as I was trying to find some way to make it produce correct results without resorting to GO111MODULE=off. Will paste a file here. Run as go2make2 -m <modules...> or go2make2 -p <packages...>. Set GO111MODULE=off to see it behave differently.

package main

import (
    "fmt"
    "os"
    "path"
    "path/filepath"
    "sort"
    "strings"
    "time"

    "github.com/davecgh/go-spew/spew"
    "github.com/spf13/pflag"
    "golang.org/x/tools/go/packages"
)

var pkgIdx = []string{} // sorted
var pkgMap = map[string]*packages.Package{}
var pkgDeps = map[string]map[string]bool{} // transitive
var pkgDirs = map[string]string{}

var flOut = pflag.StringP("out", "o", "struct", "What to output (struct | makevars | spew)")
var flMod = pflag.BoolP("mod", "m", false, "process the arguments as modules")
var flPkg = pflag.BoolP("pkg", "p", false, "process the arguments as packages")
var flDbg = pflag.BoolP("debug", "d", false, "enable debugging output")
var flDbgTime = pflag.BoolP("debug-time", "D", false, "enable debugging output with timestamps")

var lastDebugTime time.Time

func debug(items ...interface{}) {
    if *flDbg || *flDbgTime {
        x := []interface{}{}
        if *flDbgTime {
            elapsed := time.Since(lastDebugTime)
            if lastDebugTime.IsZero() {
                elapsed = 0
            }
            lastDebugTime = time.Now()
            x = append(x, fmt.Sprintf("DBG(+%v):", elapsed))
        } else {
            x = append(x, "DBG:")
        }
        x = append(x, items...)
        fmt.Fprintln(os.Stderr, x...)
    }

}

func main() {
    pflag.Parse()

    if !*flMod && !*flPkg {
        fmt.Fprintf(os.Stderr, "one of --mod (-m) or --pkg (-p) must be specified\n")
        os.Exit(1)
    }
    if *flMod && *flPkg {
        fmt.Fprintf(os.Stderr, "only one of --mod (-m) and --pkg (-p) may be specified\n")
        os.Exit(1)
    }

    targets := pflag.Args()
    if len(targets) == 0 {
        targets = append(targets, ".")
    }
    debug("targets:", targets)

    if *flMod {
        for _, mod := range targets {
            visitModule(mod)
        }
    } else {
        visitPackages(targets...)
    }

    switch *flOut {
    case "struct":
        dumpStructured()
    case "makevars":
        dumpMakeVars()
    case "spew":
        dumpSpew()
        //FIXME: default
    }
}

func visitModule(mod string) {
    debug("visiting module", mod)
    cfg := packages.Config{
        Mode:  packages.NeedName | packages.NeedFiles | packages.NeedImports | packages.NeedDeps | packages.NeedModule,
        Dir:   mod,
        Tests: false,
    }
    //log.Printf("mod-path: %v\n", mod)
    pkgs, err := packages.Load(&cfg, "./...")
    //log.Printf("loaded:   %v\n", mod)
    if err != nil {
        fmt.Fprintf(os.Stderr, "error loading packages in module %q: %v\n", mod, err)
        os.Exit(1)
    }

    for _, p := range pkgs {
        visitPackage(p) //FIXME: errors
    }
    //log.Printf("done:     %v\n", mod)
}

func visitPackages(pkgNames ...string) {
    debug("visiting packages", pkgNames)
    //FIXME: need to set GO111MODULE=off to load packages across modules
    cfg := packages.Config{
        Mode:  packages.NeedName | packages.NeedFiles | packages.NeedImports | packages.NeedDeps | packages.NeedModule,
        Tests: false,
    }
    //log.Printf("loading %d:\n", len(pkgNames))
    pkgs, err := packages.Load(&cfg, pkgNames...)
    //log.Printf("loaded  %d:\n", len(pkgNames))
    if err != nil {
        fmt.Fprintf(os.Stderr, "error loading packages: %v\n", err)
        os.Exit(1)
    }

    for _, p := range pkgs {
        visitPackage(p) //FIXME: errors
    }
    //log.Printf("done\n")
}

func visitPackage(pkg *packages.Package) map[string]bool {
    debug("visiting package", pkg.ID)
    if len(pkg.Errors) > 0 {
        debug("  package", pkg.ID, "has errors:", pkg.Errors)
        return nil
    }
    if pkgMap[pkg.ID] == pkg {
        debug("  package", pkg.ID, "already visited")
        return pkgDeps[pkg.ID]
    }
    if pkgMap[pkg.ID] != nil {
        debug("  package", pkg.ID, "already visited, but not the same")
        // HACK do we really need this?
        //log.Printf("revisit %q", pkg.ID)
        oldModDir := ""
        newModDir := ""
        if pkgMap[pkg.ID].Module != nil {
            oldModDir = pkgMap[pkg.ID].Module.Dir
        }
        if pkg.Module != nil {
            newModDir = pkg.Module.Dir
        }
        if len(newModDir) > len(oldModDir) {
            pkgMap[pkg.ID] = pkg
        }
        return pkgDeps[pkg.ID]
    }

    if len(pkg.GoFiles) == 0 {
        debug("  package", pkg.ID, "has no gofiles")
        return nil
    }
    // hack - don't descend into GOROOT, has to be a better way
    if strings.HasPrefix(pkg.GoFiles[0], "/usr") {
        debug("  package", pkg.ID, "is stdlib")
        return nil
    }
    pkgDir, _ := filepath.Abs(path.Dir(pkg.GoFiles[0]))
    if !strings.HasSuffix(pkgDir, "/") {
        pkgDir = pkgDir + "/"
    }
    if prune(pkgDir) {
        debug("  package", pkg.ID, "was pruned")
        return nil
    }
    debug("  package", pkg.ID, "is new")

    //n := time.Now()
    imps := map[string]bool{}
    for _, p := range pkg.Imports {
        pi := visitPackage(p)
        imps[p.ID] = true
        merge(imps, pi)
    }

    pkgIdx = append(pkgIdx, pkg.ID)
    pkgMap[pkg.ID] = pkg
    debug("  package", pkg.ID, "saved")
    pkgDeps[pkg.ID] = imps
    pkgDirs[pkg.ID] = pkgDir
    //fmt.Printf("%q took %v\n", pkg.ID, time.Since(n))
    return imps
}

func merge(lhs, rhs map[string]bool) {
    for k := range rhs {
        lhs[k] = true
    }
}

func flatten(m map[string]bool) []string {
    l := make([]string, 0, len(m))
    for k := range m {
        l = append(l, k)
    }
    sort.Strings(l)
    return l
}

func dumpStructured() {
    for _, k := range pkgIdx {
        pkg := pkgMap[k]
        fmt.Printf("Package %q {\n", pkg.ID)
        fmt.Printf("  Name: %q\n", pkg.Name)
        fmt.Printf("  PkgPath: %q\n", pkg.PkgPath)
        if pkg.Module != nil {
            fmt.Printf("  Module: %q\n", pkg.Module.Path)
            fmt.Printf("  ModuleDir: %q\n", pkg.Module.Dir)
        }
        fmt.Printf("  GoFiles: [\n")
        for _, f := range pkg.GoFiles {
            fmt.Printf("    %q\n", f)
        }
        fmt.Printf("  ]\n")
        fmt.Printf("  Imports: [\n")
        for _, i := range pkg.Imports {
            fmt.Printf("    %q\n", i.ID)
        }
        fmt.Printf("  ]\n")
        fmt.Printf("}\n")
    }
}

func dumpMakeVars() {
    for _, k := range pkgIdx {
        pkg := pkgMap[k]
        deps := flatten(pkgDeps[k])
        files := allFiles(pkg, deps)
        if len(files) == 0 {
            // TODO: we probably should output a dep on the dir itself, but
            // without files I am not sure how to find that.
            continue
        }
        //FIXME: this outputs "k8s.io/api/...apps/v1", but I need
        //"k8s.io/kubernetes/vendor/api"
        fmt.Printf("%s :=", pkg.ID)
        dir := ""
        for _, f := range files {
            d := path.Dir(f)
            if d != dir {
                fmt.Printf(" \\\n  %s/", d)
                dir = d
            }
            fmt.Printf(" \\\n  %s", f)
        }
        fmt.Printf("\n")
    }
}

func allFiles(pkg *packages.Package, deps []string) []string {
    out := make([]string, 0, len(pkg.GoFiles)+len(deps))
    out = append(out, pkg.GoFiles...)
    for _, dep := range deps {
        if pkgMap[dep] != nil {
            out = append(out, pkgMap[dep].GoFiles...)
        }
    }
    return out
}

func dumpSpew() {
    for _, k := range pkgIdx {
        pkg := pkgMap[k]
        cfg := spew.ConfigState{
            DisableMethods: true,
            Indent:         "    ",
            MaxDepth:       5,
        }
        cfg.Dump(pkg)
    }
}
mvdan commented 3 years ago

Thanks! While I look at that, I thought I'd point you towards Go's build cache. It already keeps track of "cache IDs" or keys for each package, which change if its source code or any transitive dependencies change. That might be a faster and possibly simpler approach to accomplish this goal, and it also mimics exactly what go build will see. For example:

$ go list -export -f '{{.Export}}' ./pkg/volume/downwardapi
/home/mvdan/.cache/go-build/94/94ec43b481b9e23b8ef50d616abb35eed35f8aaf7d924f1c55db5820c60f336a-d
$ go list -export -f '{{.Export}}' ./pkg/volume/downwardapi
/home/mvdan/.cache/go-build/94/94ec43b481b9e23b8ef50d616abb35eed35f8aaf7d924f1c55db5820c60f336a-d
$ echo "package downwardapi" > ./pkg/volume/downwardapi/rebuild.go
$ go list -export -f '{{.Export}}' ./pkg/volume/downwardapi
/home/mvdan/.cache/go-build/64/6458d1a877ba9f7adbda4f9ba61ae5120c7b3911273d2cace5309eeb06d0bc54-d

The basic idea is: if that path/string changes, it's a different build. The command will build the package(s) in question, but that shoudl be fast if they are already in the build cache. Depends on whether you want to leave the heavy lifting to cmd/go or to make, I guess. I'd lean towards the former, just for consistency.

mvdan commented 3 years ago

I just realised you can obtain all the information you need from just the root module, because it includes all the other k8s.io modules in the vendor folder. I think that should result in a working program that takes a few seconds as well. I'll share some code later today.

mvdan commented 3 years ago

Alright, here you go: https://github.com/mvdan/kubernetes/commit/736c51ea5045473402d602271c689fc4b61facb2

See the commit message for details. Runs well for me in module mode, and even a bit faster than the old program in GOPATH mode.

If you wonder why use go list -json directly instead of go/packages - because the latter is build-system-agnostic, so it hides internal details that we need like the Goroot field.

thockin commented 3 years ago

First, thanks for the time!

go list -export looks like it does what I need, with a couple wrinkles.

1) A change to one of the "hub" packages (imported by many others) can take a long time to recalculate. Playing with it, it took upwards of 15 seconds once, but I can't reproduce it now. I have no viz into why.

2) Kubernetes gets a bunch of directory <dir> is not a package listed in vendor/modules.txt. This is happening because of the way we handle staging/src/... I can't use the staging dirs themselves because I get main module (k8s.io/kubernetes) does not contain package <dir>. I will have to see if I can fix that, but it's annoying.

I think this would be an ideal state if I can get there - I'd rather NOT reinvent parts of go. That said, it will require some retooling of the rest of the build, which assumes a bunch of directory-specific variables.

If that hits a dead-end, I'll circle back to your rebuild of go2make. I suspect, by reading it, that it does not quite satisfy - I need to also be able to list modules that the k8s "main" module does not depend on, which is where I get the need to list multiple packages and the main module ... does not contain package ... error.

mvdan commented 3 years ago

go list -export is slow because it needs to rebuild the packages it gets passed, since its result is pointing you at the compiled packages on disk. The only other similar bit of info you could possibly use is go list -f '{{.ImportPath}} {{.Stale}}', which tells you if a package is up to date in the build cache - but that's not really what you want, either.

Ideally there would be a flag similar to -export to obtain some sort of build cache key, but without actually having to fill the build cache. I don't think that exists today, since a package's cache key contains the resulting build of all dependencies as part of its hash. So, to compute a package's cache key, you must have built all dependencies first. You can read the first 100 or so lines in https://golang.org/src/cmd/go/internal/work/buildid.go, if you're curious :)

Maybe @bcmills @jayconrod @matloob have more ideas in terms of how you could piggyback off of Go's own build cache. Telling if a package (or its dependencies) has changed between two points in time seems generally useful, without having to fill the build cache first. Arguably that would be a fairly different caching system, but cmd/go could reuse a lot of code if it wanted to offer such a feature.

thockin commented 3 years ago

Yeah, I realized this is what is happening, which would maybe be OK if we could re-use the built results, but we have some build flags we pass, so I am not even sure we can.

The package listed in vendor/modules.txt error is a pain. It's coming because some of the packages that need codegen are testing-only and are only used by tests in other vendor/* packages (no deps from the "main" module"). go mod vendor specifically does not include test code, but kubernetes is too clever for our own good and defeats that.

If not for the main module ... does not contain package error, I could access those thru our staging/ directory, but I can't.

When I tried to run your new tool against my model repo (much easier to validate than all of kubernetes) I get:

2021/01/19 14:13:12 subdir/file.go:4:2: cannot find package "." in:
    /tmp/go-list-modules/vendor/example.com/m/submod/used
exit status 1

I will have to debug later. Out of time for now.

bcmills commented 3 years ago

Telling if a package (or its dependencies) has changed between two points in time seems generally useful, without having to fill the build cache first.

If you've actually built the package using the go command, you can use the Stale field of the go list output.

Otherwise, probably the best you can do with go list is to use the …Files fields and manually check them for diffs.

liggitt commented 3 years ago

Kubernetes has several tools (the go2make one discussed here, as well as code generation tools that make extensive use of go/build) that either run significantly slower (> order of magnitude) or get different results w.r.t. package import paths when running in module mode.

If go is planning to drop support for GOPATH in 1.17, have those behavior and performance changes been communicated?

jayconrod commented 3 years ago

Just to explain a bit of what's going on under the hood:

@liggitt To answer your questions directly:

If go is planning to drop support for GOPATH in 1.17, have those behavior and performance changes been communicated?

Yes, but probably not well enough. When we published packages a couple years ago, we worked directly with tool owners and folks in the tools working group. gopls has also replaced a lot of GOPATH-only tools.

If it would be helpful to add some advice to the go/build and golang.org/x/tools/go/packages documentation, I'd be happy to open an issue and do that for 1.17. Let me know if anything else would be useful in addition.

Is there a clear "if you're using this method in go/build, this is what you should do with golang.org/x/tools/go/packages instead" guide?

I'm not sure we have anything that specific. @matloob do you know if there's anything like that?

What are the compatibility guarantees around the go/build package behavior and performance?

go/build is covered by the Go 1 compatibility guidelines. It should continue to work in module-aware mode, but we don't have any specific performance guarantees.

bcmills commented 3 years ago
  • go list always loads dependencies. It probably doesn't need to do that in GOPATH mode, but in module mode, if any transitively imported packages are missing, we might need to change the build list, which could change selected versions of other dependencies.

FWIW, I don't think that go list should implicitly load dependencies in module mode. (That's arguably part of #29666.) But that's going to be a lot of work to fix, I think, including cleaning up a lot of tests that assume it today.