Closed thockin closed 2 years ago
This was discussed offline: The slowness was due to the Google-specific gopackagesdriver binary.
Unfortunately go/packages will always be slower than go/build because we need to execute go list as the true authority of package information, but this kind of slowness is unexpected and really bad. We'll get a google-specific fix in soon, but for those of you facing this problem, you can set GOPACKAGESDRIVER=off in the short term to bypass that driver.
Thanks. I am still concerned at "go/packages will always be slower than go/build" even when we have vendored EVERYTHING. I will hang tight for the followup here, and try a proper rewrite of my real tool against go/packages (rather than a trivial adaptation as I did)
re: always slower, it's true but I doubt it'll be a big problem for Kubernetes' build tools.
There's some amount of overhead involved in running go list
, then marshalling and unmarshalling the results. In my experience that's usually small, say 50-100ms or so. That can be a problem for highly interactive tools like goimports
and godef
, but for a build tool like go2make
I think it'd be okay.
After that the major issue is that packages.Load
starts from scratch every time, so it's important to keep the number of calls low to avoid loading the same packages (e.g. the stdlib) over and over and over. That can be really easy, or require major refactorings. But for most people I don't think it's been a big problem.
ACK the last part. A simple adaptation of go2make would call it many many times. I will overhaul it to optimize for that.
On Wed, Mar 27, 2019, 4:27 PM Heschi Kreinick notifications@github.com wrote:
re: always slower, it's true but I doubt it'll be a big problem for Kubernetes' build tools.
There's some amount of overhead involved in running go list, then marshalling and unmarshalling the results. In my experience that's usually small, say 50-100ms or so. That can be a problem for highly interactive tools like goimports and godef, but for a build tool like go2make I think it'd be okay.
After that the major issue is that packages.Load starts from scratch every time, so it's important to keep the number of calls low to avoid loading the same packages (e.g. the stdlib) over and over and over. That can be really easy, or require major refactorings. But for most people I don't think it's been a big problem.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/31087#issuecomment-477384842, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVgVIBwf1G1Y71UpgIgDHjizLF9qf-oks5va_5NgaJpZM4cOi1N .
Note that issues like https://github.com/golang/go/issues/30826 and https://github.com/golang/go/issues/30828 make it harder to avoid calling packages.Load
multiple times. go-fuzz-build
calls packages.Load
three times, and properly dealing with the referenced issues (which I am currently hacking around in a way that will work until it doesn't) will require a fourth and possibly a fifth call.
Change https://golang.org/cl/170863 mentions this issue: cmd/gopherbot: CC triaged issues to owners
I'm baaaaack!
This bug started with me trying to convert from go/build to go/packages. Trivial conversion was not going to work, so I back-burnered it. But now even go/build is slow (not sure when it happened). I noticed that we still set GO111MODULE=off
in some paths and thought I would nix that. Uh oh.
The below program runs very fast with GO111MODULE=off and very slow with it on. Same binary, same inputs, 100x slower.
$ time GO111MODULE=off /tmp/m/k/slow -d ./pkg/proxy/iptables/ | wc -l
3357
real 0m0.286s
user 0m0.166s
sys 0m0.112s
$ time GO111MODULE=on /tmp/m/k/slow -d ./pkg/proxy/iptables/ | wc -l
3357
real 0m37.591s
user 0m37.205s
sys 0m11.689s
From before:
re: always slower, it's true but I doubt it'll be a big problem for Kubernetes' build tools.
It looks like there's an internal switch that happened at some point and without GO111MODULE=off
, it uses the slow path. But HOLY COW is that slow.
The problem is that this tool runs on every Kubernetes build. In the legacy mode it takes 3-5 seconds for the whole repo, which is acceptable. Adding 5+ minutes to each run (even just a small incremental run) is obviously not viable.
So I guess the question I have is - why is this OK? Am I just so far outside the norm that 100x slowdown is acceptable in exchange for some (presumably) internal cleanup?
Can I rely on GO111MODULE=off
to be fast forever? Or should I plan to totally overhaul how Kubernetes builds work?
Is there a better answer?
Program:
package main
import (
"flag"
"fmt"
"go/build"
"os"
"path"
"sort"
)
var flDebug = flag.Bool("d", false, "enable debugging output")
func main() {
flag.Parse()
debug("PWD", getwd())
build.Default.BuildTags = []string{"ignore_autogenerated"}
build.Default.UseAllFiles = false
if len(flag.Args()) == 0 {
os.Exit(1)
}
// Recurse.
pkgName := flag.Args()[0]
debug("starting", pkgName)
if err := WalkPkg(pkgName, visitPkg); err != nil {
fmt.Fprintln(os.Stderr, err)
os.Exit(1)
}
}
func debug(items ...interface{}) {
if *flDebug {
x := []interface{}{"DBG:"}
x = append(x, items...)
fmt.Println(x...)
}
}
func visitPkg(importPath, absPath string) error {
debug("visit", importPath)
return saveImport(importPath)
}
// cache keeps track of which packages we have already loaded.
var cache = map[string]*build.Package{}
func saveImport(pkgName string) error {
if cache[pkgName] != nil {
return nil
}
pkg, err := loadPackage(pkgName)
if err != nil {
return err
}
debug("save", pkgName)
cache[pkgName] = pkg
debug("recurse", pkgName)
defer func() { debug("done ", pkgName) }()
if !pkg.Goroot && (len(pkg.GoFiles)+len(pkg.Imports) > 0) {
// Process deps of this package before the package itself.
for _, impName := range pkg.Imports {
if impName == "C" {
continue
}
debug("depends on", impName)
saveImport(impName)
}
}
return nil
}
func loadPackage(pkgName string) (*build.Package, error) {
debug("load", pkgName)
pkg, err := build.Import(pkgName, getwd(), 0)
if err != nil {
// We can ignore NoGoError. Anything else is real.
if _, ok := err.(*build.NoGoError); !ok {
return nil, err
}
}
return pkg, nil
}
func getwd() string {
pwd, err := os.Getwd()
if err != nil {
panic(fmt.Sprintf("can't get working directory: %v", err))
}
return pwd
}
// VisitFunc is a function called by WalkPkg to examine a single package.
type VisitFunc func(importPath string, absPath string) error
// WalkPkg recursively visits all packages under pkgName. This is similar
// to filepath.Walk, except that it follows symlinks. A package is always
// visited before the children of that package.
func WalkPkg(pkgName string, visit VisitFunc) error {
// Visit the package itself.
pkg, err := findPackage(pkgName)
if err != nil {
return err
}
if err := visit(pkg.ImportPath, pkg.Dir); err != nil {
return err
}
// Read all of the child dirents and find sub-packages.
infos, err := readDirInfos(pkg.Dir)
if err != nil {
return err
}
for _, info := range infos {
if !info.IsDir() {
continue
}
name := info.Name()
if name[0] == '_' || (len(name) > 1 && name[0] == '.') || name == "testdata" {
continue
}
// Don't use path.Join() because it drops leading `./` via path.Clean().
err := WalkPkg(pkgName+"/"+name, visit)
if err != nil {
return err
}
}
return nil
}
// findPackage finds a Go package.
func findPackage(pkgName string) (*build.Package, error) {
debug("find", pkgName)
pkg, err := build.Import(pkgName, getwd(), build.FindOnly)
if err != nil {
return nil, err
}
return pkg, nil
}
// readDirInfos returns a list of os.FileInfo structures for the dirents under
// dirPath. The result list is sorted by name. This is very similar to
// ioutil.ReadDir, except that it follows symlinks.
func readDirInfos(dirPath string) ([]os.FileInfo, error) {
names, err := readDirNames(dirPath)
if err != nil {
return nil, err
}
sort.Strings(names)
infos := make([]os.FileInfo, 0, len(names))
for _, n := range names {
info, err := os.Stat(path.Join(dirPath, n))
if err != nil {
return nil, err
}
infos = append(infos, info)
}
return infos, nil
}
// readDirNames returns a list of all dirents in dirPath. The result list is
// not sorted or filtered.
func readDirNames(dirPath string) ([]string, error) {
d, err := os.Open(dirPath)
if err != nil {
return nil, err
}
defer d.Close()
names, err := d.Readdirnames(-1)
if err != nil {
return nil, err
}
return names, nil
}
@thockin Just from a quick look, it seems like the go/build
usage is like:
I don't quite understand what the program intends to accomplish, other than doing the loading and printing debug logs. I assume it's just for the sake of illustrating the slowness.
From a tooling point of view, the strategy above is really slow. To support modules, both go/packages
and go/build
execute the go list -json
tool to load packages. So, at least, you are running go list
twice per package. Even with the help of the module and build caches, it's not surprising the work takes tens of seconds; you'd be running go list
close to a thousand times:
$ # how many packages does ./pkg/proxy/iptables depend on?
$ go list -deps ./pkg/proxy/iptables | wc -l
405
I think it would be more helpful if you explained what exactly your program is trying to do. I'm pretty convinced you could load all the packages you need in one single underlying go list
call, similar to how I was able to use go list -deps
above to list all the transitive dependencies in about 200ms.
Also, I'm happy to take this somewhere else if you would like to discuss your program in more detail. GitHub is not a very good place to provide help of this kind :)
I don't quite understand what the program intends to accomplish, other than doing the loading and printing debug logs. I assume it's just for the sake of illustrating the slowness.
Precisely. The real program looks at all loaded packages and emits some Makefile snippets for them, but that did not seem relevant to demonstrating the problem.
From a tooling point of view, the strategy above is really slow. To support modules, both go/packages and go/build execute the go list -json tool to load packages. So, at least, you are running go list twice per package. Even with the help of the module and build caches, it's not surprising the work takes tens of seconds; you'd be running go list close to a thousand times
It was not slow when it was written. It's not slow with GO111MODULE=off. Modules totally borked it.
I think it would be more helpful if you explained what exactly your program is trying to do. I'm pretty convinced you could load all the packages you need in one single underlying go list call,
This is for Kubernetes. I have a really slow codegen tool that I want to run ONLY if the packages it consumes have ACTUALLY changed (mtime is a good enough proxy). To make it harder, I don't know a priori which packages need codegen. I want to scan the file tree, find the files that need codegen, check their deps, and only then run the codegen (actually we have several codegens in the same mold).
A single go list
would be ideal except that it barfs on module edges and OF COURSE kubernetes has multiple modules. I modelled the k8s codebase at https://github.com/thockin/go-list-modules. What happens is:
$ DIRS_NEED_CODEGEN=$(\
find . -type f -name \*.go \
| xargs grep -l "^// *NEEDS_CODEGEN" \
| xargs -n 1 dirname \
| sort \
| uniq \
| paste -sd' '
)
$ go list -f '{{.GoFiles}}' $DIRS_NEED_CODEGEN
main module (example.com/m) does not contain package example.com/m/staging/src/example.com/other1/unused
main module (example.com/m) does not contain package example.com/m/staging/src/example.com/other1/used
main module (example.com/m) does not contain package example.com/m/staging/src/example.com/other2/unused
main module (example.com/m) does not contain package example.com/m/staging/src/example.com/other2/used
main module (example.com/m) does not contain package example.com/m/submod/unused
main module (example.com/m) does not contain package example.com/m/submod/used
I started writing a "smarter" tool to only call packages.Load()
one time, but that won't load across modules ("main module does not contain", requires CWD to be set to the module dir), so I can't do more than one at a time.
I have a thread going on golang-nuts on this, and @jayconrod opened https://github.com/golang/go/issues/43733 as part of that. I admit that I am not 100% sure the fix there would fix me, but it seems like it would. There's terminology and philosophy around modules that I don't quite understand, so it's hard for me to say what is expected and what is a bug. What I know is that I keep running into dead-ends. Things I would expect to work, don't.
Modules totally borked it.
When I say this is slow, I mean in the context of modules. Some tools that got away with being fast in the GOPATH world will need to be re-thought in the world of modules to remain fast, because modules are more powerful but also more complex. I know that can be frustrating, which is why I'm trying to help.
A single
go list
would be ideal except that it barfs on module edges and OF COURSE kubernetes has multiple modules.
Right, I was not aware the single repository contained many modules. It seems like you could use one go list
call (aka one go/packages.Load
) per module, using the right directory for each module.
I have a really slow codegen tool that I want to run ONLY if the packages it consumes have ACTUALLY changed (mtime is a good enough proxy) [...] I started writing a "smarter" tool to only call
packages.Load()
one time, but that won't load across modules
If you can link to that work-in-progress software, as well as the existing program that currently works, I can take a look and try to come up with a working version that's fast enough with modules. Your summary above is helpful, but ultimately I can't properly help until I see the code.
It seems like you could use one go list call (aka one go/packages.Load) per module, using the right directory for each module.
I tried that - it takes forever. There are a lot of cross-references and some very large modules end up being processed repeatedly. What used to take 2-4 seconds now takes 45. :(
If you can link to that work-in-progress software, as well as the existing program that currently works, I can take a look
I appreciate the effort!
Existing: https://github.com/kubernetes/kubernetes/tree/master/hack/make-rules/helpers/go2make
New one is very cobbled together right now as I was trying to find some way to make it produce correct results without resorting to GO111MODULE=off. Will paste a file here. Run as go2make2 -m <modules...>
or go2make2 -p <packages...>
. Set GO111MODULE=off to see it behave differently.
package main
import (
"fmt"
"os"
"path"
"path/filepath"
"sort"
"strings"
"time"
"github.com/davecgh/go-spew/spew"
"github.com/spf13/pflag"
"golang.org/x/tools/go/packages"
)
var pkgIdx = []string{} // sorted
var pkgMap = map[string]*packages.Package{}
var pkgDeps = map[string]map[string]bool{} // transitive
var pkgDirs = map[string]string{}
var flOut = pflag.StringP("out", "o", "struct", "What to output (struct | makevars | spew)")
var flMod = pflag.BoolP("mod", "m", false, "process the arguments as modules")
var flPkg = pflag.BoolP("pkg", "p", false, "process the arguments as packages")
var flDbg = pflag.BoolP("debug", "d", false, "enable debugging output")
var flDbgTime = pflag.BoolP("debug-time", "D", false, "enable debugging output with timestamps")
var lastDebugTime time.Time
func debug(items ...interface{}) {
if *flDbg || *flDbgTime {
x := []interface{}{}
if *flDbgTime {
elapsed := time.Since(lastDebugTime)
if lastDebugTime.IsZero() {
elapsed = 0
}
lastDebugTime = time.Now()
x = append(x, fmt.Sprintf("DBG(+%v):", elapsed))
} else {
x = append(x, "DBG:")
}
x = append(x, items...)
fmt.Fprintln(os.Stderr, x...)
}
}
func main() {
pflag.Parse()
if !*flMod && !*flPkg {
fmt.Fprintf(os.Stderr, "one of --mod (-m) or --pkg (-p) must be specified\n")
os.Exit(1)
}
if *flMod && *flPkg {
fmt.Fprintf(os.Stderr, "only one of --mod (-m) and --pkg (-p) may be specified\n")
os.Exit(1)
}
targets := pflag.Args()
if len(targets) == 0 {
targets = append(targets, ".")
}
debug("targets:", targets)
if *flMod {
for _, mod := range targets {
visitModule(mod)
}
} else {
visitPackages(targets...)
}
switch *flOut {
case "struct":
dumpStructured()
case "makevars":
dumpMakeVars()
case "spew":
dumpSpew()
//FIXME: default
}
}
func visitModule(mod string) {
debug("visiting module", mod)
cfg := packages.Config{
Mode: packages.NeedName | packages.NeedFiles | packages.NeedImports | packages.NeedDeps | packages.NeedModule,
Dir: mod,
Tests: false,
}
//log.Printf("mod-path: %v\n", mod)
pkgs, err := packages.Load(&cfg, "./...")
//log.Printf("loaded: %v\n", mod)
if err != nil {
fmt.Fprintf(os.Stderr, "error loading packages in module %q: %v\n", mod, err)
os.Exit(1)
}
for _, p := range pkgs {
visitPackage(p) //FIXME: errors
}
//log.Printf("done: %v\n", mod)
}
func visitPackages(pkgNames ...string) {
debug("visiting packages", pkgNames)
//FIXME: need to set GO111MODULE=off to load packages across modules
cfg := packages.Config{
Mode: packages.NeedName | packages.NeedFiles | packages.NeedImports | packages.NeedDeps | packages.NeedModule,
Tests: false,
}
//log.Printf("loading %d:\n", len(pkgNames))
pkgs, err := packages.Load(&cfg, pkgNames...)
//log.Printf("loaded %d:\n", len(pkgNames))
if err != nil {
fmt.Fprintf(os.Stderr, "error loading packages: %v\n", err)
os.Exit(1)
}
for _, p := range pkgs {
visitPackage(p) //FIXME: errors
}
//log.Printf("done\n")
}
func visitPackage(pkg *packages.Package) map[string]bool {
debug("visiting package", pkg.ID)
if len(pkg.Errors) > 0 {
debug(" package", pkg.ID, "has errors:", pkg.Errors)
return nil
}
if pkgMap[pkg.ID] == pkg {
debug(" package", pkg.ID, "already visited")
return pkgDeps[pkg.ID]
}
if pkgMap[pkg.ID] != nil {
debug(" package", pkg.ID, "already visited, but not the same")
// HACK do we really need this?
//log.Printf("revisit %q", pkg.ID)
oldModDir := ""
newModDir := ""
if pkgMap[pkg.ID].Module != nil {
oldModDir = pkgMap[pkg.ID].Module.Dir
}
if pkg.Module != nil {
newModDir = pkg.Module.Dir
}
if len(newModDir) > len(oldModDir) {
pkgMap[pkg.ID] = pkg
}
return pkgDeps[pkg.ID]
}
if len(pkg.GoFiles) == 0 {
debug(" package", pkg.ID, "has no gofiles")
return nil
}
// hack - don't descend into GOROOT, has to be a better way
if strings.HasPrefix(pkg.GoFiles[0], "/usr") {
debug(" package", pkg.ID, "is stdlib")
return nil
}
pkgDir, _ := filepath.Abs(path.Dir(pkg.GoFiles[0]))
if !strings.HasSuffix(pkgDir, "/") {
pkgDir = pkgDir + "/"
}
if prune(pkgDir) {
debug(" package", pkg.ID, "was pruned")
return nil
}
debug(" package", pkg.ID, "is new")
//n := time.Now()
imps := map[string]bool{}
for _, p := range pkg.Imports {
pi := visitPackage(p)
imps[p.ID] = true
merge(imps, pi)
}
pkgIdx = append(pkgIdx, pkg.ID)
pkgMap[pkg.ID] = pkg
debug(" package", pkg.ID, "saved")
pkgDeps[pkg.ID] = imps
pkgDirs[pkg.ID] = pkgDir
//fmt.Printf("%q took %v\n", pkg.ID, time.Since(n))
return imps
}
func merge(lhs, rhs map[string]bool) {
for k := range rhs {
lhs[k] = true
}
}
func flatten(m map[string]bool) []string {
l := make([]string, 0, len(m))
for k := range m {
l = append(l, k)
}
sort.Strings(l)
return l
}
func dumpStructured() {
for _, k := range pkgIdx {
pkg := pkgMap[k]
fmt.Printf("Package %q {\n", pkg.ID)
fmt.Printf(" Name: %q\n", pkg.Name)
fmt.Printf(" PkgPath: %q\n", pkg.PkgPath)
if pkg.Module != nil {
fmt.Printf(" Module: %q\n", pkg.Module.Path)
fmt.Printf(" ModuleDir: %q\n", pkg.Module.Dir)
}
fmt.Printf(" GoFiles: [\n")
for _, f := range pkg.GoFiles {
fmt.Printf(" %q\n", f)
}
fmt.Printf(" ]\n")
fmt.Printf(" Imports: [\n")
for _, i := range pkg.Imports {
fmt.Printf(" %q\n", i.ID)
}
fmt.Printf(" ]\n")
fmt.Printf("}\n")
}
}
func dumpMakeVars() {
for _, k := range pkgIdx {
pkg := pkgMap[k]
deps := flatten(pkgDeps[k])
files := allFiles(pkg, deps)
if len(files) == 0 {
// TODO: we probably should output a dep on the dir itself, but
// without files I am not sure how to find that.
continue
}
//FIXME: this outputs "k8s.io/api/...apps/v1", but I need
//"k8s.io/kubernetes/vendor/api"
fmt.Printf("%s :=", pkg.ID)
dir := ""
for _, f := range files {
d := path.Dir(f)
if d != dir {
fmt.Printf(" \\\n %s/", d)
dir = d
}
fmt.Printf(" \\\n %s", f)
}
fmt.Printf("\n")
}
}
func allFiles(pkg *packages.Package, deps []string) []string {
out := make([]string, 0, len(pkg.GoFiles)+len(deps))
out = append(out, pkg.GoFiles...)
for _, dep := range deps {
if pkgMap[dep] != nil {
out = append(out, pkgMap[dep].GoFiles...)
}
}
return out
}
func dumpSpew() {
for _, k := range pkgIdx {
pkg := pkgMap[k]
cfg := spew.ConfigState{
DisableMethods: true,
Indent: " ",
MaxDepth: 5,
}
cfg.Dump(pkg)
}
}
Thanks! While I look at that, I thought I'd point you towards Go's build cache. It already keeps track of "cache IDs" or keys for each package, which change if its source code or any transitive dependencies change. That might be a faster and possibly simpler approach to accomplish this goal, and it also mimics exactly what go build
will see. For example:
$ go list -export -f '{{.Export}}' ./pkg/volume/downwardapi
/home/mvdan/.cache/go-build/94/94ec43b481b9e23b8ef50d616abb35eed35f8aaf7d924f1c55db5820c60f336a-d
$ go list -export -f '{{.Export}}' ./pkg/volume/downwardapi
/home/mvdan/.cache/go-build/94/94ec43b481b9e23b8ef50d616abb35eed35f8aaf7d924f1c55db5820c60f336a-d
$ echo "package downwardapi" > ./pkg/volume/downwardapi/rebuild.go
$ go list -export -f '{{.Export}}' ./pkg/volume/downwardapi
/home/mvdan/.cache/go-build/64/6458d1a877ba9f7adbda4f9ba61ae5120c7b3911273d2cace5309eeb06d0bc54-d
The basic idea is: if that path/string changes, it's a different build. The command will build the package(s) in question, but that shoudl be fast if they are already in the build cache. Depends on whether you want to leave the heavy lifting to cmd/go
or to make
, I guess. I'd lean towards the former, just for consistency.
I just realised you can obtain all the information you need from just the root module, because it includes all the other k8s.io modules in the vendor folder. I think that should result in a working program that takes a few seconds as well. I'll share some code later today.
Alright, here you go: https://github.com/mvdan/kubernetes/commit/736c51ea5045473402d602271c689fc4b61facb2
See the commit message for details. Runs well for me in module mode, and even a bit faster than the old program in GOPATH mode.
If you wonder why use go list -json
directly instead of go/packages
- because the latter is build-system-agnostic, so it hides internal details that we need like the Goroot
field.
First, thanks for the time!
go list -export
looks like it does what I need, with a couple wrinkles.
1) A change to one of the "hub" packages (imported by many others) can take a long time to recalculate. Playing with it, it took upwards of 15 seconds once, but I can't reproduce it now. I have no viz into why.
2) Kubernetes gets a bunch of directory <dir> is not a package listed in vendor/modules.txt
. This is happening because of the way we handle staging/src/... I can't use the staging dirs themselves because I get main module (k8s.io/kubernetes) does not contain package <dir>
. I will have to see if I can fix that, but it's annoying.
I think this would be an ideal state if I can get there - I'd rather NOT reinvent parts of go. That said, it will require some retooling of the rest of the build, which assumes a bunch of directory-specific variables.
If that hits a dead-end, I'll circle back to your rebuild of go2make. I suspect, by reading it, that it does not quite satisfy - I need to also be able to list modules that the k8s "main" module does not depend on, which is where I get the need to list multiple packages and the main module ... does not contain package ...
error.
go list -export
is slow because it needs to rebuild the packages it gets passed, since its result is pointing you at the compiled packages on disk. The only other similar bit of info you could possibly use is go list -f '{{.ImportPath}} {{.Stale}}'
, which tells you if a package is up to date in the build cache - but that's not really what you want, either.
Ideally there would be a flag similar to -export
to obtain some sort of build cache key, but without actually having to fill the build cache. I don't think that exists today, since a package's cache key contains the resulting build of all dependencies as part of its hash. So, to compute a package's cache key, you must have built all dependencies first. You can read the first 100 or so lines in https://golang.org/src/cmd/go/internal/work/buildid.go, if you're curious :)
Maybe @bcmills @jayconrod @matloob have more ideas in terms of how you could piggyback off of Go's own build cache. Telling if a package (or its dependencies) has changed between two points in time seems generally useful, without having to fill the build cache first. Arguably that would be a fairly different caching system, but cmd/go could reuse a lot of code if it wanted to offer such a feature.
Yeah, I realized this is what is happening, which would maybe be OK if we could re-use the built results, but we have some build flags we pass, so I am not even sure we can.
The package listed in vendor/modules.txt
error is a pain. It's coming because some of the packages that need codegen are testing-only and are only used by tests in other vendor/* packages (no deps from the "main" module"). go mod vendor
specifically does not include test code, but kubernetes is too clever for our own good and defeats that.
If not for the main module ... does not contain package
error, I could access those thru our staging/ directory, but I can't.
When I tried to run your new tool against my model repo (much easier to validate than all of kubernetes) I get:
2021/01/19 14:13:12 subdir/file.go:4:2: cannot find package "." in:
/tmp/go-list-modules/vendor/example.com/m/submod/used
exit status 1
I will have to debug later. Out of time for now.
Telling if a package (or its dependencies) has changed between two points in time seems generally useful, without having to fill the build cache first.
If you've actually built the package using the go
command, you can use the Stale
field of the go list
output.
Otherwise, probably the best you can do with go list
is to use the …Files
fields and manually check them for diffs.
Kubernetes has several tools (the go2make one discussed here, as well as code generation tools that make extensive use of go/build
) that either run significantly slower (> order of magnitude) or get different results w.r.t. package import paths when running in module mode.
If go is planning to drop support for GOPATH in 1.17, have those behavior and performance changes been communicated?
go/build
package behavior and performance?Just to explain a bit of what's going on under the hood:
go/build.ImportDir
loads information from a given directory without loading dependencies. It doesn't do anything different in module mode. Using this should stay fast.go/build.Import
resolves an import path to a directory by searching GOPATH and vendor. That's usually fast. Again, it doesn't load dependencies.go/build.Import
uses go list
to resolve an import path to a directory.go list
always loads dependencies. It probably doesn't need to do that in GOPATH mode, but in module mode, if any transitively imported packages are missing, we might need to change the build list, which could change selected versions of other dependencies.go/build.Import
for n packages, switch from GO111MODULE=off
to GO111MODULE=on
is like switching from O(n) to O(n2) (using that notation very loosely) because the same shared set of dependencies is loaded many times.golang.org/x/tools/go/packages
was written. Most tools before modules were based on go/build
, but it was clear that wasn't going to be powerful enough in the long term. We recommend that golang.org/x/tools/go/packages
is used in most cases.build.Import
, packages.Load
is based on go list
, but unlike build.Import
, packages.Load
can load a batch of packages at once. So loading n packages should take O(n) time if it's done with one packages.Load
call (again, very loosely).@liggitt To answer your questions directly:
If go is planning to drop support for GOPATH in 1.17, have those behavior and performance changes been communicated?
Yes, but probably not well enough. When we published packages
a couple years ago, we worked directly with tool owners and folks in the tools working group. gopls has also replaced a lot of GOPATH-only tools.
If it would be helpful to add some advice to the go/build
and golang.org/x/tools/go/packages
documentation, I'd be happy to open an issue and do that for 1.17. Let me know if anything else would be useful in addition.
Is there a clear "if you're using this method in go/build, this is what you should do with golang.org/x/tools/go/packages instead" guide?
I'm not sure we have anything that specific. @matloob do you know if there's anything like that?
What are the compatibility guarantees around the go/build package behavior and performance?
go/build
is covered by the Go 1 compatibility guidelines. It should continue to work in module-aware mode, but we don't have any specific performance guarantees.
go list
always loads dependencies. It probably doesn't need to do that in GOPATH mode, but in module mode, if any transitively imported packages are missing, we might need to change the build list, which could change selected versions of other dependencies.
FWIW, I don't think that go list
should implicitly load dependencies in module mode. (That's arguably part of #29666.) But that's going to be a lot of work to fix, I think, including cleaning up a lot of tests that assume it today.
As per @heschik in #29452
Converting a tool from
go/build
togo/packages
incurs a huge slowdown. I tried to min-repro it here. Note that we have a full vendor/ dir, no un-vendored deps, and modules are NOT turned on yet.What version of Go are you using (
go version
)?1.12.1
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?linux, amd64
go env
OutputWhat did you do?
Build and run the two programs below and time them.
What did you expect to see?
Roughly the same timing
What did you see instead?
0.008s vs 1.272s
go-build.go:
go-packages.go:
Testing: