I took a stab at switching over to go/packages so that we can support Go modules. See the link for more detailed information about go/packages; below is a summary of the big differences in how go/packages works and their impact on the implementation changes to gta in this PR:
- go/packages uses `go list` under the hood as the default driver (Google has their own driver that they presumably use to handle monorepos better).
  - It takes a config and a set of patterns, runs `go list` with them, parses the results, and returns a slice of package structs representing the results (it also does additional analysis/parsing to hydrate the structs if configured).
  - `go list` is slow; as a result, go/packages is MUCH slower than go/build for now: https://github.com/golang/go/issues/31087
- Each call to `Load` results in `go list` being run, which means we need to load all packages up front.
  - This differs significantly from the existing implementation of gta, which relies heavily on calling go/build's `Import` on demand.
  - This is also why the implementation in this PR builds an entire dependency graph on initialization.
- When loading tests is enabled:
  - Each package path can have 2 packages:
    - the package itself
    - the package itself when built for tests
  - and 2 related packages with their own package paths:
    - the `*_test` package
    - the `.test` binary package
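As a rough illustration of the four packages listed above, the sketch below classifies package IDs in the form the `go list` driver currently produces (for example `example.com/foo [example.com/foo.test]` for `foo` built for its tests). That format is an assumption. go/packages documents `Package.ID` as opaque, so real code should not parse it. The names `classify` and `variantKind` are hypothetical. The sketch also ignores the rarer case of a dependency that is recompiled for another package's tests.

```go
package main

import (
	"fmt"
	"strings"
)

// variantKind says which of the four test-related packages an ID refers to.
type variantKind int

const (
	plainPkg    variantKind = iota // the package itself
	testVariant                    // the package itself, built for tests
	xtestPkg                       // the external *_test package
	testBinary                     // the synthesized .test binary package
)

func (k variantKind) String() string {
	return [...]string{"package", "package built for tests", "*_test package", ".test binary"}[k]
}

// classify maps an ID to the package path of the directory it belongs to,
// plus its kind. ASSUMPTION: IDs look like "p", "p [p.test]",
// "p_test [p.test]" or "p.test", as the go list driver formats them today.
func classify(id string) (pkgPath string, kind variantKind) {
	path, _, bracketed := strings.Cut(id, " ")
	switch {
	case bracketed && strings.HasSuffix(path, "_test"):
		return strings.TrimSuffix(path, "_test"), xtestPkg
	case bracketed:
		return path, testVariant
	case strings.HasSuffix(path, ".test"):
		return strings.TrimSuffix(path, ".test"), testBinary
	default:
		return path, plainPkg
	}
}

func main() {
	for _, id := range []string{
		"example.com/foo",
		"example.com/foo [example.com/foo.test]",
		"example.com/foo_test [example.com/foo.test]",
		"example.com/foo.test",
	} {
		path, kind := classify(id)
		fmt.Printf("%-44s -> %s (%s)\n", id, path, kind)
	}
}
```

All four IDs collapse to the same package path `example.com/foo`. This is why a path -> packages lookup has to return a set rather than a single package.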
## High Level Overview of Implementation

1. Using the differ, determine the set of changed files.
2. Load all the packages matching the provided `-include` flag.
3. Construct a dependency graph using the loaded packages.
   - The dependency graph holds package nodes that each have a set of dependencies and dependents.
   - A few auxiliary maps are built for lookups, keeping track of:
     - package ID -> package
     - package path -> packages
     - file -> package
     - directory -> package (best effort)
4. Use the dependency graph to determine the set of transitive dependents.
## Todo and Known Limitations

- [ ] Fix gta tests
  - `GTA` no longer has much of the logic from before; instead, it's the `DependencyGraph` that should be the target of the existing tests.
- [ ] Fix gtaintegration tests
  - [ ] All the packages that involve removals fail
- [ ] Figure out how to handle package deletions
- [ ] There is a lot of slice iteration and duplicated work that does not need to happen. I haven't attempted to improve this for the sake of clarity in this PR, but it's something to follow up on.
- [ ] Does `go list` hit the internet even when all packages are vendored? 🤔
- [ ] One of the test cases took forever, and I need to figure out why.
## Verification

Because I haven't yet reworked all the existing tests to pass, I've been relying on comparing results between the existing implementation of gta and the new implementation, using Cthulhu merge commits as test cases. This is definitely a crutch, but it was a good way to validate progress.

https://github.internal.digitalocean.com/gist/nanzhong/a32eba444a7bef260a5193efa26cc4a3 contains the test script as well as some initial results for the latest ~70 merge commits.

### Differing Cases

Out of the ~70 test cases, 2 have differing results.

#### Handling `*_test` Packages

Cthulhu merge commit 97a167826ac22d64476b130cbf215a3e418a382d.

I think the first case hits an edge case that the existing gta implementation doesn't quite handle correctly. The following is a simplified example:

```
-- foo/foo.go
package foo

const Foo = "foo"
-- foo/foo_test.go
package foo_test
-- bar/bar.go
package bar

import "foo"

const import_foo = foo.Foo
```

If `foo/foo_test.go` changed, what should the dirty packages be?

The existing gta implementation says both `foo` and `bar` are dirty. The new implementation says only `foo` is dirty. A file change that maps to the `foo_test` package should only ever affect the `foo` package. @bhcleek pointed out to me that this is an edge case that gta currently does not handle well.

#### Dirty vendor packages

Cthulhu merge commit 8bbfdc2ca9b728da4c90f166cfbc69efe10642d9.

I haven't had a chance to dig into this yet, but the existing implementation reported dirty vendor packages. I'm not sure I fully understand the reason for this.