digitalocean / gta

gta: do transitive analysis to find packages whose dependencies have changed
Apache License 2.0
201 stars 21 forks source link

Error listing changed packages in a nested module #24

Open jeromefroe opened 3 years ago

jeromefroe commented 3 years ago

Hi! 👋 I was curious if anyone has used gta in a repo that makes use of nested Go modules? I'm currently running into the following error when running gta in a repo that has a nested Go module when files in the nested module are changed (everything works fine if any files in the top-level module are changed):

17:46:02 main.go:82: can't list dirty packages: github.com/jeromefroe/gtatest/othersrc/pkg not found

I've created a minimal example of my repo layout at github.com/jeromefroe/gtatest. The repo looks like the following:

$ git clone git@github.com:jeromefroe/gtatest.git
Cloning into 'gtatest'...
remote: Enumerating objects: 20, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 20 (delta 3), reused 19 (delta 2), pack-reused 0
Receiving objects: 100% (20/20), done.
Resolving deltas: 100% (3/3), done.

$ cd gtatest/

$ tree
.
├── go.mod
├── othersrc
│   ├── go.mod
│   └── pkg
│       └── pkg.go
└── src
    └── pkg
        └── pkg.go

4 directories, 4 files

There is a top-level Go module at the root of the repo, and a nested Go module in the othersrc directory.

If I make a change to a file in the top-level module, then gta works just as expected:

$ git checkout jerome/change-parent-package
Branch 'jerome/change-parent-package' set up to track remote branch 'jerome/change-parent-package' from 'origin'.
Switched to a new branch 'jerome/change-parent-package'

$ git diff master --name-only
src/pkg/pkg.go

$ gta -include github.com/jeromefroe/gtates
github.com/jeromefroe/gtatest/src/pkg

However, if I change a file in the nested module, then I run into the aforementioned error:

$ git checkout jerome/change-child-package
Switched to branch 'jerome/change-child-package'
Your branch is up to date with 'origin/jerome/change-child-package'.

$ git diff master --name-only
othersrc/pkg/pkg.go

$ gta -include github.com/jeromefroe/gtates
17:45:56 main.go:82: can't list dirty packages: github.com/jeromefroe/gtatest/othersrc/pkg not found

I took a pass on the code base to try and better understand the problem I was running into, and I believe it's caused by the fact that the call to packages.Load to list all packages in the module doesn't return any packages in the nested module. As far as I understand packages.Load has the same functionality as go list and the latter doesn't listed nested modules:

$ go list ./...
github.com/jeromefroe/gtatest/src/pkg

Any packages in the nested module that are changed will still be returned by the git differ though, and so when they are passed to the PackageFromImport function an error is returned because the packages were never found in the set of loaded packages and so were never added to the dependency graph.

If it's of any benefit, I think it's fine if I have to run gta separately in each nested module in my repo (there's only a small number of them so it's easy to just enumerate them in our build tooling). The only tricky part I think would be detecting when changes in the top-level module are depended on by packages in a nested module.

bhcleek commented 3 years ago

You're right about the nature of the problem: go list, which is depended on by packages.Load will not list packages in the nested module when it's run in the top-level module.

If you want this to work, then you'd need to run gta for each of the nested modules, but I'm not sure if gta in its current form could handle the top level module when a package in a nested module is modified. With a small code change, it should be possible to cause gta to ignore files there were changed in a nested module relative to the ambient module when gta is run (assuming that doesn't work currently).

Even if that code change is made, I'm afraid you may run into problems when a nested module is a dependency of the ambient module when gta is run; gta is built with the expectation that dependencies are vendored. I'd like to change that, but won't have time to do that for a while.

I realize that you may not have an option to break your modules out into their own repositories, but just in case you do and you haven't seen this advice before: https://github.com/golang/go/wiki/Modules#should-i-have-multiple-modules-in-a-single-repository

jeromefroe commented 3 years ago

Hi @bhcleek! Yea, the current setup of the repo with its use of a nested module is certainly not ideal, but I don't think I'll be able to change it anytime soon. Fortunately, the nested module is much smaller than the top-level module so running the full suite of tests for it should be okay for the foreseeable future. The biggest time savings will be from not having to run the full test suite for the top-level module on pull requests.

With a small code change, it should be possible to cause gta to ignore files there were changed in a nested module relative to the ambient module when gta is run (assuming that doesn't work currently).

That would be great for our workflow! And I'd be happy to create a pull request if it would help. I was thinking about adding an additional exclude flag to the binary which would be the functional opposite of the include flag and would remove from consideration any packages which matching the given set of prefixes. What do you think of that approach?

bhcleek commented 3 years ago

@jeromefroe apologies for not getting back to you last week; the notification about your comment got buried in my e-mail.

I don't think we need an exclude for this; we literally need to detect when the changed files are in a different module. There's already code that tries to do something like this, but we need to extend it to detect multiple modules in a way that isn't not currently.

I'm not precisely sure where we need to make this change, but I think we probably need to update dependencyGraph to work with multiple directories: https://github.com/digitalocean/gta/blob/f497543409c4063829d25a1a8ee3ad9ecda93430/packager.go#L215-L304

It might be worth it for us to wait for the go.work proposal to be implemented, because trying to use dependencyGraph as-is is going to require changing the working directory when packages.Load is run to be within each module. 🤔