containerbuildsystem / cachi2

Cachi2 is a CLI tool that pre-fetches your project's dependencies to aid in making your build process network-isolated.
GNU General Public License v3.0
7 stars 25 forks source link

Add support for go workspaces #457

Closed ejegrova closed 2 months ago

ejegrova commented 7 months ago

discussion: https://github.com/containerbuildsystem/cachi2/discussions/398

checksum validation:

JIRA: STONEBLD-2043

Maintainers will complete the following section

Note: if the contribution is external (not from an organization member), the CI pipeline will not run automatically. After verifying that the CI is safe to run:

ejegrova commented 4 months ago

changed integration test repo to include checks for missing_hash property

eskultety commented 3 months ago

My question is this: what is the main problem of letting users point cachi2 to the go.work file directly the same way as we expect them to do with go.mod for the main module to be processed? That would lead to a consistent behaviour from UX perspective where we'd have to process all dependent modules. What are the drawbacks of that? @brunoapimentel do you see a use case where ^this wouldn't work and simply fail with cachi2 logic?

brunoapimentel commented 3 months ago

My question is this: what is the main problem of letting users point cachi2 to the go.work file directly the same way as we expect them to do with go.mod for the main module to be processed? That would lead to a consistent behaviour from UX perspective where we'd have to process all dependent modules. What are the drawbacks of that? @brunoapimentel do you see a use case where ^this wouldn't work and simply fail with cachi2 logic?

Erik and I had a private chat about this, and we figured out that go commands are not affected by which directory they're called in, as long as they're under the directory listed by go env GOWORK.

@eskultety I played with the idea a little bit. The only problem we have is if the directory that contains the go.work file is not a go module (i.e. does not have a go.mod file). Initially, I was trying to work around the need to have a "main module" (the one passed as the input package path in the CLI), but the whole gomod code is based around that idea, so this led to a lot of changes. I think this could work, and maybe the code could even be improved by that, but to do it right, it would be a major change to the gomod code.

So I tried our second idea, that was electing the first workspace listed in go.work as the main module, which is a much simpler approach. You can see the result in the new commit I added, and with it, I could fully process a repository just by pointing to its root folder. I still need to double check if the fact of picking a main module can have any impact on local replacements/workspaces, since Cachi2 uses the main module's version and path to fill data on those. But it is doable in case we want to allow the user to just point to go.work.

brunoapimentel commented 2 months ago

My question is this: what is the main problem of letting users point cachi2 to the go.work file directly the same way as we expect them to do with go.mod for the main module to be processed?

The conclusion we reached about this suggestion is that, although we want to provide users more flexibility by allowing them to point to a directory that contains a go.work file but is not a go module, such change would require a major refactoring in the code that we don't think is worth pursuing as part of this PR.

The main problem that we reached is how to work around the need of having what Cachi2 defines as a "main module", since for each of the input packages fed to the CLI, a call to _resolve_gomod is made where the package is defined as the "main module" for purposes of the resolve algorithm. If the input package points to a folder that is not a valid go module, than we can't use it as the "main module".

Here's a small (and likely incomplete) list of impacted functions that would need to be changed:

I personally believe that all of those can be worked around, and the code will likely end up cleaner after the refactoring. So we can tackle this as a follow up task. @eskultety Please add any details I might've missed.

eskultety commented 2 months ago

My question is this: what is the main problem of letting users point cachi2 to the go.work file directly the same way as we expect them to do with go.mod for the main module to be processed?

The conclusion we reached about this suggestion is that, although we want to provide users more flexibility by allowing them to point to a directory that contains a go.work file but is not a go module, such change would require a major refactoring in the code that we don't think is worth pursuing as part of this PR.

I'd just add that solutions should be more often than not proposed as whole in their entirety. That said, I do have to agree that in this particular case we're talking about significant conceptual changes to how we perceive a Go project structure to represent it and process it which in comparison to the changes being proposed in this PR would unarguably only provide a marginal improvement to the consumer facing UI. If this weren't the case or the changes being proposed would also end up making substantial modifications to the code base then it would be up for a lengthy discussion and likely lead to a request for a complete solution.

The main problem that we reached is how to work around the need of having what Cachi2 defines as a "main module", since for each of the input packages fed to the CLI, a call to _resolve_gomod is made where the package is defined as the "main module" for purposes of the resolve algorithm. If the input package points to a folder that is not a valid go module, than we can't use it as the "main module".

Yeah, the concept for the need of a "main" module seems off in context of workspaces, so we definitely need a conceptual change here.

Here's a small (and likely incomplete) list of impacted functions that would need to be changed:

* relevant files (such as go.mod and go.sum) are checked in the main module directory for symlinks that point to outside of the cloned repository

* all go commands are executed in the main module directory

With most Go commands being workspace context aware ^this particular change should be insignificant compared to others

* vendored deps are checked in the `vendor` folder in the main module directory

IIUC ^this one should be taken care of in #553. It should also be straight forward since go work vendor is factually an equivalent of go mod vendor

* the main module name and its path are used to identify its version (as a version tag can be applied to a repo subpath, and also the module can be nested under a `vN` folder, where N corresponds to the major version of that module)

I think ^this one may the biggest problem to figure out in context of the suggested future refactor to introduce the option of pointing cachi2 to a directory containing a go.work file rather than a "main" module directory containing the go.mod file.

* the main module path is used to identify the relative path of each workspace and local replacements, which will end up composing the purls in the SBOM

* the main module path is used to verify if a local replacement resolves to a folder inside the repo

I personally believe that all of those can be worked around, and the code will likely end up cleaner after the refactoring. So we can tackle this as a follow up task. @eskultety Please add any details I might've missed.

In conclusion I agree with the reasoning @brunoapimentel provided and therefore we should continue with the original approach, thanks for the deep investigation @brunoapimentel! I will follow up by creating an issue tracking the future refactor to host the go.work feature improvement once this is ready to be merged.