actions / setup-go

Set up your GitHub Actions workflow with a specific version of Go
MIT License
1.41k stars 528 forks source link

Build output caching does not key off the current module's source? #395

Open nfi-hashicorp opened 1 year ago

nfi-hashicorp commented 1 year ago

IIUC, a job with cache enabled goes roughly like this (with defaults):

  1. (setup-go) restore: download and extract cache blob, keyed by (among other things) go.sum
  2. (user step) build something
  3. (setup-go) save: if cache has key, skip. Otherwise, make a cache blob containing the "module" cache (i.e. source code) the build cache at ~/.cache/go-build/ and upload it with that key.

The key is:

const primaryKey = `setup-go-${platform}-${linuxVersion}go-${versionSpec}-${fileHash}`;

--- https://github.com/actions/setup-go/blob/main/src/cache-restore.ts#L34

But that doesn't include any inputs from the current module's source (other than go.sum)? So really, we're just caching the build outputs of the dependencies?

(Or to be more precise, we bundle all of ~/.cache/go-build, but only when go.sum changes. If we did a build of the current module, then the build cache will contain its build objects. But it will never get updated if only the source of the current module changed.)

It's unclear if this is intended or an oversight. The README.md definitely isn't explicit that the current module's source is not considered WRT to the cache key and whether a save needs to happen.

Also, I haven't really thought about it, but I'm not sure how cache test results play into this.

dsame commented 1 year ago

Hello, the cached directories by default are the those that are outputs of go env GOMODCACHE and go env GOCACHE

The cache invalidates if any of the files in the input cache-dependency-path are updated, by default it is only go.sum

To invalidate the cache on the changes to files other than go.sum the cache-dependency-path should be set explicitly (glob patterns are allowed).

Did the answer help?

nfi-hashicorp commented 1 year ago

Thanks, yes, that confirms what I've gathered from the docs/source.

One other problem I just realized is that the cache key also misses dynamic things like build tags, linker flags, -trimpath, maybe some env vars, etc. Also it seems like platform in the cache key is just the OS of the runner, so if you're doing cross-platform builds, that could probably cause a false cache hit. For any reasonably advanced build, a cache-dependency-path glob like src/** is probably not adequate.

Given that behavior, I'd say there are three actual problems:

Docs should say the caching is for dependencies, not the current module (by default)

And it doesn't really give guidance for caching the current module (like you'd just given me). Even if we added that, I'd say there are a lot of caveats; I as a user probably wouldn't feel confident configuring it, and I bet I'm not alone.

The saved cache may include objects for the current module anyway

This bloats the size of the cache bundle, meaning longer download and extraction times. Not generally a huge problem in the grand scheme of things, but for example, our cache download and extract currently adds about 10 seconds, and due to problems mentioned above, largely doesn't help with build times later on.

I don't have a good suggestion for this. AFAICT there's no good way to even ask Go which cache objects belong to which packages.

Current cache key is probably inadequate

Even just for dependencies, I think the current key is probably inadequate for all but the simplest builds. I'm struggling to find a reference that lists all the various inputs to a build that could invalidate the cache, but I bet they're a lot.

I'm not sure how to even begin to address this. You'd basically have to implement a large chunk of the internal go build caching logic, not fun.

Just how broken is this?

I would say that if you're doing anything much more complicated than vanilla go build ., reusing the build cache is probably not helping you any, and given a big enough build, is probably hurting to some degree. IMO this should be called out in the docs at least.

And I'm not sure setup-go could help with this much, needs better tooling from Go itself, but it'd be nice if we could profile cache hits at a per-object level to see if it's worth doing. I'm pretty sure with consul we're doing cross-platform builds and have the cache turned on, which is probably 100% miss for builds ATM.

PS I don't want it to sound like I'm ragging on the team for their efforts in implementing this; it's hard! Caching builds is far more complicated than one would think.

dsame commented 1 year ago

Hi @nfi-hashicorp, you're more than welcome to describe the missing features. I understand your requirements and the short answer is: the actions/cache is for you.

Reasoning The primary task for setup-go is to work out of the box as much as possible, and to automate what can be automated. Any manual fine-tuning of caching is generally outside the scope of these actions, and is better implemented with full-power actions/cache action.

For example, "current key is probably inadequate for all but the simplest builds" - this is meant to be a compromise for an average use case: caching rarely changing third-party dependencies.

Trying to make it more specific, or to include more than 3rd party dependencies, leads to the cache being invalidated too often (and becoming useless), or growing rapidly and hitting the cache size limit.

"The saved cache may still contain items for the current module" - yes, but the structure of the project is not always the same, and it is nearly impossible to determine which folders can be cached without resetting the cache on every build.

"Docs should say the caching is for dependencies" - this may or may not be true depending on the value of cache-dependency-path, but the cached folders are listed in the logs.

Summary. For a productive discussion, I would advise you to take a look at actions/cache and try to solve the specific real-life problem you have with it. Next, knowing the problem and the solution will be a good starting point to discuss whether it can and should be implemented as a built-in, automated feature of `setup-go'.

nfi-hashicorp commented 1 year ago

I understand your requirements and the short answer is: the actions/cache is for you.

Yeah, possibly. But that means coming up with a cache key that incorporates every input. Possible, maybe, but certainly untenable in the general case.

The primary task for setup-go is to work out of the box as much as possible, and to automate what can be automated.

Personally I think caching should be off by default. It is wasteful in all but the simplest build scenario. It's probably 100% wasted in a cross-platform build scenario, which I would be willing to bet is not uncommon. Even in the simplest case, I don't know how much of a speed up caching would bring. Do you have data about this?

I think one of the bigger UX problems is it's hard to measure the cache hit rate of the actual Go build. Sure, we can see if the setup-go cache key hits, but not if Go actually uses any of its contents.

dsame commented 1 year ago

Hello @nfi-hashicorp ,

With actions/cache it is possible to create any keys using the context and expressions and cache any paths moreover with restore-keys input it is possible to populate invalidated cache with data from the previous build.

Caching was off by default for a long time until few months ago the investigation had been made and according to its results it was decided to turn it on by default.

I'd be extremely grateful if you suggested the build configurations you think not cached in the optimal way an we would improve the action.

nfi-hashicorp commented 1 year ago

In this build, I have two parallel flows, one that builds an AMD64 build twice, and one that builds an ARM64 build twice. I deleted all caches for this repo before starting.

You would expect to see the first build in each flow be slow, since it's building from no cache, then the second one should be fast. But that's not what you see

image

Only the AMD64 one gets a speed boost, because the cache key doesn't include GOARCH.

Indeed, it's slower than a clean build, because it spent almost 30 seconds downloading a 638 MB cache for the wrong arch.

(Also, I notice that since the two flows collide on the same cache key, the second flow ends up failing to save the cache tarball, wasting 30s making a tarball.)

As another example, this build does something similar, but instead of switching arches, it switches on -trimpath.

Similar problem to the first example, the second -trimpath build is slow, even though you would have expected the results from the first build to be cached. They aren't because the cache key doesn't know about -trimpath. And again, it wasted 30s downloading and extracting an irrelevant cache.

image

The point

My point isn't that these two inputs need to be added to the cache key, it's that the cache key has a lot more inputs than these and it would be a lot of work to enumerate (let alone capture!) them. And a cache miss can be quite expensive, just to download the tarball. So yeah, I'd be willing to bet that in most non-trivial go builds, it is actually a detriment, but I'd need to see the data.

(Sidenote: Personally I think the Github action cache approach of uploading tarballs is too naive for most non-trivial builds.)

nfi-hashicorp commented 1 year ago

I should note that the above really only applies to the build cache (GOCACHE). The logic for the mod cache (GOMODCACHE) is probably ok. However, I can see no good way to tell setup-go to ignore the build cache.

dsame commented 1 year ago

Thank you @nfi-hashicorp , we are starting to investigate the case

nfi-hashicorp commented 1 year ago

Thanks! I appreciate all the hard work, LMK if I can help

dsame commented 1 year ago

Hello @nfi-hashicorp , it seems i am getting the idea and tying to reproduce the problem, but it would be helpful to see the builds you've mentioned above. I can not access them now - aren't they are in the private repository?

dsame commented 1 year ago

@nfi-hashicorp , so finally i understood the problem as interfering the builds which targets assume different dependencies and builds results.

Ignoring the GOCACHE does not seem to be moving in the right direction because it just lowers the performance.

Setting different GOCACHE(GOMODCACHE) for the specific build but it does not resolve the problem because of the same key

Do you think having an input cache-key that is added to the auto generated key can solve your problem?

Meanwhile, from that i saw in the comment describing you workflow the workaround for the problem could be to save the target variables in the file and include it into the list of hashed files:

   build:
     env:
         GOOS: ...
         GOARCH: ...

     steps:
       - run: echo "$GOOS $GOARCH"> /tmp/env

       - uses: actions/setup-go@v4
             with:
                go-version: '1.17'
                cache-dependency-path: go.sum /tmp/env

does it help?

nfi-hashicorp commented 1 year ago

aren't they are in the private repository?

Yes, they were :P Sorry about that. It's public now.

Meanwhile, from that i saw in the comment describing you workflow the workaround for the problem could be to save the target variables in the file and include it into the list of hashed files

That almost works, except that it doesn't take into account any of the actual source files in my repo (besides go.sum). It's a little more useful: at least I would probably get build cache hits for my dependencies.

Do you think having an input cache-key that is added to the auto generated key can solve your problem?

It can help, but in the general case, capturing all possible inputs that could affect the cache is practically a very difficult problem. Really only go itself knows that.

Ignoring the GOCACHE does not seem to be moving in the right direction because it just lowers the performance.

Honestly I would be willing to bet that in many use cases, not saving the GOCACHE would result in performance improvement vs the current implementation. It is often quite big and full of many tiny files.

Setting different GOCACHE(GOMODCACHE) for the specific build but it does not resolve the problem because of the same key

I was actually thinking a small win would be to have separate control for those. Your arg cache: true could mean only GOMODCACHE. They should go to separate cache keys (prefixed with GOCACHE or GOMODCACHE for example).

nfi-hashicorp commented 1 year ago

Another problem that's related: apparently Github action cache can't be overwritten, it must be deleted first. I'm not sure if that's considered a bug or design flaw, or "works as intended". But it certainly makes it hard to deal with the GOCACHE.

Imagine we fix the cache key so that it incorporates some but not all of the inputs that could invalidate the GOCACHE for the given build. As an example, say we don't capture GOARCH. Then a typical PR dev flow could get into a broken state:

  1. Dev pushes PR. Workflow does GOARCH=386 go build .. GOCACHE is saved to a fresh GH cache; key doesn't include GOARCH.
  2. Dev thinks, whoops, I didn't mean 386, changes it to GOARCH=amd64. Workflow restores cache for 386 (because they have the same key), does GOARCH=amd64 go build .. GOCACHE would be 100% miss, which Dev would expect. We attempt to overwrite cache key with new amd64 cache, but can't, because it exists.
  3. and onward, all builds will be 100% miss because the original 386 build effectively poisoned the cache.

This is worse than no cache, because we're incurring the cost of a 100% irrelevant download+extract and compress+failed upload cycle.

If we had the ability to overwrite, we could have a GOCACHE from whatever the previous build was, which is almost always more relevant than whatever the first build was.

dsame commented 1 year ago

Hello @nfi-hashicorp

That almost works, except that it doesn't take into account any of the actual source files in my repo (besides go.sum)

if i understood the problem right, it can be solved with adding the source files to cache-dependency-path

       - uses: actions/setup-go@v4
             with:
                go-version: '1.17'
                cache-dependency-path: go.sum mod1/**/*.go

but in the general case, capturing all possible inputs

I suggest to do not take the inputs in the account but instead let the user to add any uniq key to the auto generated key. This allows to modify the cache key per job even the auto generated keys are the same (due to the same underlying files)

        - uses: actions/setup-go@v4
             with:
                go-version: '1.17'
                cache-id: 'arm64'

          - uses: actions/setup-go@v4
             with:
                go-version: '1.17'
                cache-id: 'amd64'              
dsame commented 1 year ago

Another problem that's related: https://github.com/actions/toolkit/issues/505.

You also might be interested in the pretty straightforward solution of this problem:

https://github.com/actions/toolkit/issues/505#issuecomment-1650290249

nfi-hashicorp commented 1 year ago
                cache-dependency-path: go.sum mod1/**/*.go

That could work provided you have no files that are conditionally included. E.g. _test.go, _<arch>.go, etc. However, this would have to effect of having false cache misses, which is preferable to false cache hits IMO.

I suggest to do not take the inputs in the account but instead let the user to add any uniq key to the auto generated key. This allows to modify the cache key per job even the auto generated keys are the same (due to the same underlying files)

Yes, that could work, but requires the user knowing to add keys for those things. And false cache hits are really only recognizable by their timing effects.

I'm wondering if instead of a setup-go action, a build-go might be what advanced users would need. It would not allow for arbitrary go commands to be executed, but basically provide all the args that go build would. Then it could inspect them for caching. A rather large undertaking though.

dsame commented 1 year ago

That could work provided you have no files that are conditionally included

glob masking https://github.com/isaacs/node-glob#readme has very expressive syntax that allows to describe virtually any real-life combinations.

but requires the user knowing to add keys for those things

yes, i assume some trade off between attempts to make fully automated action and requiring some intellect from the user. I consider full-automation impossible because of too many variants and even having not knowledge about intended build targets. But i expect the users who builds multi-target applications to be savvy guys.

So, finally i'd like to get the confirmation "cache-id input can solve the problem of having separate caches for different build targets despite they have exactly the same source codebase"

Alternatively i will suggest the workaround with storing the build into the file and include the file into cache-dependency-path

     steps:
       - run: echo "$GOOS $GOARCH"> /tmp/env

       - uses: actions/setup-go@v4
             with:
                go-version: '1.17'
                cache-dependency-path: go.sum /tmp/env
nfi-hashicorp commented 1 year ago

glob masking isaacs/node-glob#readme has very expressive syntax that allows to describe virtually any real-life combinations.

I haven't looked but I'd be willing to bet it cannot take into account Go build constraints. Those are full boolean expressions.

So, finally i'd like to get the confirmation "cache-id input can solve the problem of having separate caches for different build targets despite they have exactly the same source codebase"

Yes, it can in theory, but in practice specifying all the inputs precisely would be extremely difficult.

We still have the problem that the default behaviour is, IMO, incorrect. After the first run, it will fail to overwrite, even if the contents of the relevant source have changed. It will forever after lug around those initial semi-relevant build objects, until garbage collected or go.sum is incidentally modified. The easiest solution IMO is to disable saving of GOCACHE by default. GOMODCACHE is fine; it should be fully specified by go.sum.

dsame commented 1 year ago

it will fail to overwrite, even if the contents of the relevant source have changed.

The only reason this happens is if the relevant sources are not included in the `cache-dependency-path' input. They need to be there to trigger the cache rebuild.

problem that the default behaviour is, IMO, incorrect.

So, you suggestion is to delete go env GOCACHE directories from the list of cached paths or make it optional (i.e. with input cache-go-build: false) - correct?

nfi-hashicorp commented 1 year ago

So, you suggestion is to delete go env GOCACHE directories from the list of cached paths or make it optional (i.e. with input cache-go-build: false) - correct?

Yes, by default, I think we should not cache GOCACHE. It should be opt-in.

Also, and this is more a matter of opinion, and really needs to be informed by the user, but I don't think caching GOMODCACHE really provides much of a speed boost in most cases. In Consul at least, we got about 15 seconds, which is not worth the complexity, especially since ours is a multi-module repo.

dsame commented 1 year ago

Hello @nfi-hashicorp, sorry for long delay, but it took some time to arrange the scheduled feature, but in turn there's a proposal that could solve most if not all the problems the current implementation of the caching has.

There's an early stage of the development, but i want you to be aware of upcoming changes https://github.com/actions/setup-go/pull/426.