actions / cache

Cache dependencies and build outputs in GitHub Actions

Caching directories outside of workspace (self-hosted vs GitHub-hosted) #1127

Open reiniertimmer opened 1 year ago

reiniertimmer commented 1 year ago

We are using actions/cache in between jobs that run on a variety of runners (GitHub-hosted and self-hosted) and we noticed the following:

When creating a cache on ubuntu-latest of a directory outside the workspace (for example, Maven for Java caches the ~/.m2/repository directory), the cache archive stores paths relative to the current workspace directory (using ../../..).

When this cache is restored on a self-hosted runner, the files are placed at ../../.. relative to the working directory of the self-hosted runner. The filesystem layout there may differ from that of the GitHub-hosted runners, so the cache ends up in the wrong location and Maven cannot find it.

##[debug]Archive Path: /runner/_work/_temp/3670ed6f-f2b5-4387-b1c0-7dee396807af/cache.tzst
##[debug]Use Azure SDK: true
##[debug]Download concurrency: 8
##[debug]Request timeout (ms): 30000
##[debug]Cache segment download timeout mins env var: 5
##[debug]Segment download timeout (ms): 300000
##[debug]Downloading segment at offset 0 with length 262618027...
Received 0 of 262618027 (0.0%), 0.0 MBs/sec
Received 46137344 of 262618027 (17.6%), 22.0 MBs/sec
Received 92274688 of 262618027 (35.1%), 29.3 MBs/sec
Received 146800640 of 262618027 (55.9%), 35.0 MBs/sec
Received 197132288 of 262618027 (75.1%), 37.6 MBs/sec
Received 254229419 of 262618027 (96.8%), 40.4 MBs/sec
Received 262618027 of 262618027 (100.0%), 39.8 MBs/sec
/usr/bin/tar -tf /runner/_work/_temp/3670ed6f-f2b5-4387-b1c0-7dee396807af/cache.tzst -P --use-compress-program unzstd
../../../.m2/repository/
../../../.m2/repository/log4j/
../../../.m2/repository/log4j/log4j/
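
A minimal sketch of the mechanism described above, assuming the action records each cached path relative to the workspace (which is what the tar -tf listing suggests). The repository name my-repo, the hosted workspace layout, and the helper names archive_entry and restore_target are illustrative assumptions; only the /runner/_work prefix comes from the debug log.

```python
import posixpath


def archive_entry(cache_path: str, workspace: str) -> str:
    """Express an absolute cache path relative to the workspace directory."""
    return posixpath.relpath(cache_path, start=workspace)


def restore_target(entry: str, workspace: str) -> str:
    """Resolve an archive entry against the directory tar is given with -C."""
    return posixpath.normpath(posixpath.join(workspace, entry))


# Assumed GitHub-hosted layout: home is /home/runner, workspace is below it.
hosted_ws = "/home/runner/work/my-repo/my-repo"
entry = archive_entry("/home/runner/.m2/repository", hosted_ws)
print(entry)

# Assumed self-hosted layout, based on the /runner/_work prefix in the log.
self_hosted_ws = "/runner/_work/my-repo/my-repo"
print(restore_target(entry, self_hosted_ws))
```

Under these assumptions the same entry resolves to a directory next to /runner/_work rather than to the home directory of the user running the job, which would explain why Maven does not see the restored files.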

TingluoHuang commented 1 year ago

It would be nice if actions/cache allowed you to optionally provide an absolute path when restoring.

Or, as a more general question: how do you use actions/cache across different runners that might have some differences in file system structure?

y-luis-rojo commented 1 year ago

Hi, cache is not working for me on a self-hosted runner, the debug output shows a warning when using actions/setup-java@v3:

##[debug]Archive Path: /runner/_work/_temp/1dd947b4-9cce-49e9-8d2b-18000c5d16b8/cache.tzst
##[debug]Use Azure SDK: true
##[debug]Download concurrency: 8
##[debug]Request timeout (ms): 30000
##[debug]Cache segment download timeout mins env var: undefined
##[debug]Segment download timeout (ms): 3600000
##[warning]Failed to restore: request to *** failed, reason: tunneling socket could not be established, statusCode=503
##[debug]Failed to delete archive: Error: ENOENT: no such file or directory, unlink '/runner/_work/_temp/1dd947b4-9cce-49e9-8d2b-18000c5d16b8/cache.tzst'
maven cache is not found

The self-hosted runners run on a Kubernetes cluster (using https://github.com/actions/actions-runner-controller; the Docker image is https://hub.docker.com/r/summerwind/actions-runner).

Could it be related to this issue?

I'm using GitHub Enterprise Server 3.6.11.

jduan-highnote commented 1 year ago

I spent a lot of time debugging this problem. This behavior is quite confusing and hidden. You'd think ~ means home directory and that's where the cache should be restored to. But that's not the case if the cache is saved and restored on different kinds of runners (eg: github-hosted vs self-hosted).

brendanlafond commented 1 year ago

Running into a variation of this issue. I'm using the actions/runner:latest image with one small modification: creating /opt/hostedtoolcache and changing its ownership so that caching works (https://github.com/actions/runner/issues/2522). I found that the action jumps up one directory too many, which causes the cache to be missed; it then tries to execute thousands of mkdir commands that all fail with permission errors. I'm using ~/.m2/repository as the path. This should be searching my $HOME, which is /home/runner (my user is runner), for a .m2 directory. Instead it's trying to create directories in /home/.m2, which doesn't exist. The directory /home/runner/.m2 does exist. It's the relative pathing that is the problem.

Calling the action:

- name: CACHE
  uses: actions/cache@v3
  id: cache
  with:
    path: ~/.m2/repository
    key: ${{ runner.os }}-${{ hashFiles('**/lockfiles') }}

Small snippet from error:

Cache Size: ~1146 MB (1201286438 B)
/usr/bin/tar -xf /home/runner/_work/_temp/4fcf19ed-6468-41ce-85b7-8298218d66af/cache.tgz -P -C /home/runner/_work/testing-runners/testing-runners -z
/usr/bin/tar: ../../../../.m2: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.m2/repository: Cannot mkdir: No such file or directory

Printing $HOME: /home/runner

If I do cd ../../../../ then I end up in /home not /home/runner.

Please note that this is using GitHub's recommended self-hosted runner image. The only customization was to make actions/cache work.

If I use a direct path /home/runner/.m2/repository then the problem is resolved. I now have to convince everyone using these runners to make this change because passing ~/.m2 isn't actually the right directory when it executes.
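
To check the "one directory too many" observation, here is a small sketch that compares the entry tar printed with the entry that would be correct for this runner. The workspace, home directory, and logged entry are taken from the output above; the helper name up_levels is hypothetical.

```python
import posixpath

# Values taken from the tar command and error output above.
workspace = "/home/runner/_work/testing-runners/testing-runners"
home = "/home/runner"
logged_entry = "../../../../.m2"


def up_levels(entry: str) -> int:
    """Count the leading '..' components of a relative archive entry."""
    count = 0
    for part in entry.split("/"):
        if part != "..":
            break
        count += 1
    return count


# Entry that would point at $HOME/.m2 when extracted with -C <workspace>.
expected_entry = posixpath.relpath(posixpath.join(home, ".m2"), start=workspace)

# Directory the logged entry actually resolves to on this runner.
actual_target = posixpath.normpath(posixpath.join(workspace, logged_entry))

print(expected_entry, up_levels(expected_entry))
print(logged_entry, up_levels(logged_entry))
print(actual_target)
```

The archive entry climbs four levels while this runner's workspace sits only three levels below $HOME, so extraction targets /home/.m2 instead of /home/runner/.m2. The sketch does not explain why the archive was created with four levels; that depends on the layout of the runner that saved it.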

my3sons commented 11 months ago

Is anyone looking into this issue? We are facing the same thing as well. We have a workflow that has jobs run by both self-hosted and github managed runners. We are generating the cache on our self-hosted runners, the github managed runners are able to find the cache based on the cache key, but the tar command then fails. Below is a summary of what we see:

Cache generation and restore on our self-hosted runner:

##[group]Run actions/cache@v3
 with:
   path: ~/.m2/repository
   key: Linux-m2-298e1de8f8604fbc9ad66f068f909d6c61f48d4f98b5883f7ae39c64b2800708
   enableCrossOsArchive: false
   fail-on-cache-miss: false
   lookup-only: false
 ##[endgroup]
 Received 92274688 of 102578862 (90.0%), 88.0 MBs/sec
 Cache Size: ~98 MB (102578862 B)
 [command]/usr/bin/tar -xf /runner/_work/_temp/5b4925d7-f368-448b-ab10-337b137b21be/cache.tzst -P -C /runner/_work/repair-workshop/repair-workshop --use-compress-program unzstd
 Cache restored successfully
 Cache restored from key: Linux-m2-298e1de8f8604fbc9ad66f068f909d6c61f48d4f98b5883f7ae39c64b2800708

but when we try and restore the cache in the github managed runner job we see this:

##[group]Run actions/cache@v3
 with:
   path: ~/.m2/repository
   key: Linux-m2-298e1de8f8604fbc9ad66f068f909d6c61f48d4f98b5883f7ae39c64b2800708
   enableCrossOsArchive: false
   fail-on-cache-miss: false
   lookup-only: false
 env:
   JAVA_HOME: /opt/hostedtoolcache/Java_Adopt_jdk/17.0.8-101/x64
   JAVA_HOME_17_X64: /opt/hostedtoolcache/Java_Adopt_jdk/17.0.8-101/x64
 ##[endgroup]
 Received 0 of 102578862 (0.0%), 0.0 MBs/sec
 Received 71303168 of 102578862 (69.5%), 34.0 MBs/sec
 Cache Size: ~98 MB (102578862 B)
 [command]/usr/bin/tar -xf /home/runner/work/_temp/c59ddc4f-ed53-4d25-a168-ca3841050dc7/cache.tzst -P -C /home/runner/work/repair-workshop/repair-workshop --use-compress-program unzstd
 /usr/bin/tar: ../../../../home: Cannot mkdir: Permission denied
 /usr/bin/tar: ../../../../home/runner/.m2/repository: Cannot mkdir: No such file or directory
 /usr/bin/tar: ../../../../home: Cannot mkdir: Permission denied
... this goes on and on

@brendanlafond , I assume that based on your post, you forked the action and modified it to get around this issue? Is there no other work around using the current version of the action?
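
A rough sketch of why the same archive restores on one runner and fails on the other, using the two workspaces and the archive entry from the logs above. It assumes ~ expands to /home/runner on both runners (inferred from the entry, not confirmed), and the helper name misplaced is hypothetical.

```python
import posixpath


def misplaced(entries, workspace, expected_root):
    """Return (entry, target) pairs whose resolved target is outside expected_root."""
    result = []
    for entry in entries:
        target = posixpath.normpath(posixpath.join(workspace, entry))
        inside = target == expected_root or target.startswith(expected_root + "/")
        if not inside:
            result.append((entry, target))
    return result


# Entry as printed by tar in the failing job.
entries = ["../../../../home/runner/.m2/repository"]
# Assumption: ~/.m2/repository means /home/runner/.m2/repository on both runners.
expected = "/home/runner/.m2/repository"

self_hosted_ws = "/runner/_work/repair-workshop/repair-workshop"
hosted_ws = "/home/runner/work/repair-workshop/repair-workshop"

print(misplaced(entries, self_hosted_ws, expected))
print(misplaced(entries, hosted_ws, expected))
```

On the self-hosted workspace the four .. components reach the filesystem root, so the entry lands where expected. On the GitHub-hosted workspace they only reach /home, so tar tries to create /home/home/..., which matches the "Cannot mkdir: Permission denied" errors.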

brendanlafond commented 11 months ago

I did not fork the action to resolve. I set an absolute path instead of the reference to home because it isn’t using it anyway. We are creating cache on ephemeral self-hosted runners and restoring to ephemeral self-hosted runners and it isn’t working. There’s no GitHub hosted runners involved in our solution. While I was able to resolve with an absolute path, I was hoping that the bug would be resolved so it uses the path relative to home. I have team members consistently using the ~/.m2 path because it's in the documentation. It doesn't work on the GitHub ARC self-hosted runners built from /actions/runner image. For some reason this image differs from the GitHub hosted in terms of directory structure. Would be way easier if those were the same.

my3sons commented 11 months ago

got it, thanks @brendanlafond! I have tried using absolute paths as well and so far no luck with that as I cross from self-hosted to github runners.

deitch commented 9 months ago

Any update on this? It has been 9 months since it was opened.

Looking at the logs from running the cache action, tar is already invoked with -C /home/dir/for/runner/user, so shouldn't all paths relative to home simply be resolved as, well, relative?

I would be tempted to open a PR, but I see a bunch of PRs that have been open for a long time; I am concerned about putting in time on something that doesn't get merged.

deitch commented 9 months ago

Also, using relative paths doesn't work at all:

with:
  path: subdir/me

Gets a warning

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.

Nuru commented 2 months ago

I am still having this problem, please fix.

Note that this issue is also still occurring. It is only closed because the problem is here, not there.

bmacnaughton commented 1 month ago

I am seeing what is possibly a related issue:

In a private repo, we get timeouts trying to restore the cache, but only on Windows and only when run on a non-standard larger runner.

This appears to be a relatively low-level issue (maybe I'm reporting it in the wrong place) because the failure pattern is reproduced whether using actions/cache or buildjet/cache (I hoped that would fix the problem). With either one, the pattern is the same: the cache is downloaded 100% and the step times out after that. The download completes in a few seconds.

This is definitely not an obvious path/configuration problem: the particular jobs that fail vary, and they will all eventually succeed if rerun enough times, sometimes once, sometimes more.

Please advise on any course of action, including filing an issue in another repo.

Run actions/cache/restore@v4
  with:
    path: ./*
    key: node-Windows-20-cffdd87a2f9ae6a26eab7e55a5d63f9597e85e1e
    enableCrossOsArchive: false
    fail-on-cache-miss: false
    lookup-only: false
  env:
    HUSKY: 0
    NPM_VERSION: 9
Cache Size: ~98 MB (103249742 B)
"C:\Program Files\Git\usr\bin\tar.exe" -xf C:/a/_temp/abcb6258-5beb-4b65-b902-d4166e9d3f47/cache.tzst -P -C C:/a/node-mono/node-mono --force-local --use-compress-program "zstd -d"
Received 103249742 of 103249742 (100.0%), 97.3 MBs/sec
Error: The action 'Restore cache' has timed out after 2 minutes.