Tool-cache: Support multiple runners sharing a single tool cache

bgzzz commented 3 years ago

Describe the bug

We have an issue when we use the actions/tool-cache package in two jobs running in parallel. Each job is allocated to a separate self-hosted runner.

Our setup is worth mentioning: each self-hosted runner is a pod running in a k8s cluster. Each pod runs only once (it is terminated right after the job allocated to it is done). A volume shared between the self-hosted runners/pods is mounted in each runner at the tool cache folder path (the RUNNER_TOOL_CACHE env variable). This means every pod can write to this folder, and sometimes they do so simultaneously.

In our case we use the setup-java action. Workflow example:

name: pr-java
on: [pull_request]
jobs:
  build:
    runs-on: self-hosted
    steps:
        - uses: actions/checkout@v2
          with:
            ref: ${{ github.event.pull_request.head.sha }}
        - uses: actions/cache@v1
          with:
            path: ~/.m2/repository
            key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
            restore-keys: |
              ${{ runner.os }}-maven-
        - uses: actions/setup-java@v2
          with:
            java-version: 11
        - name: Docker build and push
          run: |
            mvn -B compile
            mvn -B -Dmaven.test.skip=true clean package
            mvn -B -Dmaven.test.skip=true -Dgit-revision=$(git rev-parse HEAD) dockerfile:build
            mvn -B -Dgit-revision=latest dockerfile:build
            mvn -B -Dgit-revision=$(git rev-parse HEAD) dockerfile:push
  test-unit:
    runs-on: self-hosted
    steps:
        - uses: actions/checkout@v2
          with:
            ref: ${{ github.event.pull_request.head.sha }}
        - uses: actions/cache@v1
          with:
            path: ~/.m2/repository
            key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
            restore-keys: |
              ${{ runner.os }}-maven-
        - uses: actions/setup-java@v2
          with:
            java-version: 11

Assume that we run the workflow for the first time and no JDK is stored in the tool cache folder. In that case both jobs write to the cache folder almost simultaneously, because the actions/setup-java@v2 step is used in both the build and test-unit jobs.

Log snippets from both jobs follow. The build job, actions/setup-java step:

##[debug]Evaluating condition for step: 'Run tradeshift/actions-setup-java@v1'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Run tradeshift/actions-setup-java@v1
##[debug]Register post job cleanup for action: tradeshift/actions-setup-java@v1
##[debug]Loading inputs
##[debug]Evaluating: secrets.MTLS_CACERT
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'MTLS_CACERT'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Evaluating: secrets.MAVEN_P12
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'MAVEN_P12'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Evaluating: secrets.MAVEN_P12_PASSWORD
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'MAVEN_P12_PASSWORD'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Evaluating: secrets.MAVEN_SETTINGS
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'MAVEN_SETTINGS'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Evaluating: secrets.MAVEN_SECURITY
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'MAVEN_SECURITY'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Loading env
Run tradeshift/actions-setup-java@v1
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/MemoryMonitor/nbproject/jdk.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/MemoryMonitor/nbproject/project.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/MemoryMonitor/nbproject/file-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/MemoryMonitor/nbproject/netbeans-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/MemoryMonitor/build.properties
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/build.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/nbproject/
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/nbproject/jdk.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/nbproject/project.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/nbproject/file-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/nbproject/netbeans-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/build.properties
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/build.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/nbproject/
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/nbproject/jdk.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/nbproject/project.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/nbproject/file-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/nbproject/netbeans-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/build.properties
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/build.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/nbproject/
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/nbproject/jdk.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/nbproject/project.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/nbproject/file-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/nbproject/netbeans-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/build.properties
##[debug]jdk extracted to /tmp/_temp/temp_1451549034/zulu11.48.21-ca-jdk11.0.11-linux_x64
##[debug]Caching tool jdk 11.0.11 x64
##[debug]source dir: /tmp/_temp/temp_1451549034/zulu11.48.21-ca-jdk11.0.11-linux_x64
##[debug]destination /var/cache/tools/jdk/11.0.11/x64
##[debug]finished caching tool
::set-output name=path::/var/cache/tools/jdk/11.0.11/x64
##[debug]='/var/cache/tools/jdk/11.0.11/x64'
::set-output name=version::11.0.11
##[debug]='11.0.11'

The test-unit job, actions/setup-java step:

##[debug]Evaluating condition for step: 'Run tradeshift/actions-setup-java@v1'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Run tradeshift/actions-setup-java@v1
##[debug]Register post job cleanup for action: tradeshift/actions-setup-java@v1
##[debug]Loading inputs
##[debug]Evaluating: secrets.MTLS_CACERT
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'MTLS_CACERT'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Evaluating: secrets.MAVEN_P12
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'MAVEN_P12'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Evaluating: secrets.MAVEN_P12_PASSWORD
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'MAVEN_P12_PASSWORD'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Evaluating: secrets.MAVEN_SETTINGS
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'MAVEN_SETTINGS'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Evaluating: secrets.MAVEN_SECURITY
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'MAVEN_SECURITY'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Loading env
Run tradeshift/actions-setup-java@v1
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/MemoryMonitor/nbproject/jdk.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/MemoryMonitor/nbproject/project.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/MemoryMonitor/nbproject/file-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/MemoryMonitor/nbproject/netbeans-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/MemoryMonitor/build.properties
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/build.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/nbproject/
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/nbproject/jdk.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/nbproject/project.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/nbproject/file-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/nbproject/netbeans-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/FullThreadDump/build.properties
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/build.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/nbproject/
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/nbproject/jdk.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/nbproject/project.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/nbproject/file-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/nbproject/netbeans-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/VerboseGC/build.properties
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/build.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/nbproject/
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/nbproject/jdk.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/nbproject/project.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/nbproject/file-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/nbproject/netbeans-targets.xml
zulu11.48.21-ca-jdk11.0.11-linux_x64/demo/nbproject/management/JTop/build.properties
##[debug]jdk extracted to /tmp/_temp/temp_1283980454/zulu11.48.21-ca-jdk11.0.11-linux_x64
##[debug]Caching tool jdk 11.0.11 x64
##[debug]source dir: /tmp/_temp/temp_1283980454/zulu11.48.21-ca-jdk11.0.11-linux_x64
##[debug]destination /var/cache/tools/jdk/11.0.11/x64
Error: Command failed: rm -rf "/var/cache/tools/jdk/11.0.11/x64"
rm: cannot remove '/var/cache/tools/jdk/11.0.11/x64/include': Directory not empty

##[debug]Node Action run completed with exit code 1

I suspect that the failure happens here. It occurs when two jobs start caching the tool at the same time, and it looks like a locking mechanism is missing. What is the desired behaviour when the tool cache folder is shared between runners? Should it be addressed in some other way?
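To make the question concrete, here is a minimal sketch of what such a locking mechanism could look like. It is not code from @actions/tool-cache: `tryAcquireLock`, `releaseLock`, `cacheOnce`, and the `.lock` suffix are hypothetical names, and the `.complete` marker is assumed here to signal a finished cache entry. The sketch relies on directory creation being exclusive (the second `mkdir` of the same path fails with EEXIST), which should be checked for the specific shared filesystem in use.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical helpers; nothing here is part of @actions/tool-cache.

/** Try to take the lock for `destDir`; returns false if it is already held. */
function tryAcquireLock(destDir: string): boolean {
  fs.mkdirSync(path.dirname(destDir), { recursive: true });
  try {
    fs.mkdirSync(`${destDir}.lock`); // throws EEXIST if the lock directory exists
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err;
  }
}

/** Release a lock previously taken with tryAcquireLock. */
function releaseLock(destDir: string): void {
  fs.rmdirSync(`${destDir}.lock`);
}

/** Run `populate` under the lock unless the entry is already marked complete. */
async function cacheOnce(
  destDir: string,
  populate: () => Promise<void>,
  retryMs = 200,
  timeoutMs = 60000
): Promise<void> {
  const marker = `${destDir}.complete`; // assumed "entry is finished" marker
  const deadline = Date.now() + timeoutMs;
  while (!fs.existsSync(marker)) {
    if (tryAcquireLock(destDir)) {
      try {
        // Re-check under the lock: another runner may have just finished.
        if (!fs.existsSync(marker)) {
          await populate();
          fs.writeFileSync(marker, "");
        }
      } finally {
        releaseLock(destDir);
      }
      return;
    }
    if (Date.now() > deadline) {
      throw new Error(`Timed out waiting for lock on ${destDir}`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, retryMs));
  }
}

// Demo in a throwaway directory: the second attempt fails while the lock is held.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "toolcache-lock-"));
const demoDest = path.join(demoRoot, "jdk", "11.0.11", "x64");
const first = tryAcquireLock(demoDest);
const second = tryAcquireLock(demoDest);
releaseLock(demoDest);
const third = tryAcquireLock(demoDest);
releaseLock(demoDest);
console.log(first, second, third); // true false true
```

A pod that is killed while holding the lock would leave a stale `.lock` directory behind, so a real implementation would also need some form of lock expiry or ownership check; the sketch leaves that out.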

OS: Linux

thboop commented 3 years ago

Hey @bgzzz, we don't currently support this, but I've made a note and we will investigate this feature. Currently, each runner needs its own tool-cache.

Our hosted runners currently bake the tool-cache into the runner image; you can see how that is done in the virtual environments repo. I would recommend you do the same with your k8s cluster.

JohnYoungers commented 1 year ago

@bgzzz - this was posted almost 2 years ago at this point, but are you still using EFS to back the RUNNER_TOOL_CACHE?

It seems to make more sense than provisioning an EBS volume per pod that'll end up storing the same files over time, but I wasn't sure whether it was worth the trouble if issues like this crop up often.

robertkowalski commented 1 year ago

@JohnYoungers we ran into the same problem recently, and it happens quite often for us.

ZauberNerd commented 1 year ago

@thboop would you be interested in a PR that implements locking, renaming or hardlinks to mitigate this issue?

JohnYoungers commented 1 year ago

@robertkowalski - is the use case the same as this issue (java)?

We switched from EBS to EFS a few weeks ago and haven't had any issues yet, although we're just using setup nodejs and dotnet.

ZauberNerd commented 1 year ago

@JohnYoungers we're using https://github.com/nektos/act which can use a shared docker volume as the toolcache. So when two jobs use setup-node they might run into race conditions while copying the files from the temp dir into the toolcache (https://github.com/actions/toolkit/blob/a6bf8726aa7b78d4fc8111359cca5d538527b239/packages/tool-cache/src/tool-cache.ts#L436-L439).
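As a rough illustration of the renaming option mentioned earlier in the thread, the sketch below copies the extracted tool into a uniquely named staging directory next to the destination and then publishes it with a single rename, instead of removing and refilling the destination in place (the failing log above shows an `rm -rf` of the destination). `publishAtomically` and `copyDir` are hypothetical helpers, not the toolkit's implementation, and the error codes handled assume Linux rename semantics, which should be verified on the shared filesystem in question.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical helpers; not the implementation in @actions/tool-cache.

/** Recursive copy. Simplification: symlinks are followed rather than preserved. */
function copyDir(src: string, dest: string): void {
  fs.mkdirSync(dest, { recursive: true });
  for (const entry of fs.readdirSync(src, { withFileTypes: true })) {
    const from = path.join(src, entry.name);
    const to = path.join(dest, entry.name);
    if (entry.isDirectory()) copyDir(from, to);
    else fs.copyFileSync(from, to);
  }
}

/**
 * Returns true if this caller published the entry,
 * false if another runner had already published it.
 */
function publishAtomically(sourceDir: string, destDir: string): boolean {
  if (fs.existsSync(destDir)) return false;
  fs.mkdirSync(path.dirname(destDir), { recursive: true });
  // Stage beside the destination so the rename stays on one filesystem.
  // mkdtempSync appends a random suffix, so concurrent runners do not collide.
  const staging = fs.mkdtempSync(`${destDir}.tmp-`);
  copyDir(sourceDir, staging);
  try {
    fs.renameSync(staging, destDir);
    return true;
  } catch (err) {
    const code = (err as NodeJS.ErrnoException).code;
    if (code === "ENOTEMPTY" || code === "EEXIST") {
      // Another runner won the race; drop our copy and use theirs.
      fs.rmSync(staging, { recursive: true, force: true });
      return false;
    }
    throw err;
  }
}

// Demo in a throwaway directory: the second publish finds the entry in place.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "toolcache-publish-"));
const src = path.join(root, "extracted");
fs.mkdirSync(path.join(src, "include"), { recursive: true });
fs.writeFileSync(path.join(src, "include", "jni.h"), "// placeholder\n");
const dest = path.join(root, "cache", "jdk", "11.0.11", "x64");
const publishedFirst = publishAtomically(src, dest);
const publishedSecond = publishAtomically(src, dest);
const copied = fs.existsSync(path.join(dest, "include", "jni.h"));
console.log(publishedFirst, publishedSecond, copied); // true false true
```

With this shape a reader of the cache sees either no entry or a complete one, and the loser of the race only deletes its own staging directory, never the shared destination.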

actions / toolkit

Tool-cache: Support multiple runners sharing a single tool cache #804