Open bgzzz opened 3 years ago
Hey @bgzzz, we don't currently support this, but I've made a note and we will investigate this feature. Currently, each runner needs its own tool-cache.
Our hosted runners currently bake the tool-cache into the runner image; you can see how that is done in the virtual environments repo. I would recommend you do the same with the runner image in your k8s cluster.
@bgzzz - this was posted almost 2 years ago at this point, but are you still using EFS to back the RUNNER_TOOL_CACHE?
It seems to make more sense than provisioning an EBS volume per pod, each of which will end up storing the same files over time, but I wasn't sure whether it was worth the trouble if issues like this crop up often.
@JohnYoungers we ran into the same problem recently and it happens quite often for us
@thboop would you be interested in a PR that implements locking, renaming or hardlinks to mitigate this issue?
@robertkowalski - is the use case the same as this issue (java)?
We switched from EBS to EFS a few weeks ago and haven't had any issues yet, although we're just using setup nodejs and dotnet
@JohnYoungers we're using https://github.com/nektos/act, which can use a shared Docker volume as the toolcache. So when two jobs use setup-node, they might run into race conditions while copying the files from the temp dir into the toolcache (https://github.com/actions/toolkit/blob/a6bf8726aa7b78d4fc8111359cca5d538527b239/packages/tool-cache/src/tool-cache.ts#L436-L439).
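One way to narrow the copy race described above is to stage the files next to the final location and publish them with a single rename. The following is only a sketch under stated assumptions: `publishAtomically` is a hypothetical helper and not part of @actions/tool-cache, it assumes Linux rename semantics (renaming a directory onto an existing non-empty directory fails), and it assumes the staging directory and the destination live on the same filesystem.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper, NOT part of @actions/tool-cache.
// Copies sourceDir into a staging directory beside destDir, then renames the
// staging directory into place. Returns true if this call published destDir,
// false if another writer had already published it.
function publishAtomically(sourceDir: string, destDir: string): boolean {
  if (fs.existsSync(destDir)) {
    return false; // someone else already populated the cache entry
  }
  fs.mkdirSync(path.dirname(destDir), { recursive: true });
  // mkdtempSync appends random characters, so each writer gets its own staging dir.
  const staging = fs.mkdtempSync(`${destDir}.tmp-`);
  try {
    fs.cpSync(sourceDir, staging, { recursive: true });
    // On Linux this fails with ENOTEMPTY/EEXIST if destDir already exists and
    // is not empty, so readers never observe a half-copied directory.
    fs.renameSync(staging, destDir);
    return true;
  } catch (err) {
    const code = (err as NodeJS.ErrnoException).code;
    if ((code === "ENOTEMPTY" || code === "EEXIST") && fs.existsSync(destDir)) {
      return false; // lost the race; the winner's copy is already in place
    }
    throw err;
  } finally {
    // Remove the staging dir if it is still there (no-op after a successful rename).
    fs.rmSync(staging, { recursive: true, force: true });
  }
}
```

With this shape, two jobs may both download and copy the tool, but only one rename wins and the loser discards its staging copy. Whether rename is atomic enough on a given network filesystem (EFS, NFS) is an assumption that would need to be checked for the actual setup.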
Describe the bug
We hit an issue when the actions/tool-cache package is used by two jobs running in parallel. Each job is allocated to a separate self-hosted runner. Our setup is worth mentioning: each self-hosted runner is a pod running in a k8s cluster. Each pod runs only once (it is terminated right after the job allocated to it finishes). A volume shared between the self-hosted runners (pods) is mounted in each runner at the tool cache path (the RUNNER_TOOL_CACHE env variable), so every pod can write to this folder, and sometimes they do so simultaneously. In our case we use the setup-java action. Workflow example:
Assume that we run the workflow for the first time and there is no JDK stored in the tool cache folder. In this case we get almost simultaneous writes to the cache folder, because the actions/setup-java@v2 step is used in both the build and test-unit jobs.
Log snippets from both jobs follow. The build job, actions/setup-java step:
The test-unit actions/setup-java snippet:
I suspect that the failure happens here. It happens when two jobs start caching the tool at the same time. It looks like a locking mechanism is missing. What is the desired behaviour when the tool cache folder is shared between runners? Should it be addressed in some other way?
OS: Linux
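The locking mechanism the report asks about could, for illustration, look like a directory-based lock around the install step. This is a minimal sketch, not how actions/tool-cache behaves today: `withInstallLock` and the lock path are hypothetical, the sketch assumes the parent of the lock directory already exists, and it assumes that directory creation is atomic on the shared volume, which should be verified for the filesystem in use.

```typescript
import * as fs from "node:fs";

// Hypothetical helper, NOT part of @actions/tool-cache.
// Runs install() only if this process manages to create lockDir.
// Returns install()'s result, or undefined if another holder owns the lock.
function withInstallLock<T>(lockDir: string, install: () => T): T | undefined {
  try {
    // Non-recursive mkdir fails with EEXIST if the directory already exists,
    // so at most one caller gets past this line at a time.
    fs.mkdirSync(lockDir);
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") {
      return undefined; // another job is installing; caller can wait and retry
    }
    throw err;
  }
  try {
    return install();
  } finally {
    fs.rmdirSync(lockDir); // release the lock even if install() throws
  }
}
```

A caller that gets `undefined` back would typically poll until the tool appears in the cache. The sketch does not handle stale locks: if a pod is killed while holding the lock, the lock directory stays behind, so a real implementation would need a timeout or an owner check.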