huydhn opened this issue 1 year ago
Hey @huydhn,
Can you please inspect whether something is still holding a lock on that file before the cleanup runs? Since this is a system exception, unless the runner itself is holding a lock and trying to delete the file in parallel (I'm almost certain that is not the case), something must not be releasing the file in time, causing this error. And since this doesn't happen consistently, there may be a subprocess that only releases its lock on the file after the command returns.
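For example, a rough sketch along these lines, assuming the Sysinternals handle.exe tool is on PATH (the directory path below is just an example, not taken from your failing run):

```powershell
# Report which processes hold open handles under the runner's _actions directory.
# Requires Sysinternals handle.exe (https://learn.microsoft.com/sysinternals/downloads/handle).
$target = 'C:\actions-runner\_work\_actions'   # example path; adjust to the failing directory

# handle.exe searches for open handles whose path contains the given string.
# -accepteula suppresses the interactive EULA prompt; -nobanner trims the header.
$output = & handle.exe -accepteula -nobanner $target

foreach ($line in $output) {
    # Typical match line: "proc.exe  pid: 1234  type: File  5C: C:\path\to\file"
    if ($line -match '^(?<proc>\S+)\s+pid:\s+(?<procId>\d+)') {
        Write-Host ("{0} (PID {1}) holds a handle under {2}" -f $Matches.proc, $Matches.procId, $target)
    }
}
```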
Thank you for looking into this. While we know for sure that a process is holding a lock on that folder, it's very tricky to identify which one, so I hope to be able to gather more information via the runner. Do you know if there is a way to get more information from the runner log about the exception? Or maybe add retry support for this step to make it more resilient? I'm also curious whether other people encounter the same issue.
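To illustrate the kind of retry I have in mind, here is a rough PowerShell sketch of the idea as a stopgap (the path and retry parameters are illustrative, not what the runner actually does):

```powershell
# Retry deleting a directory that may be transiently locked by another process.
# Illustrative mitigation only; it masks rather than fixes the underlying lock.
function Remove-DirectoryWithRetry {
    param(
        [string]$Path,
        [int]$MaxAttempts = 5,
        [int]$DelaySeconds = 3
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            if (Test-Path $Path) {
                Remove-Item -LiteralPath $Path -Recurse -Force -ErrorAction Stop
            }
            return  # deleted (or already gone)
        } catch {
            # "The process cannot access the file ... because it is being used
            # by another process" surfaces here, typically as an IOException.
            Write-Warning "Attempt ${attempt} failed: $($_.Exception.Message)"
            Start-Sleep -Seconds $DelaySeconds
        }
    }
    throw "Could not delete '$Path' after $MaxAttempts attempts."
}

Remove-DirectoryWithRetry -Path 'C:\actions-runner\_work\_actions'
```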
From the runner log, I don't think we can tell which process is holding a lock; we would have to inspect the processes. However, you could try writing a PowerShell script to figure out which process holds a lock on the file. Retries sound reasonable for this situation, but I'm not sure they would be the right solution: at that point everything should have exited, and a failure to delete a file can indicate that some process was left running. Regardless, I will bring this question up in the meeting in case I'm wrong ☺️
Certainly it won't fix the root cause (the process holding the lock), but can you please confirm or deny whether the runners are running in an ephemeral way? That is, in complete isolation, where the workspace of one run is a completely different folder from the workspace directory of another run (even if it starts after the first finishes)? Running the runners in complete isolation could mitigate the impact of the issue on other runs.
Hey @Blackhex,
It will reuse the same directory, based on the repository path.
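For what it's worth, recent runner versions can at least be registered one-job-per-registration via the --ephemeral flag of config.cmd; paired with a fresh VM or container image per registration, that would give the isolation described above. A rough sketch (the URL and token are placeholders):

```powershell
# Register a runner that takes exactly one job and then unregisters itself.
# ORG/REPO and <REGISTRATION_TOKEN> below are placeholders.
.\config.cmd --url https://github.com/ORG/REPO --token <REGISTRATION_TOKEN> --ephemeral
.\run.cmd

# Each job then needs a fresh registration, which pairs naturally with a clean
# VM or container image, so no stale process can hold a lock on _work.
```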
This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 15 days.
Describe the bug

GitHub self-hosted Windows runners randomly fail to clean up the `_actions` directory in the `Set up job` step with the following error:

The process cannot access the file 'C:\actions-runner\_work\_actions\...' because it is being used by another process
To Reproduce

This error happens randomly and I couldn't reproduce it reliably. However, it happens daily on the CI of our https://github.com/pytorch/pytorch repo, for example https://github.com/pytorch/pytorch/actions/runs/5514008233/jobs/10052783742
Expected behavior

The `Set up job` step completes successfully. As far as I know, there is no external process locking the directories used by the runner, and Windows Defender has been removed, so I want to check whether this could be something coming from the runner itself, as the issue has only manifested in the past few months.

Runner Version and Platform
Version of your runner?
2.305.0
OS of the machine running the runner?
Windows
Windows git version?
git version 2.41.0.windows.1 (from https://git-scm.com/download/win)
What's not working?

The `Set up job` step randomly fails to clean up the `_actions` directory on Windows runners. The exception comes from this step in the worker: https://github.com/actions/runner/blob/main/src/Runner.Worker/ActionManager.cs#L84-L87

Job Log Output
Runner and Worker's Diagnostic Logs
There are some related worker logs: