actions / runner

The Runner for GitHub Actions :rocket:
https://github.com/features/actions
MIT License

Self-hosted runner cleanup/update bloat growing over time #2708

Open jschwartz-cray opened 1 year ago

jschwartz-cray commented 1 year ago

Describe the bug

NOTE: I recognize this is on the boundary between a bug, a feature, and a documentation request, since I'm not entirely sure what I'm looking at, so please be gentle. Starting out here.

The self-hosted runner update process appears to be leaving behind multiple directories, e.g.

bin.2.296.3
externals.2.296.3
_work/_update/externals

plus another copy of externals under _work:

_work/__externals__

In an environment with a significant number of self-hosted persistent runners this really starts to add up, and it's not clear what is safe to clean up here, or whether the normal update process will ever clean any of this up.

In my case the root externals entry in my runners is a symlink to externals.2.299.2, so I suspect it's safe to remove externals.2.296.3 from the old version (and likewise for bin)? What about _work/_update in general, and _work/_update/externals specifically? Can _work/__externals__ be cleaned up at the end of jobs?

It seems that perhaps the runner update process is being conservative in removing things to allow rolling back to a previous version, but given the nature of it I'm not sure that's a great strategy. It would be nice if this was exposed as a configurable choice, or there was at least a config.sh (or similar) tool to remove old unneeded versions. The alternative would be for me to periodically remove all my runners and reinstall them from scratch but that's a lot of churn.

Another alternative would be a feature to allow multiple self-hosted runners on the same machine to share a common install/bin/externals, with each one being able to point to a different version. I wouldn't be too concerned with cleaning this up if there was only one copy of it instead of N.

To Reproduce

  1. Install an old version of a self-hosted persistent runner
  2. du -hs the install directory
  3. update it
  4. du -hs the install directory

Expected behavior

There is no significant difference between the size of a freshly installed runner and one which has been updated.

Runner Version and Platform

2.299.2, x86_64 Linux

OS of the machine running the runner? OSX/Windows/Linux/...

CentOS 7

What's not working?

Multiple directories are being left behind:

$ du -hs bin.2.296.3 externals.2.296.3 _work/_update/externals _work/__externals__
74M     bin.2.296.3
318M    externals.2.296.3
318M    _work/_update/externals
318M    _work/__externals__

Job Log Output

N/A

Runner and Worker's Diagnostic Logs

Example SelfUpdate-20230627-011005.log.succeed:

[2023-06-26 20:10:05-3953] --------whoami--------
runner_user
[2023-06-26 20:10:05-3982] --------whoami--------
[2023-06-26 20:10:05-3995] Waiting for Runner.Listener (21314) to complete
[2023-06-26 20:10:05-4007] Process 21314 still running
[2023-06-26 20:10:07-0022] Process 21314 still running
[2023-06-26 20:10:09-0023] Process 21314 finished running
[2023-06-26 20:10:09-0036] Sleep 1 more second to make sure process exited
[2023-06-26 20:10:10-0024] move /home/users/runner_user/runner_dir/bin /home/users/runner_user/runner_dir/bin.2.296.3
‘/home/users/runner_user/runner_dir/bin’ -> ‘/home/users/runner_user/runner_dir/bin.2.296.3’
[2023-06-26 20:10:10-0055] move /home/users/runner_user/runner_dir/externals /home/users/runner_user/runner_dir/externals.2.296.3
‘/home/users/runner_user/runner_dir/externals’ -> ‘/home/users/runner_user/runner_dir/externals.2.296.3’
[2023-06-26 20:10:10-0085] Create junction bin folder
[2023-06-26 20:10:10-0111] Create junction externals folder
[2023-06-26 20:10:10-0163] Update succeed
[2023-06-26 20:10:10-0186] update.finished file creation succeed
[2023-06-26 20:10:10-0197] Rename /home/users/runner_user/runner_dir/_diag/SelfUpdate-20230627-011005.log to be /home/users/runner_user/runner_dir/_diag/SelfUpdate-20230627-011005.log.succeed
‘/home/users/runner_user/runner_dir/_diag/SelfUpdate-20230627-011005.log’ -> ‘/home/users/runner_user/runner_dir/_diag/SelfUpdate-20230627-011005.log.succeed’

TCFMG commented 1 year ago

I think we may be experiencing the same or a related issue: the root directory of our self-hosted runner(s) has multiple versions of the externals and bin directories, with the symlinks pointing to the latest versions (see attached image).

Presumably only the latest versions are required and should be automatically deleted when the runner updates?

This issue has contributed to an outage due to low disk space. For example, bin.2.307.1 and externals.2.307.1 take up 390MB.

I have successfully tested removing the older externals and bin directories from one of our runners and then running a GitHub action that uses it.
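For anyone who wants to see which directories fall into that "older" category before touching anything, here is a rough sketch. It assumes the layout reported in this thread (root-level `bin` and `externals` symlinks pointing at `bin.<version>` and `externals.<version>`); the function name `stale_versioned_dirs` is made up for illustration. It only lists candidates and deletes nothing, and nothing in this thread confirms officially that removing them is safe.

```python
from pathlib import Path


def stale_versioned_dirs(runner_dir):
    """List bin.* / externals.* directories that the bin / externals
    symlinks do not currently point to. Nothing is deleted."""
    root = Path(runner_dir)
    stale = []
    for name in ("bin", "externals"):
        link = root / name
        if not link.is_symlink():
            # Layout differs from what this sketch assumes; skip this name.
            continue
        current = link.resolve()
        for candidate in sorted(root.glob(name + ".*")):
            if candidate.is_symlink() or not candidate.is_dir():
                continue
            if candidate.resolve() != current:
                stale.append(candidate)
    return stale
```

Printing the result for a runner directory gives a list to review by hand, ideally while the runner is idle.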

image

digodk commented 1 year ago

I'd like to second this issue. With each update there is a risk that the runners stop working due to low disk space.

####@#####:/runners/builders/builder-1# tree | grep externals
│   ├── __externals__
│   │   ├── externals
├── externals -> /runners/builders/builder-1/externals.2.310.2
├── externals.2.309.0
├── externals.2.310.2

kk-min commented 1 year ago

We are also facing the issues highlighted above: our runners stop working because old versions of externals and bin use up the disk space.

It would be nice to have some documentation detailing what the directories in actions-runner do, so that we can discern which are safe to delete during cleanups.

moonwitch commented 1 year ago

This actually explains the issue I am encountering as well. So it would be excellent to know which dirs/files we can manually clean up.

digodk commented 1 year ago

Strangely, this issue seems to be related to the latest versions of the runners. I have some older self-hosted runners that did not keep the directories for older versions, only 2.309.0 and 2.310.2. Also, node20 seems to be the main cause of disk usage.

tt-rkim commented 8 months ago

+1

Has anyone in the community figured out any dirs that are definitely OK to delete?

danyalutsevich commented 5 months ago

any update? have the same issue, folder with my runner is now 1.7G