Open colemickens opened 4 months ago
This module is really not easy for me to follow. I wish this were parameterized systemd units and used systemd state dirs more normally. I really don't think I can fix this without resurrecting my re-write.
I really hope someone can look. The way workDir is set up, linked to stateDir, tied to the bind mounts, and the ordering of everything makes this non-trivial. But also, as far as I can tell, workDir is basically completely broken and unusable.
That, combined with the default behavior of putting the state in temporary storage, means that using the github-runner for nixpkgs-related things is quite painful. Every single time the service or computer restarts, I have to very slowly reclone nixpkgs.
Thanks for reporting this! I don't agree that "workDir is basically completely broken and unusable"; it works fine in our runner configurations. However, I understand that you may have a different idea of what the workDir option should be able to do for you. Please also note that the "service will clean this directory on each service start".
Could you please describe in more detail what you are trying to do and I'll try to help? I'm also happy to review a PR if you come up with a solution to your problem. We're always looking to improve the module 🙂
> it works fine in our runner configurations
Ah! the feeling of both shame and hope! Thanks for the kind reply, hopefully my words weren't too pointed, I certainly appreciate the module!
> Please also note that the "service will clean this directory on each service start".
That might be why I ended up radically reworking the module before dropping it after the recent refactor. :/ Sigh.
> Could you please describe in more detail what you are trying to do and I'll try to help? I'm also happy to review a PR if you come up with a solution to your problem. We're always looking to improve the module 🙂
I did take a crack at it over the weekend, but I do fear it requires more attention, and then that will require more attention to integrate a non-hacky rewrite with what we have now. Of course I don't expect anyone to do this for me, but I don't know when I'll be able to engage with this. And it also sounds like I might be holding it wrong.
My apologies for not making my use case clearer to guide the conversation. Basically, I want the runner's job workspace directory to persist between runs. Normally I would be quite opposed to this, but re-cloning nixpkgs on each run is an inefficiency and slowness I can't justify.
As far as I know, this is the naive default for self-hosted runners. Again, while I like that NixOS tends towards clean-slate idempotency, I really am seeking a persistent dir.
The other issue is -- my remote CI builder is very memory starved and losing 4.5GB to git checkouts on tmpfs (/run) makes everything that much harder.
I can very much relate to the memory issue stemming from the tmpfs. This is how we solve it:
services.github-runners."a-runner" = {
  # ...
  # Use an additional `StateDirectory=` as `workDir`
  serviceOverrides.StateDirectory = [
    "github-runner/a-runner" # module default
    "github-runner-work/a-runner"
  ];
  workDir = "/var/lib/github-runner-work/a-runner";
};
Interesting, that sort of makes sense. You leverage the machinery that pre-creates the state dir, it must be sequenced differently with respect to the binds, and then your work dir just lands inside of it. This gives me some ideas, thank you @veehaitch.
(My gut first impression is that the module should maybe do something like that for you if you have workDir set? (handwaving). Because I think if you change workDir to not be inside stateDirectory, you'll find yourself hitting similar errors as me.)
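For illustration only, the idea of having this done automatically could be sketched as a small wrapper around the workaround shown above. The helper name `mkPersistentRunner` is hypothetical and not part of the nixpkgs module; it only reuses the `serviceOverrides.StateDirectory` and `workDir` options from the example above, and it assumes the module's default state directory is `github-runner/<name>` as stated there.

```nix
{ lib, ... }:
let
  # Hypothetical helper (not part of nixpkgs): derives an extra
  # StateDirectory entry and a matching workDir from the runner name.
  mkPersistentRunner = name: extra: lib.recursiveUpdate {
    serviceOverrides.StateDirectory = [
      "github-runner/${name}"      # module default, per the example above
      "github-runner-work/${name}" # additional state dir used as workDir
    ];
    workDir = "/var/lib/github-runner-work/${name}";
  } extra;
in
{
  services.github-runners."a-runner" = mkPersistentRunner "a-runner" {
    # ... url, tokenFile, and the other per-runner options go here
  };
}
```

Since both paths are derived from the same name, the work dir cannot drift away from the directory that systemd pre-creates. Whether this avoids the errors reported in this thread is untested.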
I'm still having issues with the module as it appears in nixos-unstable:
services = {
  github-runners = {
    "${runnerName}" = {
      enable = true;
      url = "https://github.com/colemickens/nixcfg";
      tokenFile = config.sops.secrets."github-runner-token".path;
      replace = true;
      name = runnerName;
      serviceOverrides.StateDirectory = [
        "github-runner/${runnerName}" # module default
      ];
      workDir = "/var/lib/github-runner/${runnerName}"; # TODO: make sure this works
      extraLabels = [ runnerName ];
    };
  };
};
results in:
Feb 21 18:29:23 slynux systemd[1]: Starting GitHub Actions runner...
Feb 21 18:29:23 slynux z5gpr3smv6jfmphp4b5x3y679scqjhfy-github-runner-slynux-default-unconfigure.sh[82337]: Config has changed, removing old runner state.
Feb 21 18:29:23 slynux z5gpr3smv6jfmphp4b5x3y679scqjhfy-github-runner-slynux-default-unconfigure.sh[82337]: The old runner will still appear in the GitHub Actions UI. You have to remove it manually.
Feb 21 18:29:23 slynux (igure.sh)[82358]: github-runner-slynux-default.service: Failed to set up mount namespacing: /var/lib/private/github-runner/slynux-default/.current-token: No such file or directory
Feb 21 18:29:23 slynux systemd[1]: github-runner-slynux-default.service: Control process exited, code=exited, status=226/NAMESPACE
Feb 21 18:29:23 slynux systemd[1]: github-runner-slynux-default.service: Failed with result 'exit-code'.
Feb 21 18:29:23 slynux systemd[1]: Failed to start GitHub Actions runner.
If I make the inaccessible token path optional in the module, then I get:
Feb 21 18:26:28 slynux systemd[1]: Starting GitHub Actions runner...
Feb 21 18:26:28 slynux z5gpr3smv6jfmphp4b5x3y679scqjhfy-github-runner-slynux-default-unconfigure.sh[79165]: Config has changed, removing old runner state.
Feb 21 18:26:28 slynux z5gpr3smv6jfmphp4b5x3y679scqjhfy-github-runner-slynux-default-unconfigure.sh[79165]: The old runner will still appear in the GitHub Actions UI. You have to remove it manually.
Feb 21 18:26:29 slynux systemd[1]: Started GitHub Actions runner.
Feb 21 18:26:29 slynux Runner.Listener[79257]: Unhandled exception. System.IO.IOException: Too many levels of symbolic links : '/var/lib/github-runner/slynux-default/.credentials'
Feb 21 18:26:29 slynux Runner.Listener[79257]: at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
Feb 21 18:26:29 slynux Runner.Listener[79257]: at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
Feb 21 18:26:29 slynux Runner.Listener[79257]: at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
Feb 21 18:26:29 slynux Runner.Listener[79257]: at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
Feb 21 18:26:29 slynux Runner.Listener[79257]: at System.IO.Strategies.FileStreamHelpers.ChooseStrategy(FileStream fileStream, String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, Int64 preallocationSize)
Feb 21 18:26:29 slynux Runner.Listener[79257]: at System.IO.StreamReader.ValidateArgsAndOpenPath(String path, Encoding encoding, Int32 bufferSize)
Feb 21 18:26:29 slynux Runner.Listener[79257]: at System.IO.File.InternalReadAllText(String path, Encoding encoding)
Feb 21 18:26:29 slynux Runner.Listener[79257]: at System.IO.File.ReadAllText(String path, Encoding encoding)
Feb 21 18:26:29 slynux Runner.Listener[79257]: at GitHub.Runner.Sdk.IOUtil.LoadObject[T](String path, Boolean required) in /build/src/src/Runner.Sdk/Util/IOUtil.cs:line 47
Feb 21 18:26:29 slynux Runner.Listener[79257]: at GitHub.Runner.Common.HostContext..ctor(String hostType, String logFile) in /build/src/src/Runner.Common/HostContext.cs:line 216
Feb 21 18:26:29 slynux Runner.Listener[79257]: at GitHub.Runner.Listener.Program.Main(String[] args) in /build/src/src/Runner.Listener/Program.cs:line 20
Feb 21 18:26:29 slynux systemd[1]: github-runner-slynux-default.service: Main process exited, code=dumped, status=6/ABRT
Feb 21 18:26:29 slynux systemd[1]: github-runner-slynux-default.service: Failed with result 'core-dump'.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/github-runner-seeking-advice/41719/1
@veehaitch any other thoughts?
At this point I'm looking at abandoning GHA or writing my own simpler out-of-tree module for github-runners.
I have spent an insane amount of time on this and absolutely nothing has worked.
I'm not convinced this actually works.
Like, I plainly don't see how your example can possibly work given the symlinking that is done in setupWorkDir. The symlinks aren't right and the runner noticeably complains.
lrwxrwxrwx 1 61239 61239 50 Mar 27 15:39 /var/lib/github-runner/slynux-default/.credentials -> /var/lib/github-runner/slynux-default/.credentials
If anyone else ends up here, I've hacked together something that works for me: https://github.com/colemickens/nixos-github-actions/
Golly I wish this module worked:
Failed to set up mount namespacing: /var/lib/private/github-runner/rock5b-default/.current-token: No such file or directory
Leaving a quick note for anyone who bumps into this problem under nixos-24.05: checkout@v3 seems to have trouble with the directory, but upgrading to v4 seems to fix the problem.
Describe the bug
I'm not sure if I'm using it wrong, but it doesn't really seem like workDir is ... working. I have to pre-create the directories, and then when the service runs, it doesn't have permissions:
This is on nixos-unstable after the recent github-runner cleanup PR was merged. I was using my custom hacked thing, so I don't know if this is a regression or not.
I'm also not sure how to manually fix this given the usage of DynamicUser. Maybe some ExecStartPre or tmpfiles.d magic is missing that should be ensuring the dir is there?
see: https://github.com/NixOS/nixpkgs/pull/284814
cc: @veehaitch