Open ccarpita opened 3 years ago
Huh, that does sound awkward. We'll take a look at reproducing it and see how we go.
In the interim, a hook (maybe an in-repo one?) that fixes the permissions prior to the job starting could be a good workaround.
We are constantly encountering this issue on Windows machines where the initial git checkout process is overzealous and ends up nuking the entire repository after a permissions failure. Permissions on Windows are notoriously annoying, and a dangling process (or even an open Explorer window) ends up holding a lock on the folder, causing the agent to try deleting the entire repository. We are working with a 500+ GB repository, and the effect of this is devastating.
Would a pre-checkout hook help here? There really isn't a case for us to ever delete the repo; we would instead just perform a manual intervention (say, reboot the machine to free the lock).
⚠️ Warning: Checkout failed! exit status 3221225786 (Attempt 1/3 Retrying in 2s)
# Removing C:\builds\builder-1\org\repo
@marwanhilmi oof, sorry about that. Handling huge GitHub repos is particularly challenging. We've made a pass at improving it before, and it's on our radar to improve further soon.
I'd recommend trying to reduce the size of the repo by performing a shallow clone. There's some more context that @sj26 wrote over in https://github.com/buildkite/agent/issues/437#issuecomment-708068425
It would also be worth checking out the git mirrors experiment flags to see if they help. They can be used to set up a local cache of the git repo so that checkouts can be faster. https://buildkite.com/docs/agent/v3/configuration#git-mirrors-path
Would a flag be accepted? BUILDKITE_EXIT_ON_CHECKOUT_FAILURE=true
https://github.com/stevelacy/agent/commit/4db6f24376ec8f63191ac707747e93f9235e796c
We are using LFS - even a shallow clone will be huge.
🤔 we could discuss it. I think the preferred approach in that situation would be to just override the checkout hook as @yob suggested so that it doesn't purge the directory. Is there a reason why that wouldn't work here?
Agent User: buildkite-agent
Example Checkout path: /var/lib/buildkite-agent/software
Read-only File: /var/lib/buildkite-agent/software/tmp/file
In our case, a previous build rsyncs to the checkout directory and writes tmp/file. While the owner is buildkite-agent, the permissions are set as u+rx and u-w, aka the file is "read-only". The next build will fail to git clean, and then it will try to delete the whole repo (and fail), and then try to clone (and fail). However, the user actually owns those files, so buildkite-agent could simply chmod u+w any files that it tries to delete, or rm -f <file>. Assuming buildkite-agent is using a filesystem interface in golang, it's likely some argument would have to change for it to delete read-only files.