Closed filipesilva closed 6 years ago
Sorry you’re having trouble there! Do the two agents have different names? Because each agent should have its own checkout dir, but it’s based on having unique agent names (which is why %n is included in the default name). As far as I know, the volume mounts in this plugin shouldn’t mess it up, as long as those agent names are unique.
It looks like this one was gce-buildkite-windows-1-1. Do you know what the second agent’s checkout/build directory and name was?
Heya @toolmantim, thanks for getting back to me!
It's not different agents though, it's the same agent. I have 1 agent, running on 1 host, and am pushing builds to 1 branch on GitHub.
When I trigger a build via a commit, it will check out in C:\buildkite-agent\builds\gce-buildkite-windows-1-1\angular\angular and then proceed to use that folder as a volume in Docker.
Then, if I trigger another build while the first one is still running, the same happens. Since both builds are using the same folder, they are sharing the files. While the first build is running, its files will be updated with the contents of the second checkout.
That this happens actually sounds a bit odd to me. I can't imagine I am the only person running concurrent builds on the same agent. I wonder if I'm doing something wrong here.
Now that I think about it... is the same agent ever supposed to run concurrent builds? I was looking at it from the perspective of docker, and of the isolation provided by it.
But if the agent is supposed to be the unit of isolation, then I should only have concurrent builds by having multiple agents in the same machine. Is that how they are supposed to be used?
Ah, thanks for the details!
Yep, an agent can only run 1 job at a time, so that situation you’re describing should never happen. Sorry I didn’t make that clearer.
But if you are getting an unexpected error, we can look into it! Did you want to email the details of the builds to support@buildkite.com?
I should mention that agents are cheap, resource-wise, and are designed to be run alongside one another. So if you wanna spin up multiple per host to increase concurrency, no problems there.
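To illustrate why unique agent names keep concurrent builds apart, here is a minimal Python sketch. It assumes the checkout layout is builds root, then agent name, then organization, then pipeline, which is inferred from the path quoted earlier in this thread rather than taken from the agent's source. The second agent name is hypothetical.

```python
from pathlib import PureWindowsPath

# Assumed layout, inferred from the path quoted in this thread:
#   <builds root>\<agent name>\<organization>\<pipeline>
BUILDS_ROOT = PureWindowsPath(r"C:\buildkite-agent\builds")


def checkout_dir(agent_name: str, org: str, pipeline: str) -> PureWindowsPath:
    """Return the checkout directory under the assumed layout."""
    return BUILDS_ROOT / agent_name / org / pipeline


# Two agents on the same host. The second name is made up for illustration.
first = checkout_dir("gce-buildkite-windows-1-1", "angular", "angular")
second = checkout_dir("gce-buildkite-windows-1-2", "angular", "angular")

print(first)
print(second)
print("separate checkouts:", first != second)
```

Under that assumption, two agents with distinct names never share a working tree, so each one can mount its own directory into Docker without touching the other's files.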
I saw a bunch of errors like this on build start:
> cd C:\buildkite-agent\builds\gce-buildkite-windows-1-1\angular\angular
--
| > git remote set-url origin https://github.com/angular/angular
| > git clean -fxdq
| warning: failed to remove node_modules/@angular-devkit/core/node: Directory not empty
| warning: failed to remove node_modules/@angular-devkit/core/node_modules/expand-brackets/node_modules: Directory not empty
| warning: failed to remove node_modules/@angular-devkit/core/node_modules/extglob/node_modules: Directory not empty
| warning: failed to remove node_modules/@angular-devkit/core/node_modules/glob-parent/node_modules: Directory not empty
| warning: failed to remove node_modules/@angular-devkit/core/node_modules/is-accessor-descriptor/node_modules: Directory not empty
| warning: failed to remove node_modules/@angular-devkit/core/node_modules/is-data-descriptor/node_modules: Directory not empty
| warning: failed to remove node_modules/@angular-devkit/core/src: Directory not empty
| warning: failed to remove node_modules/@angular-devkit/schematics/src: Directory not empty
| warning: failed to remove node_modules/@angular-devkit/schematics/tasks/tslint-fix: Directory not empty
| warning: failed to remove node_modules/@angular-devkit/schematics/tools: Directory not empty
| warning: failed to remove node_modules/@bazel/bazel/node_modules: Directory not empty
| warning: failed to remove node_modules/@bazel/bazel-win32_x64/bazel-0.18.0-windows-x86_64.exe: Invalid argument
| warning: failed to remove node_modules/@bazel/ibazel/bin: Directory not empty
At the time I had Cancel Intermediate Builds enabled, and I knew this sort of error from local manual executions as something that happens when trying to delete folders that are still in use. The errors cleared on an automatic retry:
# Removing C:\buildkite-agent\builds\gce-buildkite-windows-1-1\angular\angular
--
| ⚠️ Warning: Checkout failed! Error running `C:\git\cmd\git.exe clean -fxdq`: exit status 1 (Attempt 1/3 Retrying in 2s)
| # Creating "C:\buildkite-agent\builds\gce-buildkite-windows-1-1\angular\angular"
But now that I know that a single agent is meant to run only one build at a time, that makes more sense. Windows is finicky with file locks so it's not a huge surprise that it kept them longer than it should.
I think I have no problem now that I understand the model better. In my head the single agent was coordinating several docker image runs so the single folder would be a problem.
Sorry for the noise!
Ahhh, that makes sense. No problems at all! Docker problems (and Windows file locks, it turns out) are super tricky to debug.
I wonder if we can improve that git clean behaviour? Timing problems are the worst.
@filipesilva did the git clean operation retry for you, or did it fail the build? Generally we expect retries to handle things like hanging locks. Visual Studio leaves lots of those too :(
@lox the retry sorted it out, yes!
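For anyone hitting the same lock errors in their own scripts, the retry pattern visible in the log above (a fixed number of attempts with a short pause between them) can be sketched in Python as follows. This is only an illustration of the general technique, not the agent's actual implementation. The `retry` helper and the `flaky_clean` stand-in are hypothetical names.

```python
import time


def retry(operation, attempts=3, delay_seconds=2.0):
    """Call operation(), retrying on OSError up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            # Give up and re-raise once the last attempt has failed.
            if attempt == attempts:
                raise
            # Otherwise wait, giving the OS time to release file locks.
            time.sleep(delay_seconds)


# Hypothetical stand-in for a clean step that fails once because a
# directory is still locked, then succeeds.
calls = {"count": 0}


def flaky_clean():
    calls["count"] += 1
    if calls["count"] == 1:
        raise OSError("Directory not empty")
    return "cleaned"


# A zero delay keeps the demo fast; a real script would pause between tries.
result = retry(flaky_clean, attempts=3, delay_seconds=0)
print(result, "after", calls["count"], "attempts")
```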
I'm actually running into the same issue right now.
The "root" of the problem (pun intended) is that the buildkite agent is run on the host operating system with the user id and permissions of the buildkite-agent user when it checks out code.
E.g. on my machine, the buildkite-agent has user:group buildkite-agent:buildkite-agent, or more precisely, 997:996.
However, in the docker environment, the default user:group is root:root, or more precisely, 0:0.
Unfortunately, this means that any files created by the build process in the container will be created with permissions associated with the root account in the host environment. So if there are files left-over by root from a previous build, then they cannot be deleted.
I'm not quite sure how to fix it.
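One commonly used workaround for this kind of ownership mismatch is to start the container as the same uid:gid as the host user, using docker run's --user flag, so that files written into the mounted volume are owned by the agent user rather than root. The Python sketch below only builds the argument list. The image name, the host path, and the /workdir mount point are placeholders, and whether a given image works as a non-root user is an assumption to check.

```python
def docker_run_args(image, host_dir, uid, gid, command):
    """Build a `docker run` argument list that runs as uid:gid instead of root."""
    return [
        "docker", "run", "--rm",
        # Run the container process as the host user that owns the checkout.
        "--user", f"{uid}:{gid}",
        # Mount the checkout into the container (mount point is a placeholder).
        "--volume", f"{host_dir}:/workdir",
        "--workdir", "/workdir",
        image,
        *command,
    ]


# 997:996 is the buildkite-agent uid:gid mentioned above; the image name
# and host path are hypothetical.
args = docker_run_args(
    image="example/build-image",
    host_dir="/path/to/checkout",
    uid=997,
    gid=996,
    command=["make", "test"],
)
print(" ".join(args))
```

This only affects files created by future runs. Anything already left behind by root from an earlier build would still need a one-off cleanup with sufficient privileges.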
My buildkite agent version is buildkite-agent version 3.22.1, build x
When using this plugin, all the builds for the same repository are checked out in the same directory:
After checkout, the repository root will be mounted on a Docker container using the --volume flag, which shares file changes between the container and the host file systems.
If there is a build already running, and a second build is triggered, the new commit will be checked out on the same directory. The already running build will have its code changed in mid-build.
At best it will test the wrong code, and at worst it will crash the build or cause other unexpected behaviour.
Even with Cancel Intermediate Builds turned on, this can cause odd behaviour as files can still be locked while the previous build is being cancelled. This is especially noticeable on Windows where locked files/directories cannot be deleted.
Regardless, multiple running builds should not interfere with each other. It is common to have multiple builds running at any given time (e.g. two PRs opened close together).