buildkite-plugins / docker-buildkite-plugin

🐳📦 Run any build step in a Docker container
MIT License

Containers on windows don't seem to be running commands #81

Closed filipesilva closed 6 years ago

filipesilva commented 6 years ago

Heya, I'm trying to set up a pipeline to run inside Windows containers, and am getting some unexpected results. As far as I can tell, the commands aren't run at all.

I'm running the build agent on a Windows Server version 1803 Datacenter Core for Containers image on Google Container Engine, on which I installed Git for Windows, NSSM and the build agent, as per https://buildkite.com/docs/agent/v3/windows.

My pipeline.yml looks like this

steps:    
  - label: windows-steps
    command: "echo hello"
    plugins:
      - docker#v2.0.0:
          image: "microsoft/dotnet:latest"
    agents:
      windows: true

And the output log looks like this:

Running plugin github.com/buildkite-plugins/docker-buildkite-plugin#v2.0.0 command hook
Running CMD.EXE /c 'echo hello' in microsoft/dotnet:latest | 2s
  | Microsoft Windows [Version 10.0.17134.345]
  | (c) 2018 Microsoft Corporation. All rights reserved.
  |  
  | C:\workdir>> cd C:\buildkite-agent\builds\gce-buildkite-agent-windows-1-1\angular\testsetup\c\buildkite-agent\builds\gce-buildkite-agent-windows-1-1\angular\testsetup

I thought the command might be running but the output not shown, so I tried something that should error out (command: "not-a-binary"). The build still passed, same message, no error anywhere.

lox commented 6 years ago

Heya 👋🏻 Sorry you are running into issues. We've just started adding stronger support for Windows containers in the most recent version, so it's possible there are some bugs. I'll see if I can reproduce this!

filipesilva commented 6 years ago

Thanks for getting back to me @lox! I also tried to run the agent locally on my windows machine just now and got the same results. It seems to just CD into the work directory. Locally I already had Git for Windows, but didn't use NSSM since I had the process running in a console.

Is there any more information that I can provide to help you reproduce it?

filipesilva commented 6 years ago

The docker plugin uses git bash on windows, which uses msys2. msys2 will convert POSIX paths to Win32 paths: http://www.mingw.org/wiki/Posix_path_conversion

Thus /c becomes C:\.

This can be wholly disabled with the MSYS2_ARG_CONV_EXCL="*" variable.
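As a minimal sketch of that idea, assuming Git Bash/MSYS2 honours MSYS2_ARG_CONV_EXCL as described above: the helper below sets the variable for a single command only, so the rest of the shell session keeps its normal behaviour. The function name is hypothetical and is not taken from the plugin or the PR.

```shell
# Hypothetical helper (not from the plugin): run one command with MSYS2
# argument conversion disabled, without changing the calling shell.
without_path_conversion() {
  MSYS2_ARG_CONV_EXCL='*' "$@"
}

# Harmless demonstration: show the variable as the child process sees it.
without_path_conversion sh -c 'echo "MSYS2_ARG_CONV_EXCL=$MSYS2_ARG_CONV_EXCL"'
```

In Git Bash the same wrapper could be put in front of a docker run invocation; on shells that don't do path conversion the variable is simply ignored.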

PR submitted in https://github.com/buildkite-plugins/docker-buildkite-plugin/pull/82.

Below is the full investigation and how I got to this conclusion. It took a while.


I actually tried debugging it a bit. In my buildkite agent folder I found the plugin folder and turned on the debug flag in hooks/command by setting debug_mode='on'.

This let me see the command that's being run:

Running CMD.EXE /c 'not-a-binary' in microsoft/dotnet:latest | 2s
  | $ docker run -i --rm --volume C:\buildkite-agent\builds\RED-X1C6-1\angular\testsetup:C:\workdir --workdir C:\workdir microsoft/dotnet:latest CMD.EXE /c not-a-binary
  | Microsoft Windows [Version 10.0.17134.345]
  | (c) 2018 Microsoft Corporation. All rights reserved.
  |  
  | C:\workdir>> cd C:\buildkite-agent\builds\RED-X1C6-1\angular\testsetup\c\buildkite-agent\builds\RED-X1C6-1\angular\testsetup

So I ran that command locally in docker with windows containers and got the same result. That's good news, because it suggests a usage issue that I can reproduce outside the agent.

kamik@RED-X1C6 MINGW64 /c/buildkite-agent/plugins/github-com-buildkite-plugins-docker-buildkite-plugin-v2-0-0 ((v2.0.0))
$ docker run -i --rm --volume "C:\buildkite-agent\builds\RED-X1C6-1\angular\testsetup:C:\workdir" --workdir "C:\workdir" microsoft/dotnet:latest CMD.EXE /c not-a-binary
Microsoft Windows [Version 10.0.17134.345]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\workdir>

For the sake of debugging I removed the volume mount and workdir, so I'm testing with just docker run -i --rm microsoft/dotnet:latest CMD.EXE /c not-a-binary.

I don't quite get why -i is used, but it's used in both linux and windows so I assume it's there because it works. Indeed running a similar command on linux containers works as expected:

$ docker run -i --rm circleci/node:10.9.0 bash "-e" "-c" dir
bin   dev  home  lib64  mnt  proc  run   srv  tmp  var
boot  etc  lib   media  opt  root  sbin  sys  usr

$ docker run -i --rm circleci/node:10.9.0 bash "-e" "-c" not-a-binary
bash: not-a-binary: command not found

If I try the same thing with a windows container, it just hangs at the prompt and I have to ctrl+c to exit:

$ docker run -i --rm microsoft/dotnet:latest CMD.EXE /c dir
Microsoft Windows [Version 10.0.17134.345]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\>
# ctrl+c

$ docker run -i --rm microsoft/dotnet:latest CMD.EXE /c not-a-binary
Microsoft Windows [Version 10.0.17134.345]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\>
# ctrl+c

However, if I try it directly in my windows machine, without docker, these commands do what's expected:

C:\>cmd.exe /c dir
 Volume in drive C is RED-15
 Volume Serial Number is B633-9AD8

 Directory of C:\

13/11/2018  09:10    <DIR>          buildkite-agent
10/05/2018  09:20    <DIR>          cygwin64
29/04/2018  16:11    <DIR>          DRIVERS
13/06/2018  10:35    <DIR>          Intel
03/09/2018  12:29    <DIR>          msys64
11/04/2018  23:38    <DIR>          PerfLogs
13/11/2018  09:12    <DIR>          Program Files
31/10/2018  13:26    <DIR>          Program Files (x86)
24/09/2018  13:29    <DIR>          Python27
29/03/2018  16:03    <DIR>          Temp
01/05/2018  14:55    <DIR>          Users
01/11/2018  10:54    <DIR>          Windows
               0 File(s)              0 bytes
              12 Dir(s)  65,283,862,528 bytes free

C:\>cmd.exe /c not-a-binary
'not-a-binary' is not recognized as an internal or external command,
operable program or batch file.

I don't know why running it in docker behaves differently. There seems to be a similar issue on Stack Overflow though: https://stackoverflow.com/questions/42829450/docker-windows-containers-cmd-command-not-running

I tried to go back to the absolute basics and found https://docs.microsoft.com/en-us/virtualization/windowscontainers/quick-start/quick-start-windows-10, where they list a couple of very basic commands that should work in PowerShell, which I can verify:

$ docker run -i microsoft/nanoserver powershell -Command dir

    Directory: C:\

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----       11/13/2018  10:39 AM                Program Files
d-----        7/16/2016   1:09 PM                Program Files (x86)
d-r---        10/6/2018   9:32 PM                Users
d-----       11/13/2018  10:39 AM                Windows
-a----       11/20/2016  11:32 AM           1894 License.txt

$ docker run -i microsoft/nanoserver powershell -Command not-a-binary
not-a-binary : The term 'not-a-binary' is not recognized as the name of a
cmdlet, function, script file, or operable program. Check the spelling of the
name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ not-a-binary
+ ~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (not-a-binary:String) [], Comman
   dNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

The equivalent using cmd doesn't really seem to work:

$ docker run -i microsoft/nanoserver cmd /c dir
Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation. All rights reserved.

C:\>
# ctrl+c
$ docker run -i microsoft/nanoserver cmd /c not-a-binary
Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation. All rights reserved.

C:\>
# ctrl+c

So maybe it's a problem with running cmd directly.

According to https://hub.docker.com/r/microsoft/nanoserver/, support requests should be filed in https://social.msdn.microsoft.com/Forums/en-US/home?forum=windowscontainers so I was about to file one there.

But then I noticed I was running all my repro docker commands without --rm. So I went to see how many zillion containers I had accumulated:

$ docker ps -a
CONTAINER ID        IMAGE                  COMMAND                  CREATED              STATUS                      PORTS               NAMES
479b833b60ed        microsoft/nanoserver   "powershell.exe -Com…"   44 seconds ago       Exited (0) 32 seconds ago                       peaceful_lalande
b5ce797ebbc1        microsoft/nanoserver   "powershell.exe -Com…"   About a minute ago   Exited (1) 49 seconds ago                       romantic_newton
3440a593d7cb        microsoft/nanoserver   "cmd.exe C:/ dir"        2 minutes ago        Exited (0) 2 minutes ago                        laughing_banach
42eeda283b78        microsoft/nanoserver   "cmd.exe C:/ not-a-b…"   3 minutes ago        Exited (0) 2 minutes ago                        epic_mcnulty
1adfe7861db8        microsoft/nanoserver   "cmd C:/ not-a-binary"   6 minutes ago        Exited (0) 5 minutes ago                        nifty_jones
c623e37de748        microsoft/nanoserver   "cmd C:/ not-a-binary"   12 minutes ago       Exited (0) 12 minutes ago                       tender_joliot
a1e060366654        microsoft/nanoserver   "cmd C:/ not-a-binary"   13 minutes ago       Exited (0) 13 minutes ago                       infallible_swartz
2801423e2336        microsoft/nanoserver   "cmd C:/ dir"            13 minutes ago       Exited (0) 13 minutes ago                       peaceful_mahavira
1b8797e5c12b        microsoft/nanoserver   "cmd C:/ dir"            14 minutes ago       Exited (0) 14 minutes ago                       upbeat_boyd
bfdb78abf8e3        microsoft/nanoserver   "powershell -Command…"   15 minutes ago       Exited (1) 14 minutes ago                       amazing_murdock
a06eeb3877f6        microsoft/nanoserver   "powershell -Command…"   16 minutes ago       Exited (0) 16 minutes ago                       sharp_poincare
ce213765011a        microsoft/nanoserver   "-i powershell -Comm…"   16 minutes ago       Created                                         ecstatic_bhaskara
1107be221eb8        microsoft/nanoserver   "powershell -Command…"   17 minutes ago       Exited (0) 16 minutes ago                       vigilant_bardeen
5b041d37d52a        microsoft/nanoserver   "powerhsell -Command…"   17 minutes ago       Created                                         clever_proskuriakova
70028610bcd5        microsoft/nanoserver   "cmd C:/ dir"            19 minutes ago       Exited (0) 18 minutes ago                       friendly_ride
1a2172c1dbf4        microsoft/nanoserver   "cmd C:/ dir"            19 minutes ago       Exited (0) 19 minutes ago                       condescending_neumann
5ffef025c550        microsoft/nanoserver   "cmd dir"                19 minutes ago       Exited (0) 19 minutes ago                       affectionate_swartz
8b167a6f2cc2        microsoft/nanoserver   "cmd"                    19 minutes ago       Exited (0) 19 minutes ago                       hardcore_easley

...and noticed the very curious "cmd C:/ dir" command.

I'm familiar with the problem from the main repo that I work on: https://github.com/angular/angular-cli/issues/5606.

The buildkite docker plugin uses gitbash to run commands, and when gitbash sees /something it will convert the path. So /c becomes C:/. Details can be found in http://www.mingw.org/wiki/Posix_path_conversion.

I was doing all my testing inside gitbash, so I went to try in plain cmd on my machine, and there it does work:

C:\> docker run --rm microsoft/nanoserver cmd.exe /c dir
 Volume in drive C has no label.
 Volume Serial Number is 8E9A-67D4

 Directory of C:\

11/20/2016  11:32 AM             1,894 License.txt
07/16/2016  12:20 PM    <DIR>          Program Files
07/16/2016  12:09 PM    <DIR>          Program Files (x86)
10/06/2018  08:32 PM    <DIR>          Users
11/13/2018  10:58 AM    <DIR>          Windows
               1 File(s)          1,894 bytes
               4 Dir(s)  21,205,905,408 bytes free

So drawing inspiration from the solutions in https://github.com/angular/angular-cli/issues/5606, there are a couple of possible approaches.

Using //c instead (docker run --rm microsoft/nanoserver cmd.exe //c dir) works, but what if the user's commands also use /something?
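To make the //c approach concrete, here is a small sketch of what escaping a switch could look like. The function name is made up for illustration, and it only touches the one argument it is given; it does nothing for slashes that appear inside user-supplied commands, which is exactly the concern raised above.

```shell
# Hypothetical helper: double a single leading slash so that a shell doing
# MSYS-style path conversion passes the switch through as /c.
escape_leading_slash() {
  case "$1" in
    //*) printf '%s\n' "$1" ;;    # already escaped, leave as is
    /*)  printf '/%s\n' "$1" ;;   # /c becomes //c
    *)   printf '%s\n' "$1" ;;    # not a switch, leave as is
  esac
}

escape_leading_slash /c    # prints //c
escape_leading_slash dir   # prints dir
```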

Using MSYS2_ARG_CONV_EXCL seems a better approach, and is detailed in https://github.com/Alexpux/MSYS2-packages/issues/84. It seems to support * as a catch all, and since I don't think we want any paths being converted at all, that seems ideal.

Indeed this works:

$ MSYS2_ARG_CONV_EXCL="*" docker run --rm microsoft/nanoserver cmd.exe /c dir
 Volume in drive C has no label.
 Volume Serial Number is 8E9A-67D4

 Directory of C:\

11/20/2016  11:32 AM             1,894 License.txt
07/16/2016  12:20 PM    <DIR>          Program Files
07/16/2016  12:09 PM    <DIR>          Program Files (x86)
10/06/2018  08:32 PM    <DIR>          Users
11/13/2018  11:14 AM    <DIR>          Windows
               1 File(s)          1,894 bytes
               4 Dir(s)  21,215,952,896 bytes free

And applying the same variable in the plugin makes the build work too:


Running CMD.EXE /c 'not-a-binary' in microsoft/dotnet:latest | 3s
  | $ docker run -i --rm --volume C:\buildkite-agent\builds\RED-X1C6-1\angular\testsetup:C:\workdir --workdir C:\workdir microsoft/dotnet:latest CMD.EXE /c not-a-binary
  | 'not-a-binary' is not recognized as an internal or external command,
  | operable program or batch file.

Will submit a PR.

filipesilva commented 6 years ago

It's worth mentioning that even with https://github.com/buildkite-plugins/docker-buildkite-plugin/pull/82, multiple commands don't seem to run properly.

Using the config:

steps:      
  - label: windows-steps
    command: 
      - "dir"
      - "not-a-binary"
    plugins:
      - docker#v2.0.0:
          image: "microsoft/dotnet:latest"
    agents:
      windows: true

The build logs this:


Running CMD.EXE /c 'dir
not-a-binary' in microsoft/dotnet:latest | 3s
  | $ docker run -i --rm --volume C:\buildkite-agent\builds\RED-X1C6-1\angular\testsetup:C:\workdir --workdir C:\workdir microsoft/dotnet:latest CMD.EXE /c dir
  | not-a-binary
  | Volume in drive C has no label.
  | Volume Serial Number is 125C-63E0
  |  
  | Directory of C:\workdir
  |  
  | 11/13/2018  09:11 AM    <DIR>          .
  | 11/13/2018  09:11 AM    <DIR>          ..
  | 11/13/2018  09:11 AM               837 .bazelrc
  | 11/13/2018  11:58 AM    <DIR>          .buildkite
  | 11/13/2018  09:11 AM    <DIR>          .circleci
  | 11/13/2018  09:11 AM                73 .clang-format
  | 11/13/2018  09:11 AM                28 .gitignore
  | 11/13/2018  09:11 AM               324 BUILD.bazel
  | 11/13/2018  09:11 AM    <DIR>          e2e
  | 11/13/2018  09:11 AM            22,160 graph.png
  | 11/13/2018  09:11 AM             1,096 LICENSE
  | 11/13/2018  09:11 AM             1,621 package.json
  | 11/13/2018  09:11 AM             1,147 postinstall.tsconfig.json
  | 11/13/2018  09:11 AM             4,007 README.md
  | 11/13/2018  09:11 AM                41 renovate.json
  | 11/13/2018  09:11 AM    <DIR>          src
  | 11/13/2018  09:11 AM             3,425 WORKSPACE
  | 11/13/2018  09:11 AM           177,598 yarn.lock
  | 12 File(s)        212,357 bytes
  | 6 Dir(s)  62,776,291,328 bytes free
  | > cd C:\buildkite-agent\builds\RED-X1C6-1\angular\testsetup\c\buildkite-agent\builds\RED-X1C6-1\angular\testsetup
  | # MSYS2_ARG_CONV_EXCL changed

The dir runs, but the not-a-binary (which should fail) doesn't seem to run.

Using command: "dir && not-a-binary" works though. I'm not sure what you intend the semantics of the multiple commands to be... if it's "only run the others after the first successfully completes" then it might be ok to join all of them with && on windows.
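A rough sketch of the "join with &&" idea, assuming the commands reach the hook as a single newline-separated string (which is what the log above suggests). The function name is hypothetical and this is not the code from the PR.

```shell
# Hypothetical helper: turn newline-separated commands into a single
# "a && b && c" line, skipping empty lines.
join_with_and() {
  local joined='' line
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    if [ -z "$joined" ]; then
      joined=$line
    else
      joined="$joined && $line"
    fi
  done <<EOF
$1
EOF
  printf '%s\n' "$joined"
}

join_with_and 'dir
not-a-binary'
# prints: dir && not-a-binary
```

With && the second command only runs if the first succeeds, so a failing step stops the chain and its exit code is what CMD.EXE /c reports.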

I think the last two lines are odd though, but don't know what they mean/imply:

  | > cd C:\buildkite-agent\builds\RED-X1C6-1\angular\testsetup\c\buildkite-agent\builds\RED-X1C6-1\angular\testsetup
  | # MSYS2_ARG_CONV_EXCL changed

filipesilva commented 6 years ago

@lox heya 👋

Have you had time to look at https://github.com/buildkite-plugins/docker-buildkite-plugin/pull/82? I just added a fix for multiple commands on windows too.

I'm a bit worried about the command syntax being a bit brittle... Have you considered starting a container, then running each command individually inside it? The container would have to be removed when running the last command though.

lox commented 6 years ago

Just trying to get my windows environment working again today! 😓

filipesilva commented 6 years ago

Awesome, thank you!

Have you thought about the "running commands individually" idea? To be honest I can't think of a scenario offhand where the current setup would break. Probably something with newlines? It's hard to follow the logs though, when all pipeline commands are run on a single line.

Under this setup what would happen with docker is:

docker start org/container:latest 
# retrieve the containerid somehow
docker run containerid cmd.exe /C command 1
docker run containerid cmd.exe /C command 2
docker run containerid cmd.exe /C command 3
docker container rm containerid

Same on linux, just with bash instead of cmd.

I don't think changing this is very important right now, but it might be relevant if the current one-liner breaks with some commands.

lox commented 6 years ago

We tend to think of command parameters as shell scripts. I think it's intrinsic to that contract that they are executed within the same shell context. If you created a different container for each it would break things like:

cd dir/
./run_script.sh
chmod +x my_file

It would be theoretically possible to use docker exec to run commands within a container, but you'd still lose shell context. Given your change with && between commands, would there be any upsides to the proposed container-per-command approach?

filipesilva commented 6 years ago

In that setup I'm proposing they are actually executed in the same container.

The container keeps running in the background and is only stopped on container stop containerid.

Here's a demo, with the actual commands. The commands I listed before did not work.

D:\work\angular>docker run -d -it microsoft/nanoserver:1803
3ab21937d628bcf3e4dc4bd41d4dff7a72501e2f55e8771d1c7e75009baaed5c

D:\work\angular>docker container ls
CONTAINER ID        IMAGE                                  COMMAND                    CREATED             STATUS              PORTS               NAMES
3ab21937d628        microsoft/nanoserver:1803              "c:\\windows\\system32…"   14 seconds ago      Up 10 seconds                           keen_stallman
b8d257b2bee2        filipesilva/node-bazel-windows:0.0.2   "cmd /c 'cmd /C C:\\\\…"   20 hours ago        Up 20 hours                             hopeful_ganguly

D:\work\angular>docker exec 3ab21937d628 cmd /c echo hello
hello

D:\work\angular>docker exec 3ab21937d628 cmd /c mkdir persistentDir

D:\work\angular>docker exec 3ab21937d628 cmd /c dir
 Volume in drive C has no label.
 Volume Serial Number is 125C-63E0

 Directory of C:\

04/11/2018  11:53 PM             1,894 License.txt
11/23/2018  04:58 PM    <DIR>          persistentDir
11/23/2018  04:54 PM    <DIR>          Users
11/23/2018  04:54 PM    <DIR>          Windows
               1 File(s)          1,894 bytes
               3 Dir(s)  21,243,764,736 bytes free

D:\work\angular>docker stop 3ab21937d628
3ab21937d628

D:\work\angular>docker container ls
CONTAINER ID        IMAGE                                  COMMAND                    CREATED             STATUS              PORTS               NAMES
b8d257b2bee2        filipesilva/node-bazel-windows:0.0.2   "cmd /c 'cmd /C C:\\\\…"   20 hours ago        Up 20 hours                             hopeful_ganguly

In the example above I made a directory in one docker exec command and it was still there in a later one.

lox commented 6 years ago

Ah right, yes you are correct. I missed the lack of --rm!

You'd still miss out on the actual CMD.EXE shell context though.
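The point about shell context can be illustrated without docker at all. In this sketch plain sh stands in for CMD.EXE, and each sh -c call stands in for a separate docker exec; the function and variable names are made up for the demonstration. Files written to disk would persist between the calls, but variables (and likewise a cd) do not.

```shell
# Each sh -c is a fresh shell, like a separate "docker exec ... cmd /c ...".
separate_shells() {
  sh -c 'DEMO_STEP=one'
  sh -c 'echo "${DEMO_STEP:-unset}"'
}

# One shell running both commands keeps its context between them.
same_shell() {
  sh -c 'DEMO_STEP=one; echo "${DEMO_STEP:-unset}"'
}

separate_shells   # prints: unset
same_shell        # prints: one
```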

filipesilva commented 6 years ago

Ah jeez, I missed your comment about exec and the shell context! Yes that would be lost...

At the moment I don't have a real upside besides the readable logs.

If I find a limitation with the current approach I'll look at this proposal again.

lox commented 6 years ago

Sounds good, thanks so much for the PR! Let us know if you run into any other issues!

lox commented 6 years ago

Btw, the other approach that we used historically was to generate a batch script that would execute things with the semantics you'd expect.

https://github.com/buildkite/agent/blob/5bdad78c5e7c75ca4487197a3f84537445d0bccf/bootstrap/hook.go#L97-L108

We still use that in the agent, it's just awkward to use with docker as you've got to get the generated script into the container.
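For readers unfamiliar with the batch-script approach, here is a loose sketch of the general idea. It is not a reproduction of the linked agent code: the function name is hypothetical, and "stop at the first failing command" is an assumption about the semantics one would want.

```shell
# Hypothetical generator: print a batch script that runs each command in the
# same CMD.EXE session and exits as soon as one of them fails.
generate_batch_script() {
  local line
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    printf '%s\n' "$line"
    printf '%s\n' 'if %errorlevel% neq 0 exit /b %errorlevel%'
  done <<EOF
$1
EOF
}

generate_batch_script 'dir
not-a-binary'
```

Because every command runs in one script, things like cd and environment variables carry over between lines; the awkward part, as noted above, is getting the generated file into the container.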