aws / aws-codebuild-docker-images

Official AWS CodeBuild repository for managed Docker images http://docs.aws.amazon.com/codebuild/latest/userguide/build-env-ref.html

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? #164

Closed. pgdesigning closed this issue 5 years ago.

pgdesigning commented 5 years ago

I'm building an image from https://github.com/aws/aws-codebuild-docker-images/tree/master/ubuntu/docker/17.09.0, publishing it to ECR, and using it for my CodeBuild project. I get the following error:

[screenshot not reproduced]

Any ideas?

I get the same Docker daemon error when I run the image on my local machine.

subinataws commented 5 years ago

For your CodeBuild project, select the privileged mode option. You'll find it under Edit > Environment > Override image > Privileged.

You'll also need to start dockerd inside your container. Suggested commands are shown in the install phase of the buildspec at https://docs.aws.amazon.com/codebuild/latest/userguide/sample-docker-custom-image.html#sample-docker-custom-image-files.

Hope that helps.

pkania commented 5 years ago

These instructions are contradicted in https://docs.aws.amazon.com/codebuild/latest/APIReference/API_ProjectEnvironment.html

privilegedMode: Enables running the Docker daemon inside a Docker container. Set to true only if the build project is used to build Docker images and the specified build environment image is not provided by AWS CodeBuild with Docker support. Otherwise, all associated builds that attempt to interact with the Docker daemon fail. You must also start the Docker daemon so that builds can interact with it. One way to do this is to initialize the Docker daemon during the install phase of your build spec by running the following build commands. (Do not run these commands if the specified build environment image is provided by AWS CodeBuild with Docker support.)

There is some clarification here https://github.com/aws/aws-codebuild-docker-images/issues/206

subinataws commented 5 years ago

@pkania - Thanks for reporting that. We'll get the documentation fixed.

johnkoehn commented 5 years ago

@subinataws If we could get the documentation updated on this at some point, that'd be helpful :)

subinataws commented 5 years ago

Done now. https://docs.aws.amazon.com/codebuild/latest/userguide/troubleshooting.html#troubleshooting-cannot-connect-to-docker-daemon.

johnkoehn commented 5 years ago

Thanks @subinataws :)

AlexCromer commented 5 years ago

I've tried both running the environment in privileged mode and adding the Docker daemon commands to my buildspec's install phase. However, I still get the "Is the docker daemon running?" error. Is anyone able to help with this?

dimazyuwono commented 5 years ago

Hi @AlexCromer, in case you still haven't solved the issue: make sure you are using buildspec version 0.2 (https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html#build-spec-ref-versions).

If you're still using buildspec version 0.1, each command in the buildspec runs in a separate shell instance.
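As a rough local analogy, assuming plain `sh -c` calls are a fair stand-in for how CodeBuild runs buildspec commands (they are not CodeBuild's actual mechanism), state set in one shell instance is gone in the next, but survives when both commands share one shell. The variable and function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical demo: plain `sh -c` calls stand in for CodeBuild's shells.
unset BUILD_STATE

# 0.1-style: each command runs in a fresh shell, so the assignment is lost.
separate_shells() {
  sh -c 'BUILD_STATE=started'
  sh -c 'printf "%s\n" "${BUILD_STATE:-unset}"'
}

# 0.2-style: both commands run in one shell, so the assignment survives.
same_shell() {
  sh -c 'BUILD_STATE=started; printf "%s\n" "${BUILD_STATE:-unset}"'
}

separate_shells   # prints "unset"
same_shell        # prints "started"
```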

lizrice commented 5 years ago

I'm also hitting the same problem, despite using a v0.2 buildspec and running in privileged mode. My hypothesis is that the entrypoint and/or command parameters are treated differently when the managed image is used. Here's how I reached that conclusion.

Here's how I built the image that I'm trying to use:

$ git clone git@github.com:aws/aws-codebuild-docker-images.git
$ cd aws-codebuild-docker-images/ubuntu/standard/2.0/
$ docker build -t <id>.dkr.ecr.us-east-1.amazonaws.com/aws-codebuild-ubuntu-standard-2.0 .
$ docker push <id>.dkr.ecr.us-east-1.amazonaws.com/aws-codebuild-ubuntu-standard-2.0:latest

If you run docker inspect on this image, you'll find the entrypoint is ["dockerd-entrypoint.sh"] (as you would expect from the Dockerfile).

Here's the output I get in the CodeBuild logs - note that I added a ps -eaf in my buildspec to see what's going on:

[Container] 2019/10/14 18:19:24 Waiting for agent ping 
[Container] 2019/10/14 18:19:26 Waiting for DOWNLOAD_SOURCE 
[Container] 2019/10/14 18:19:30 Phase is DOWNLOAD_SOURCE 
[Container] 2019/10/14 18:19:30 CODEBUILD_SRC_DIR=/codebuild/output/src535620508/src/git-codecommit.us-east-1.amazonaws.com/v1/repos/trivy-scan 
[Container] 2019/10/14 18:19:30 YAML location is /codebuild/output/src535620508/src/git-codecommit.us-east-1.amazonaws.com/v1/repos/trivy-scan/testbuildspec.yml 
[Container] 2019/10/14 18:19:30 No commands found for phase name: INSTALL 
[Container] 2019/10/14 18:19:30 Processing environment variables 
[Container] 2019/10/14 18:19:30 Moving to directory /codebuild/output/src535620508/src/git-codecommit.us-east-1.amazonaws.com/v1/repos/trivy-scan 
[Container] 2019/10/14 18:19:30 Registering with agent 
[Container] 2019/10/14 18:19:30 Phases found in YAML: 3 
[Container] 2019/10/14 18:19:30  INSTALL: 0 commands 
[Container] 2019/10/14 18:19:30  PRE_BUILD: 6 commands 
[Container] 2019/10/14 18:19:30  BUILD: 4 commands 
[Container] 2019/10/14 18:19:30 Phase complete: DOWNLOAD_SOURCE State: SUCCEEDED 
[Container] 2019/10/14 18:19:30 Phase context status code:  Message:  
[Container] 2019/10/14 18:19:30 Entering phase INSTALL 
[Container] 2019/10/14 18:19:30 Running command echo "Installing Docker version 18 ..." 
Installing Docker version 18 ... 

[Container] 2019/10/14 18:19:30 Phase complete: INSTALL State: SUCCEEDED 
[Container] 2019/10/14 18:19:30 Phase context status code:  Message:  
[Container] 2019/10/14 18:19:30 Entering phase PRE_BUILD 
[Container] 2019/10/14 18:19:30 Running command echo Logging into ECR... 
Logging into ECR... 

[Container] 2019/10/14 18:19:30 Running command $(aws ecr get-login --region $AWS_DEFAULT_REGION --no-include-email) 
WARNING! Using --password via the CLI is insecure. Use --password-stdin. 
WARNING! Your password will be stored unencrypted in /root/.docker/config.json. 
Configure a credential helper to remove this warning. See 
https://docs.docker.com/engine/reference/commandline/login/#credentials-store 

Login Succeeded 

[Container] 2019/10/14 18:19:33 Running command REPOSITORY_URI=<redacted>.dkr.ecr.us-east-1.amazonaws.com/alpine 

[Container] 2019/10/14 18:19:33 Running command ps -eaf 
UID        PID  PPID  C STIME TTY          TIME CMD 
root         1     0  0 18:19 ?        00:00:00 sh -c /codebuild/bootstrap/linux-bootstrap -zipName="linux-binaries.zip" -url="https://codefactory-us-east-1-prod-default-build-agent-executor.s3.amazonaws.com/linux-binaries.zip"  
root        10     1  5 18:19 ?        00:00:00 /codebuild/bootstrap/linux-bootstrap -zipName=linux-binaries.zip -url=https://codefactory-us-east-1-prod-default-build-agent-executor.s3.amazonaws.com/linux-binaries.zip 
root        18    10  0 18:19 ?        00:00:00 /bin/sh -c ./start 
root        19    18  0 18:19 ?        00:00:00 /bin/sh ./start 
root        28    19  0 18:19 ?        00:00:00 /codebuild/readonly/bin/executor 
root        29    19  0 18:19 ?        00:00:00 tee /codebuild/readonly/executor-log 
root        30    19 17 18:19 ?        00:00:01 ./agent -port=7831 
root        64    28  0 18:19 ?        00:00:00 /bin/sh /codebuild/output/tmp/script.sh 
root        66    64  0 18:19 ?        00:00:00 ps -eaf 

[Container] 2019/10/14 18:19:33 Running command whoami 
root 

[Container] 2019/10/14 18:19:33 Running command docker version 
Client: Docker Engine - Community 
 Version:           18.09.6 
 API version:       1.39 
 Go version:        go1.10.8 
 Git commit:        481bc77 
 Built:             Sat May  4 02:33:34 2019 
 OS/Arch:           linux/amd64 
 Experimental:      false 
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? 

[Container] 2019/10/14 18:19:33 Command did not exit successfully docker version exit status 1 
[Container] 2019/10/14 18:19:33 Phase complete: PRE_BUILD State: FAILED 
[Container] 2019/10/14 18:19:33 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: docker version. Reason: exit status 1 

No sign of the dockerd-entrypoint.sh entrypoint. Of course, it could have exited by the time ps runs, but this looks suspiciously different from the managed version. If I run the same buildspec with the managed ubuntu standard 2.0 image instead, it works beautifully. Here are the logs I get:

[Container] 2019/10/14 18:22:30 Waiting for agent ping 
[Container] 2019/10/14 18:22:32 Waiting for DOWNLOAD_SOURCE 
[Container] 2019/10/14 18:22:35 Phase is DOWNLOAD_SOURCE 
[Container] 2019/10/14 18:22:35 CODEBUILD_SRC_DIR=/codebuild/output/src774085032/src/git-codecommit.us-east-1.amazonaws.com/v1/repos/trivy-scan 
[Container] 2019/10/14 18:22:35 YAML location is /codebuild/output/src774085032/src/git-codecommit.us-east-1.amazonaws.com/v1/repos/trivy-scan/testbuildspec.yml 
[Container] 2019/10/14 18:22:35 No commands found for phase name: INSTALL 
[Container] 2019/10/14 18:22:35 Processing environment variables 
[Container] 2019/10/14 18:22:35 Moving to directory /codebuild/output/src774085032/src/git-codecommit.us-east-1.amazonaws.com/v1/repos/trivy-scan 
[Container] 2019/10/14 18:22:35 Registering with agent 
[Container] 2019/10/14 18:22:35 Phases found in YAML: 3 
[Container] 2019/10/14 18:22:35  PRE_BUILD: 6 commands 
[Container] 2019/10/14 18:22:35  BUILD: 4 commands 
[Container] 2019/10/14 18:22:35  INSTALL: 0 commands 
[Container] 2019/10/14 18:22:35 Phase complete: DOWNLOAD_SOURCE State: SUCCEEDED 
[Container] 2019/10/14 18:22:35 Phase context status code:  Message:  
[Container] 2019/10/14 18:22:35 Entering phase INSTALL 
[Container] 2019/10/14 18:22:35 Running command echo "Installing Docker version 18 ..." 
Installing Docker version 18 ... 

[Container] 2019/10/14 18:22:35 Phase complete: INSTALL State: SUCCEEDED 
[Container] 2019/10/14 18:22:35 Phase context status code:  Message:  
[Container] 2019/10/14 18:22:35 Entering phase PRE_BUILD 
[Container] 2019/10/14 18:22:35 Running command echo Logging into ECR... 
Logging into ECR... 

[Container] 2019/10/14 18:22:35 Running command $(aws ecr get-login --region $AWS_DEFAULT_REGION --no-include-email) 
WARNING! Using --password via the CLI is insecure. Use --password-stdin. 
WARNING! Your password will be stored unencrypted in /root/.docker/config.json. 
Configure a credential helper to remove this warning. See 
https://docs.docker.com/engine/reference/commandline/login/#credentials-store 

Login Succeeded 

[Container] 2019/10/14 18:22:40 Running command REPOSITORY_URI=<redacted>.dkr.ecr.us-east-1.amazonaws.com/alpine 

[Container] 2019/10/14 18:22:40 Running command ps -eaf 
UID        PID  PPID  C STIME TTY          TIME CMD 
root         1     0  0 18:22 ?        00:00:00 /bin/sh /usr/local/bin/dockerd-entrypoint.sh /codebuild/bootstrap/linux-bootstrap -zipName="linux-binaries.zip" -url="https://codefactory-us-east-1-prod-default-build-agent-executor.s3.amazonaws.com/linux-binaries.zip"  
root        10     1  1 18:22 ?        00:00:00 /usr/local/bin/dockerd --host=unix:///var/run/docker.sock --host=tcp://127.0.0.1:2375 --storage-driver=overlay2 
root        25    10  1 18:22 ?        00:00:00 containerd --config /var/run/docker/containerd/containerd.toml --log-level info 
root       127     1  4 18:22 ?        00:00:00 /codebuild/bootstrap/linux-bootstrap -zipName=linux-binaries.zip -url=https://codefactory-us-east-1-prod-default-build-agent-executor.s3.amazonaws.com/linux-binaries.zip 
root       134   127  0 18:22 ?        00:00:00 /bin/sh -c ./start 
root       135   134  0 18:22 ?        00:00:00 /bin/sh ./start 
root       144   135  0 18:22 ?        00:00:00 /codebuild/readonly/bin/executor 
root       145   135  0 18:22 ?        00:00:00 tee /codebuild/readonly/executor-log 
root       146   135 12 18:22 ?        00:00:01 ./agent -port=7831 
root       187   144  0 18:22 ?        00:00:00 /bin/sh /codebuild/output/tmp/script.sh 
root       189   187  0 18:22 ?        00:00:00 ps -eaf 

[Container] 2019/10/14 18:22:40 Running command whoami 
root 

[Container] 2019/10/14 18:22:40 Running command docker version 
Client: Docker Engine - Community 
 Version:           18.09.6 
 API version:       1.39 
 Go version:        go1.10.8 
 Git commit:        481bc77 
 Built:             Sat May  4 02:33:34 2019 
 OS/Arch:           linux/amd64 
 Experimental:      false 

Server: Docker Engine - Community 
 Engine: 
  Version:          18.09.6 
  API version:      1.39 (minimum version 1.12) 
  Go version:       go1.10.8 
  Git commit:       481bc77 
  Built:            Sat May  4 02:41:08 2019 
  OS/Arch:          linux/amd64 
  Experimental:     false 

[Container] 2019/10/14 18:22:40 Phase complete: PRE_BUILD State: SUCCEEDED 
[Container] 2019/10/14 18:22:40 Phase context status code:  Message:  
[Container] 2019/10/14 18:22:40 Entering phase BUILD 
[Container] 2019/10/14 18:22:40 Running command docker build -t $REPOSITORY_URI:success . 
Sending build context to Docker daemon  6.656kB 

Step 1/1 : FROM alpine:3.10.2 
3.10.2: Pulling from library/alpine 
9d48c3bd43c5: Pulling fs layer 
9d48c3bd43c5: Verifying Checksum 
9d48c3bd43c5: Pull complete 
Digest: sha256:72c42ed48c3a2db31b7dafe17d275b634664a708d901ec9fd57b1529280f01fb 
Status: Downloaded newer image for alpine:3.10.2 
 ---> 961769676411 
Successfully built 961769676411 
Successfully tagged 593989669228.dkr.ecr.us-east-1.amazonaws.com/alpine:success 
...
...carries on successfully  

Note that with the managed version, the following command is running:

/bin/sh /usr/local/bin/dockerd-entrypoint.sh /codebuild/bootstrap/linux-bootstrap -zipName="linux-binaries.zip" -url="https://codefactory-us-east-1-prod-default-build-agent-executor.s3.amazonaws.com/linux-binaries.zip"  

which suggests that something is different with the entrypoint and/or command that get executed in these two cases, right?
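The comparison above was done by reading the two `ps -eaf` listings by eye. A small helper can make the same check inside a build; this is a sketch, and the function name is hypothetical. It reads `ps -eaf` output on stdin and succeeds only when a dockerd daemon line is present (as in the managed listing), not merely `dockerd-entrypoint.sh`.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -eaf` output on stdin and succeeds only if a
# dockerd daemon process appears. "dockerd" must be preceded by a space or
# slash and followed by a space or end of line, so "dockerd-entrypoint.sh"
# and "containerd" do not count as a running daemon.
ps_shows_dockerd() {
  grep -Eq '(^|[ /])dockerd( |$)'
}

# Example, from a script run before the first docker command:
#   ps -eaf | ps_shows_dockerd || echo "dockerd is not running in this container"
```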

josephvusich commented 5 years ago

@lizrice,

You're right, there are some differences between CodeBuild's curated image execution and that of custom-built images. We're aware that this can be confusing when trying to replicate the ubuntu-standard behavior in a custom-built image, and are working on reducing the complexity around this part of the system.

ayqazi commented 4 years ago

@josephvusich Thank you for acknowledging that we need to do something different in our custom images. But could you at least tell us WHAT needs to be different in custom images to make it work? I don't think there's anything about that in the docs (which are very very old and make no reference to basing your custom CodeBuild images on standard managed images).

hlarsen commented 4 years ago

Any official updates on this, or has anyone had any luck figuring it out?

We're trying to speed up some builds with a custom image based on debian:buster, but we're hitting the "Cannot connect to the Docker daemon" wall. I'd prefer not to use the dind images, since those are based on Alpine.

smiklos commented 4 years ago

Looking at the commit from above, I can confirm that adding this snippet to the buildspec works with a custom image:

phases:
  install:
    commands:
      - nohup /usr/local/bin/dockerd --host=unix:///var/run/docker.sock --host=tcp://127.0.0.1:2375 --storage-driver=overlay2 &
      - timeout 15 sh -c "until docker info; do echo .; sleep 1; done"
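The `timeout 15 sh -c "until docker info; ..."` line is a poll-with-deadline loop. A sketch of the same idea as a reusable function, without depending on the `timeout` binary, is below; the function name `wait_for` is hypothetical. Because buildspec commands may not share shell functions, it is assumed to live in a script file run from the install phase.

```shell
#!/bin/sh
# Hypothetical helper: retry a probe command once per second until it succeeds
# or `limit` seconds have passed. Returns 0 on success, 1 on timeout.
wait_for() {
  limit="$1"
  shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$limit" ]; then
      echo "gave up after ${limit}s waiting for: $*" >&2
      return 1
    fi
    echo .
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Possible use in an install-phase script (privileged mode, custom image):
#   nohup /usr/local/bin/dockerd --host=unix:///var/run/docker.sock \
#     --host=tcp://127.0.0.1:2375 --storage-driver=overlay2 &
#   wait_for 15 docker info
```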

ghost commented 3 years ago

This is still an issue with standard v5. However, I copied the file from https://github.com/aws/aws-codebuild-docker-images/blob/master/ubuntu/standard/5.0/dockerd-entrypoint.sh, ran it as the first command in my buildspec.yml, and it works wonders. Very clean.

schlomo commented 3 years ago

I tried to manually start the docker daemon in a privileged ubuntu-standard 5.0 and got this error:

[Container] 2021/02/05 11:06:40 Phase complete: DOWNLOAD_SOURCE State: SUCCEEDED
[Container] 2021/02/05 11:06:40 Phase context status code:  Message:
[Container] 2021/02/05 11:06:40 Entering phase INSTALL
[Container] 2021/02/05 11:06:40 Running command nohup /usr/local/bin/dockerd --host=unix:///var/run/docker.sock --host=tcp://127.0.0.1:2375 --storage-driver=overlay2 &

[Container] 2021/02/05 11:06:40 Running command timeout 15 sh -c "until docker info; do echo .; sleep 1; done"
time="2021-02-05T11:06:40.862023585Z" level=info msg="Starting up"
failed to start daemon: pid file found, ensure docker is not running or delete /var/run/docker.pid
Client:
Debug Mode: false

Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0

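So it seems that enabling privileged mode is now sufficient: a daemon was already running, which is why the manually started dockerd refused to start (pid file found) while docker info still reached a server.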
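One way to avoid the "pid file found" failure is to start dockerd only when one is not already running. A sketch, assuming the pid file path from the error message (`/var/run/docker.pid`) and a shell running as root, as in the logs in this thread; the function name is hypothetical. Note that a stale pid file left by a crashed daemon makes this report "not running", yet dockerd will still refuse to start until that file is deleted, as the error message says.

```shell
#!/bin/sh
# Hypothetical guard: succeeds if the pid file exists and names a live process.
# The default path comes from the "pid file found" error; override it for testing.
dockerd_running() {
  pidfile="${1:-/var/run/docker.pid}"
  [ -f "$pidfile" ] || return 1
  pid=$(cat "$pidfile")
  # kill -0 sends no signal; it only checks that the process exists.
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

if dockerd_running; then
  echo "dockerd already running; skipping manual start"
else
  echo "dockerd not running; start it here (privileged mode required)"
fi
```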

pvnick commented 3 years ago

Looks like you can use the dockerd-entrypoint.sh file directly:

phases:
  install:
    commands:
      - echo Starting the Docker daemon...
      - /usr/local/bin/dockerd-entrypoint.sh

brignolij commented 3 years ago

Hi, I fixed it by enabling the Privileged option in the CodeBuild environment settings.

sebandgo commented 3 years ago

If you're using CloudFormation, one solution that worked for me is to add the PrivilegedMode flag:

  CodeBuildProject:
    Type: ...
    Properties:
      Name: ...
      ServiceRole: ...
      Artifacts:
        Type: ...
      Environment:
        Type: ...
        ComputeType: ...
        Image: ...
        PrivilegedMode: true # Required to build Docker images

Hope that helps someone.

fagiani commented 3 years ago

It would be amazing if custom images could simply run the default ENTRYPOINT defined in the Dockerfile, as the managed ones seem to.

BTW, if you copy dockerd-entrypoint.sh from a standard image, make sure the file has execute permission, or you'll get a Permission Denied error.
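A sketch of the copy step with the permission set explicitly, assuming the destination path `/usr/local/bin/dockerd-entrypoint.sh` used elsewhere in this thread; the function name is hypothetical. In a Dockerfile the equivalent would be a `chmod` after the copy.

```shell
#!/bin/sh
# Hypothetical helper: copy a dockerd-entrypoint.sh taken from a standard image
# into place and mark it executable, avoiding "Permission denied" at runtime.
install_entrypoint() {
  src="$1"
  dest="${2:-/usr/local/bin/dockerd-entrypoint.sh}"
  cp "$src" "$dest" && chmod 755 "$dest"
}

# Example:
#   install_entrypoint ./dockerd-entrypoint.sh
```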

david-bergman commented 3 years ago

Hi @nriveraonica, can you provide a sample buildspec.yml file that worked for you?

I am running the standard v5 image.

I tried this in my install section:

install:
  runtime-versions:
    golang: 1.x
  commands:

but I still get the error:

[Container] 2021/10/01 10:35:43 Entering phase INSTALL
[Container] 2021/10/01 10:35:43 Running command /usr/local/bin/dockerd-entrypoint.sh
time="2021-10-01T10:35:43.794239935Z" level=info msg="Starting up"
failed to start daemon: pid file found, ensure docker is not running or delete /var/run/docker.pid

cklingspor commented 2 years ago

Looks like you can use the dockerd-entrypoint.sh file directly:

phases:
  install:
    commands:
      - echo Starting the Docker daemon...
      - /usr/local/bin/dockerd-entrypoint.sh

This did it for me. Wondering why the entrypoint is not triggered automatically as for curated images...

ryan-alley commented 2 years ago

Hi, I fixed it by enabling the following privileged CodeBuild env settings.

Unfortunately, if you've created your CodeBuild from CodePipeline, it will also fail:

[screenshot not reproduced]

It's really a shame these supposedly complementary products don't seem to mesh very well. :(

Also, you can get this to work by deleting the role and letting it be re-created, but if other things have changed, I assume that could cause issues. :/

kirubasunder commented 2 years ago

Hi, can anyone please post a working buildspec config for reference? I created the CodePipeline with the following configuration:

environment: {
  buildImage: LinuxBuildImage.STANDARD_5_0,
  privileged: true,
}

And in my buildspec:

phases:
  install:
    runtime-versions:
      nodejs: 14
    commands:
      - echo Starting the Docker daemon...
      - /usr/local/bin/dockerd-entrypoint.sh
  pre_build:
    commands:
      - yarn install

But I'm still getting the following error during the build:

failed to start daemon: pid file found, ensure docker is not running or delete /var/run/docker.pid

sam-6174 commented 11 months ago

Not sure why, but I had to call dockerd-entrypoint.sh despite using privileged_mode and the codebuild standard image.


aws_codebuild_project

environment {
  compute_type    = "BUILD_GENERAL1_SMALL"
  image           = "public.ecr.aws/codebuild/amazonlinux2-x86_64-standard:5.0"
  type            = "LINUX_CONTAINER"
  privileged_mode = true
}

buildspec.yml

version: 0.2
phases:
  install:
    commands:
      - /usr/local/bin/dockerd-entrypoint.sh

jean-simon-barry1 commented 7 months ago

I'm utterly confused

Using the docker:dind image with privileged mode enabled in AWS CodeBuild results in:

[Container] 2024/03/21 20:56:14.612833 Running command docker images
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

However, running it WITHOUT privileged mode, I AM able to build Docker images, which contradicts what the checkbox says:

Privileged Enable this flag if you want to build Docker images or want your builds to get elevated privileges.

The buildspec is simply

version: 0.2

phases:
  build:
    commands:
       - touch Dockerfile
       - echo "FROM ubuntu:latest" > Dockerfile
       - echo "RUN apt-get update && apt-get install -y nginx" >> Dockerfile
       - echo "EXPOSE 80" >> Dockerfile
       - echo "ENV NAME World" >> Dockerfile
       - echo 'CMD ["nginx", "-g", "daemon off;"]' >> Dockerfile
       - docker buildx create --use
       - docker buildx build -t pewpew:pewpew .

Any idea what's wrong here?
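Separately from the daemon question, the chain of echo commands can be collapsed into one heredoc. This is a sketch, assuming the commands move into a shell script called from the buildspec rather than staying as inline entries; the function name `write_dockerfile` is hypothetical. The quoted `'EOF'` delimiter stops the shell from expanding anything, so the CMD line is written exactly as typed. The separate `touch` is unnecessary either way, since `>` creates the file.

```shell
#!/bin/sh
# Hypothetical replacement for the echo chain: writes the same five-line
# Dockerfile in one step. Heredoc body and EOF must start at column 0.
write_dockerfile() {
  cat > "${1:-Dockerfile}" <<'EOF'
FROM ubuntu:latest
RUN apt-get update && apt-get install -y nginx
EXPOSE 80
ENV NAME World
CMD ["nginx", "-g", "daemon off;"]
EOF
}

# Example: write_dockerfile   # creates ./Dockerfile
```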

subinataws commented 7 months ago

@jean-simon-barry1: There are two ways to build Docker images in AWS CodeBuild:

1) By default, docker.sock is now mounted into the build container. In this case, you don't need to enable privileged mode to run docker build commands. Note that this functionality is only available when the project has neither a VPC configuration nor privileged mode enabled.
2) By enabling privileged mode. In this case, docker build is made possible through Docker in Docker, and you need to run a couple of prerequisite commands to start the Docker daemon inside your build container. You can find details on these commands here: https://docs.aws.amazon.com/codebuild/latest/userguide/troubleshooting.html#troubleshooting-cannot-connect-to-docker-daemon

For most cases, option 1 is the best approach and is faster to provision in CodeBuild (by 7-8 seconds).

Your buildspec with privileged mode enabled would look like:

version: 0.2

phases:
  install:
    commands:
      - nohup /usr/local/bin/dockerd --host=unix:///var/run/docker.sock --host=tcp://127.0.0.1:2375 --storage-driver=overlay2 &
      - timeout 15 sh -c "until docker info; do echo .; sleep 1; done"
  build:
    commands:
       - touch Dockerfile
       - echo "FROM ubuntu:latest" > Dockerfile
       - echo "RUN apt-get update && apt-get install -y nginx" >> Dockerfile
       - echo "EXPOSE 80" >> Dockerfile
       - echo "ENV NAME World" >> Dockerfile
       - echo 'CMD ["nginx", "-g", "daemon off;"]' >> Dockerfile
       - docker buildx create --use
       - docker buildx build -t pewpew:pewpew .
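A sketch of how an install-phase script might pick between the two modes described in this comment, assuming that a Unix socket at `/var/run/docker.sock` is a good-enough signal that a daemon is already reachable (the mounted socket, or a daemon an image entrypoint already started). The function name and the `use-existing`/`start-dockerd` labels are hypothetical, and `docker info` is a stronger check once the CLI is available.

```shell
#!/bin/sh
# Hypothetical decision helper. Prints "use-existing" when a Unix socket is
# already present at the given path, otherwise "start-dockerd".
docker_action() {
  sock="${1:-/var/run/docker.sock}"
  if [ -S "$sock" ]; then
    echo "use-existing"
  else
    echo "start-dockerd"
  fi
}

case "$(docker_action)" in
  use-existing)
    echo "Docker socket found; not starting a second daemon" ;;
  start-dockerd)
    # Privileged mode only: start dockerd here with the troubleshooting-guide
    # flags, then poll `docker info` until it answers.
    echo "No Docker socket; start dockerd (requires privileged mode)" ;;
esac
```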