docker / machine

Machine management for a container-centric world
https://docs.docker.com/machine/
Apache License 2.0

exit code 1 with "open : no such file or directory" when creating machine with driver generic #3202

Open ghost opened 8 years ago

ghost commented 8 years ago

Description of problem: docker-machine exits with return code 1 when creating a machine with the generic driver (when no swarm options are given on the command line). --debug reveals the message "open : no such file or directory", which looks like docker-machine is trying to open a file whose path comes from an unset variable.

Environment details (AWS, VirtualBox, physical, etc.): Vagrant (with Libvirt/KVM), docker-machine generic driver.

How reproducible: Always

Steps to Reproduce: Some background on my development project:

I use a Makefile to make actions reproducible. Make has the following variables configured:

Relevant Make targets:

vagrant1:
        vagrant up box1

machine1: vagrant1
        docker-machine \
          --storage-path $(MACHINE_STORAGE_PATH) \
          --tls-ca-cert $(MACHINE_CA_PATH)/ca.pem \
          --tls-ca-key $(MACHINE_CA_PATH)/ca-key.pem \
          --tls-client-cert $(MACHINE_CA_PATH)/cert.pem \
          --tls-client-key $(MACHINE_CA_PATH)/key.pem \
          create \
          --driver generic \
          --generic-ip-address 10.20.30.40 \
          --generic-ssh-port 22 \
          --generic-ssh-user vagrant \
          --generic-ssh-key $(CURDIR)/../../.vagrant/machines/box1/$(PROVIDER)/private_key \
          --engine-label foo.example.com.swarm=master \
          --engine-label foo.example.com.swarm=agent \
          --engine-label foo.example.com.vagrant=true \
          box1
# calling make
make -C bootstrap/vagrant PROVIDER=libvirt machine1
# TLS
ls -l bootstrap/vagrant/ca/ 
total 16
-rw------- 1 myuser users 1679 Mar 11 14:40 ca-key.pem
-rw------- 1 myuser users 1034 Mar 11 14:40 ca.pem
-rw------- 1 myuser users 1074 Mar 11 14:40 cert.pem
-rw------- 1 myuser users 1675 Mar 11 14:40 key.pem

Actual Results: docker-machine successfully bootstraps the Docker daemon on the Vagrant VM with IP 10.20.30.40, including TLS authentication, and docker-machine -s bootstrap/vagrant/.docker/machine ls works properly. Nevertheless, docker-machine returns exit code 1 with the following error messages (using --debug):

Error creating machine: Error checking the host: Error checking and/or regenerating the certs: There was an error validating certificates for host "10.20.30.40:2376": tls: DialWithDialer timed out
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.

open : no such file or directory
notifying bugsnag: [Error creating machine: Error checking the host: Error checking and/or regenerating the certs: There was an error validating certificates for host "10.20.30.40:2376": tls: DialWithDialer timed out
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.
# docker-machine -s bootstrap/vagrant/.docker/machine/ ls 
NAME   ACTIVE   DRIVER    STATE     URL                      SWARM   DOCKER    ERRORS
box1   *        generic   Running   tcp://10.20.30.40:2376           v1.10.3
eval $( docker-machine -s bootstrap/vagrant/.docker/machine env box1 )
docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: docker-253:0-525616-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 11.8 MB
 Data Space Total: 107.4 GB
 Data Space Available: 38.93 GB
 Metadata Space Used: 581.6 kB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.147 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2015-12-01)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 3.10.0-327.10.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 992.8 MiB
Name: box1
ID: DX47:L243:7COB:EYM6:BRAF:6PTV:AJA3:EZ3W:VYCS:B73Z:T2HB:B6KA
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Labels:
 foo.example.com.swarm=master
 foo.example.com.swarm=agent
 foo.example.com.vagrant=true
 provider=generic

Expected Results: docker-machine does not return 1 (error) when it has successfully bootstrapped the Docker daemon on a remote host using the generic driver.
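
A possible stopgap for scripts hit by this, given that the machine itself comes up fine: ignore the exit code of create and ask docker-machine for the machine state instead. This is only a sketch. The helper name `verify_machine` is hypothetical, and it assumes that `docker-machine status` prints `Running` for a healthy machine and that the global `-s` storage path option is passed the same way as in the `ls` call above.

```shell
#!/bin/sh
# verify_machine NAME STORAGE_PATH   (hypothetical helper, not part of docker-machine)
# Returns 0 only if docker-machine reports the machine state as "Running".
verify_machine() {
    vm_name="$1"
    vm_storage="$2"
    # "docker-machine status" prints the machine state on stdout.
    # If the status call itself fails, treat the machine as not usable.
    vm_state=$(docker-machine -s "$vm_storage" status "$vm_name" 2>/dev/null) || return 1
    [ "$vm_state" = "Running" ]
}

# Example use (assumption: same storage path as in the report):
#   docker-machine -s bootstrap/vagrant/.docker/machine create ... box1 \
#     || verify_machine box1 bootstrap/vagrant/.docker/machine
```

With this, a failing create only breaks the build if the machine is not actually running afterwards. It hides the symptom and does not address the "open : no such file or directory" message itself.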

Additional info: This does not fail for me if I add swarm options to the bootstrapping process. In the same scenario, the make target machine1 then looks like this:

machine1: vagrant1
        docker-machine \
          --debug \
          --storage-path $(MACHINE_STORAGE_PATH) \
          --tls-ca-cert $(MACHINE_CA_PATH)/ca.pem \
          --tls-ca-key $(MACHINE_CA_PATH)/ca-key.pem \
          --tls-client-cert $(MACHINE_CA_PATH)/cert.pem \
          --tls-client-key $(MACHINE_CA_PATH)/key.pem \
          create \
          --driver generic \
          --generic-ip-address 10.20.30.40 \
          --generic-ssh-port 22 \
          --generic-ssh-user vagrant \
          --generic-ssh-key $(CURDIR)/../../.vagrant/machines/box1/$(PROVIDER)/private_key \
          --engine-label foo.example.com.swarm=master \
          --engine-label foo.example.com.swarm=agent \
          --engine-label foo.example.com.vagrant=true \
          --swarm \
          --swarm-image swarm:1.1.3 \
          --swarm-master \
          --swarm-host tcp://0.0.0.0:3376 \
          --swarm-discovery consul://10.20.30.40:8500 \
          --swarm-strategy spread \
          --engine-opt cluster-store=consul://10.20.30.40:8500 \
          box1
StephaneSalema commented 8 years ago

Experiencing the same issue here with a significantly simpler setup.

philliproso commented 7 years ago

`docker-machine --debug create xxxx --driver generic --generic-ip-address ec2-xxxxxxxx.us-west-2.compute.amazonaws.com --generic-ssh-user core --generic-ssh-key key.pem`

Getting the same with this simple command.