docker / machine

Machine management for a container-centric world
https://docs.docker.com/machine/
Apache License 2.0

docker-machine create is hanging with driver virtualbox : SSH cmd err, output: exit status 255: #1591

Closed: opskumu closed this issue 8 years ago

opskumu commented 9 years ago
# docker -v && docker-machine -v && docker-compose -v
Docker version 1.6.2, build ba1f6c3/1.6.2
docker-machine version 0.3.1 (40ee236)
docker-compose version: 1.3.3
CPython version: 2.7.9
OpenSSL version: OpenSSL 1.0.1e 11 Feb 2013
# rpm -qa | grep virtual -i
VirtualBox-4.3-4.3.30_101610_el7-1.x86_64

docker-machine create is hanging. The debug logs:

Getting to WaitForSSH function...
Testing TCP connection to: localhost:45105
Using SSH client type: external
About to run SSH command:
exit 0
&{/usr/bin/ssh [/usr/bin/ssh -o PasswordAuthentication=no -o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -o ConnectionAttempts=3 -o ConnectTimeout=10 -i /root/.docker/machine/machines/dev2/id_rsa -p 45105 docker@localhost exit 0] []  <nil> <nil> <nil> [] <nil> <nil> <nil> <nil> false [] [] [] [] <nil>}
SSH cmd err, output: exit status 255: 
Error getting ssh command 'exit 0' : exit status 255
Getting to WaitForSSH function...
Testing TCP connection to: localhost:45105
Using SSH client type: external
About to run SSH command:
exit 0
&{/usr/bin/ssh [/usr/bin/ssh -o PasswordAuthentication=no -o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -o ConnectionAttempts=3 -o ConnectTimeout=10 -i /root/.docker/machine/machines/dev2/id_rsa -p 45105 docker@localhost exit 0] []  <nil> <nil> <nil> [] <nil> <nil> <nil> <nil> false [] [] [] [] <nil>}
SSH cmd err, output: exit status 255: 
Error getting ssh command 'exit 0' : exit status 255
... ...
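The log above shows the external-SSH readiness check: docker-machine keeps running `ssh ... docker@localhost exit 0` against the forwarded port, and each exit status 255 (the status OpenSSH itself returns when the connection or authentication fails, so the remote `exit 0` never ran) triggers another attempt. Below is a minimal Python sketch of that retry pattern, not docker-machine's actual Go code. The helper names `build_ssh_argv` and `wait_for_ssh`, the attempt count, and the delay are hypothetical; the SSH options are copied from the log. The command runner is injectable so the loop can be exercised without a VM.

```python
import subprocess
import time
from typing import Callable, List

# SSH options copied from the debug log above.
SSH_OPTS = [
    "-o", "PasswordAuthentication=no",
    "-o", "IdentitiesOnly=yes",
    "-o", "StrictHostKeyChecking=no",
    "-o", "UserKnownHostsFile=/dev/null",
    "-o", "LogLevel=quiet",
    "-o", "ConnectionAttempts=3",
    "-o", "ConnectTimeout=10",
]


def build_ssh_argv(key_path: str, port: int, user: str = "docker",
                   host: str = "localhost") -> List[str]:
    """Build an argv equivalent to the one printed in the log (hypothetical helper)."""
    return (["/usr/bin/ssh"] + SSH_OPTS
            + ["-i", key_path, "-p", str(port), f"{user}@{host}", "exit", "0"])


def run_command(argv: List[str]) -> int:
    """Default runner: execute the command and return its exit status."""
    return subprocess.run(argv, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode


def wait_for_ssh(argv: List[str], attempts: int = 60, delay: float = 3.0,
                 runner: Callable[[List[str]], int] = run_command,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry argv until it exits 0; the limits here are assumptions."""
    for attempt in range(1, attempts + 1):
        status = runner(argv)
        if status == 0:
            return True
        # 255 means ssh itself failed; report it the way the debug log does.
        print(f"attempt {attempt}: SSH cmd err, output: exit status {status}")
        sleep(delay)
    return False
```

A hang like the one reported is what this loop looks like when every attempt returns 255: the TCP port answers, but the SSH session never completes.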
tehmaspc commented 9 years ago

I believe this is the same issue I'm having on Mac OS X. I have tried a few times to create a new VirtualBox dev VM from scratch. I have cleaned out .docker/ and even downgraded from VirtualBox 5.

% docker-machine --version
docker-machine version 0.3.0 (0a251fe)
% VirtualBox --help | head -n 1
Oracle VM VirtualBox Manager 4.3.30
% docker-machine ls
error getting URL for host dev: exit status 255
NAME   ACTIVE   DRIVER       STATE     URL   SWARM
dev    *        virtualbox   Running

Let me know what specific logs / info you might need; the following is my debug log info:

STDERR:
executing: /usr/bin/VBoxManage modifyvm docker-vm --nic2 hostonly --nictype2 82540EM --hostonlyadapter2 vboxnet2 --cableconnected2 on
STDOUT:
STDERR:
executing: /usr/bin/VBoxManage modifyvm docker-vm --natpf1 delete ssh
STDOUT:
STDERR: VBoxManage: error: Code NS_ERROR_INVALID_ARG (0x80070057) - Invalid argument value (extended info not available)
VBoxManage: error: Context: "RemoveRedirect(Bstr(ValueUnion.psz).raw())" at line 1717 of file VBoxManageModifyVM.cpp
executing: /usr/bin/VBoxManage modifyvm docker-vm --natpf1 ssh,tcp,127.0.0.1,50083,,22
STDOUT:
STDERR:
executing: /usr/bin/VBoxManage startvm docker-vm --type headless
STDOUT: Waiting for VM "docker-vm" to power on...
VM "docker-vm" has been successfully started.
STDERR:
Starting VM...
Getting to WaitForSSH function...
Testing TCP connection to: localhost:50083
Using SSH client type: external
About to run SSH command:
exit 0
&{/usr/bin/ssh [/usr/bin/ssh -o PasswordAuthentication=no -o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -o ConnectionAttempts=3 -o ConnectTimeout=10 -i /Users/tehmasp/.docker/machine/machines/docker-vm/id_rsa -p 50083 docker@localhost exit 0] []     []    ?reflect.Value? false [] [] [] [] }
SSH cmd err, output: exit status 255:
Error getting ssh command 'exit 0' : exit status 255
Getting to WaitForSSH function...
Testing TCP connection to: localhost:50083
Using SSH client type: external
About to run SSH command:
exit 0
&{/usr/bin/ssh [/usr/bin/ssh -o PasswordAuthentication=no -o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -o ConnectionAttempts=3 -o ConnectTimeout=10 -i /Users/tehmasp/.docker/machine/machines/docker-vm/id_rsa -p 50083 docker@localhost exit 0] []     []    ?reflect.Value? false [] [] [] [] }
SSH cmd err, output: exit status 255:
Error getting ssh command 'exit 0' : exit status 255
Getting to WaitForSSH function...
Testing TCP connection to: localhost:50083
Using SSH client type: external
About to run SSH command:
exit 0
&{/usr/bin/ssh [/usr/bin/ssh -o PasswordAuthentication=no -o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -o ConnectionAttempts=3 -o ConnectTimeout=10 -i /Users/tehmasp/.docker/machine/machines/docker-vm/id_rsa -p 50083 docker@localhost exit 0] []     []    ?reflect.Value? false [] [] [] [] }
SSH cmd err, output: exit status 255:
Error getting ssh command 'exit 0' : exit status 255
Getting to WaitForSSH function...
Testing TCP connection to: localhost:50083
Using SSH client type: external
About to run SSH command:
exit 0
&{/usr/bin/ssh [/usr/bin/ssh -o PasswordAuthentication=no -o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -o ConnectionAttempts=3 -o ConnectTimeout=10 -i /Users/tehmasp/.docker/machine/machines/docker-vm/id_rsa -p 50083 docker@localhost exit 0] []     []    ?reflect.Value? false [] [] [] [] }

I'm able to log into the VM manually, and it seems to be OK, so I think the issue is that docker-machine can't get the status correctly.

tehmaspc commented 9 years ago

I got a docker-machine environment working with:

% docker-machine --version
docker-machine version 0.4.0-rc1 (f6ea2c1)

(FYI: I installed it manually, since homebrew-cask doesn't have anything newer than v0.3.0 yet.)

However, docker-machine still hung on 'create', and I had to run 'regenerate-certs' for my 'docker-vm' machine before 'docker-machine env docker-vm' would fully work.

Putting it out there for anyone else who's having similar issues. At least I have a working docker-machine environment now, without having to revert to boot2docker. It did cost me the whole day, though :(

opskumu commented 9 years ago

@tehmaspc I have the same issue as you, and I haven't found a solution.

ehazlett commented 9 years ago

@tehmaspc thanks for the feedback, and sorry for the trouble :( Unfortunately this is usually due to VirtualBox networking. Thanks also for confirming that v0.4.0-rc1 fixed it.

@opskumu would you mind trying the 0.4.0-rc1?

wmiller848 commented 9 years ago

I'm seeing the same thing on Mac OS X Yosemite with 0.4.0-rc1:

docker-machine create --driver=virtualbox --virtualbox-disk-size "40000" local
No default boot2docker iso found locally, downloading the latest release...
Downloading https://s3.amazonaws.com/docker-mcn/public/b2d-next/boot2docker-virtualbox.iso to /Users/wmillerx/.docker/machine/cache/boot2docker-virtualbox.iso...
Creating VirtualBox VM...
Creating SSH key...
Starting VirtualBox VM...
Starting VM...

It just hangs forever...

Env Info:

sw_vers
ProductName:    Mac OS X
ProductVersion: 10.10.4
BuildVersion:   14E46
docker --version
Docker version 1.7.1, build 786b29d
docker-machine --version
docker-machine version 0.4.0-rc1 (f6ea2c1)
vboxmanage --version
5.0.0r101573

tehmaspc commented 9 years ago

FWIW, I downgraded to VirtualBox 4.3.x and ran the regenerate-certs command to get past this. Make sure to try that with a fresh VM. Good luck.

tehmaspc commented 9 years ago

@opskumu @wmiller848 - so I just came across this issue: https://github.com/docker/machine/issues/1572

I use SSH multiplexing in my ~/.ssh/config file; after disabling those settings, docker-machine works properly. Even my workaround above wasn't working consistently, but disabling SSH multiplexing seems to be the fix.

Do you guys have SSH multiplexing enabled?

My ~/.ssh/config file is now:

% cat ~/.ssh/config
Host *
  TCPKeepAlive yes
  ServerAliveInterval 10
  ServerAliveCountMax 10
  ForwardAgent yes
#  ControlMaster auto
#  ControlPath ~/.ssh/sockets/%r@%h-%p
#  ControlPersist 300

Cheers, Tehmasp
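To check whether your own ~/.ssh/config enables multiplexing, look for active ControlMaster, ControlPath or ControlPersist lines like the ones commented out above. Below is a rough Python sketch of such a check; `find_multiplexing` is a hypothetical helper and a plain text scan, not a real ssh_config parser (it ignores Host/Match scoping and Include), so treat a hit as "worth a look" rather than proof.

```python
import re
from typing import List, Tuple

# ssh_config keywords involved in connection multiplexing (case-insensitive).
MUX_KEYWORDS = {"controlmaster", "controlpath", "controlpersist"}


def find_multiplexing(config_text: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, keyword, value) for uncommented multiplexing directives."""
    hits = []
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # ssh_config accepts both "Keyword value" and "Keyword=value".
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        if parts[0].lower() in MUX_KEYWORDS:
            value = parts[1] if len(parts) > 1 else ""
            hits.append((lineno, parts[0], value))
    return hits
```

Run against the config above it returns an empty list; with the three lines uncommented it reports all of them.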

powdahound commented 9 years ago

Disabling SSH multiplexing worked for me too (on OS X 10.10.4). Thank you @tehmaspc!

opskumu commented 9 years ago

@ehazlett I have already tested 0.4.0; it doesn't work either.

# docker-machine -v
docker-machine version 0.4.0 (9d0dc7a)
# cat /etc/centos-release 
CentOS Linux release 7.1.1503 (Core)

@tehmaspc Disabling SSH multiplexing doesn't work for me either.

There is also a problem on Windows 10 with docker-machine 0.4.0:

{ ~ }  » docker-machine.0.4.0 ls
NAME      ACTIVE   DRIVER       STATE     URL   SWARM
default            virtualbox   Timeout
dev                virtualbox   Timeout
{ ~ }  » docker-machine.0.3.1 ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM
default            virtualbox   Stopped
dev                virtualbox   Running   tcp://192.168.99.102:2376

stejohnson commented 9 years ago

Not sure if this helps, but while chasing similar problems I noticed the SSH port was not set in my machine's config file (~/.docker/machine/machines/<machine_name>/config.json). Consequently, docker-machine always tried to SSH on port 22. After I set the port manually, everything worked.
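A quick way to spot this is to load config.json and check that the driver section has both an address and an SSH port. The sketch below is written under an assumption: that driver settings sit in a top-level "Driver" object with "IPAddress" and "SSHPort" keys, which is how later docker-machine releases lay out the file. Verify that against your own config.json before relying on it. The helper names are hypothetical.

```python
import json
from typing import Any, Dict, List

# Assumed layout: {"Driver": {"IPAddress": ..., "SSHPort": ...}, ...}
REQUIRED_DRIVER_KEYS = ("IPAddress", "SSHPort")


def missing_ssh_fields(config: Dict[str, Any]) -> List[str]:
    """List required driver keys that are absent, empty, or zero."""
    driver = config.get("Driver") or {}
    return [key for key in REQUIRED_DRIVER_KEYS if not driver.get(key)]


def effective_ssh_port(config: Dict[str, Any]) -> int:
    """Port used when SSHPort is unset: 22, mirroring the behaviour reported above."""
    return int((config.get("Driver") or {}).get("SSHPort") or 22)


def check_config_file(path: str) -> List[str]:
    """Load a machine's config.json and report missing SSH-related fields."""
    with open(path) as fh:
        return missing_ssh_fields(json.load(fh))
```

If the list is non-empty, the port (and possibly the IP) has to be filled in by hand, as described above; the forwarded SSH port is the one shown in the "Testing TCP connection to: localhost:<port>" debug line.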

chrisfosterelli commented 9 years ago

For what it's worth, removing the multiplex settings in my SSH config fixed this for me. Nothing else I tried in any of the other open issues was working, but now everything appears to work great.

Perhaps the VM setup script should consider using the SSH -o option to disable multiplexing when it's making connections?
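A minimal sketch of that suggestion: pass OpenSSH's `-o ControlMaster=no -o ControlPath=none` on every connection. Options given with `-o` on the command line are read before any config file, and ssh keeps the first value it obtains for each keyword, so these override whatever ~/.ssh/config says for that one invocation. `with_mux_disabled` is a hypothetical helper, not part of docker-machine.

```python
from typing import List

# Disable both creating and reusing a multiplexed master connection.
MUX_OFF = ["-o", "ControlMaster=no", "-o", "ControlPath=none"]


def with_mux_disabled(argv: List[str]) -> List[str]:
    """Insert the multiplexing-off options right after the ssh executable."""
    if not argv:
        raise ValueError("argv must start with the ssh executable")
    return [argv[0]] + MUX_OFF + argv[1:]
```

For example, `with_mux_disabled(["/usr/bin/ssh", "-p", "2222", "docker@localhost", "exit", "0"])` yields the same command with the four option tokens placed after `/usr/bin/ssh`.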

tehmaspc commented 9 years ago

@chrisfosterelli yup, +1.

cc @ehazlett: loads of people are still having this issue; should we bump it?

thanks, @tehmaspc

garystafford commented 9 years ago

The problem has been so hit and miss. It would be great to crowdsource the potential fix with everyone in this discussion to ensure it mitigates the issue.

stgarf commented 9 years ago

Removing SSH multiplexing fixed this for me as well.

garystafford commented 9 years ago

Has anyone found that disabling SSH multiplexing fixed the issue on Linux, as opposed to Mac? I am still seeing the issue on Ubuntu with it disabled. People say it fixed it for them, but don't say whether they are on Linux, Windows, or Mac.

rhim commented 9 years ago

I don't have SSH multiplexing turned on, but I am still seeing this issue. @wmiller848: did you find a solution to this problem? Here is my environment:

~$ docker --version; docker-machine --version; VBoxManage --version
Docker version 1.9.0-dev, build 0e3674d, experimental
docker-machine version 0.4.1 (e2c88d6)
5.0.0r101573

~$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.10.4
BuildVersion:   14E46

MatthewVance commented 9 years ago

I'm also seeing the following error:

SSH cmd err, output: exit status 255: 
Error getting ssh command 'exit 0' : exit status 255

I used the default Docker Machine install and let it install VirtualBox, since I didn't already have it on this particular computer. The only potentially odd thing I have is a Homebrew version of OpenSSH with strict crypto requirements. Here are the details of my environment:

sw_vers
ProductName:    Mac OS X
ProductVersion: 10.10.5
BuildVersion:   14F27
docker --version
Docker version 1.8.1, build d12ea79
docker-machine --version
docker-machine version 0.4.1 (e2c88d6)
vboxmanage --version
5.0.2r102096
ssh -V
OpenSSH_7.0p1, OpenSSL 1.0.2d 9 Jul 2015
cat ~/.ssh/config 
#Defaults for all my hosts
Host *
    AddressFamily inet
    Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com
    ForwardX11 no
    ForwardX11Trusted no
    KexAlgorithms curve25519-sha256@libssh.org
    MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com
    Protocol 2
    VisualHostKey yes  
    HashKnownHosts yes
#host specific stuff..

mglaman commented 9 years ago

Having same issue.

$ vboxmanage --version
5.0.2r102096
$ docker-machine --version
docker-machine version 0.4.1 (e2c88d6)

Not sure why, but my issue is a bad config.json for the machine. It's missing the machine's IP and port.

mshean commented 9 years ago

Also having the same problem... I'm using boot2docker until this is fixed.

vboxmanage --version
4.3.14r95030

docker-machine -version
docker-machine version 0.4.1 (e2c88d6)

docker version
Client:
 Version:      1.8.1
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   d12ea79
 Built:        Thu Aug 13 19:47:52 UTC 2015
 OS/Arch:      darwin/amd64

ncuesta commented 9 years ago

I'm having the same issue here:

$ docker --version; docker-machine --version; VBoxManage --version; sw_vers
Docker version 1.8.1, build d12ea79
docker-machine version 0.4.1 (e2c88d6)
5.0.3r102322
ProductName:    Mac OS X
ProductVersion: 10.10.5
BuildVersion:   14F27

Thanks

tdoherty commented 9 years ago

Same issue here:

$ docker --version; docker-machine --version; VBoxManage --version; sw_vers
Docker version 1.8.1, build d12ea79
docker-machine version 0.4.1 (e2c88d6)
4.3.22r98236
ProductName:    Mac OS X
ProductVersion: 10.10.5
BuildVersion:   14F27

I downgraded VBox from 5.x to 4.3.x and it worked for a few hours, then gave the same SSH error.

ecylmz commented 9 years ago

I have the same issue. I found a workaround for this bug:

$ docker-machine --native-ssh create -d virtualbox test

stayclassychicago commented 9 years ago

Thanks @ecylmz. +1 for your workaround. I still saw this error, but it successfully created the machine.

STDERR: executing: /usr/local/bin/VBoxManage modifyvm imc --natpf1 delete ssh

STDERR: VBoxManage: error: Code NS_ERROR_INVALID_ARG (0x80070057) - Invalid argument value (extended info not available) VBoxManage: error: Context: "RemoveRedirect(Bstr(ValueUnion.psz).raw())" at line 1766 of file VBoxManageModifyVM.cpp

ncuesta commented 9 years ago

Kudos to @ecylmz for the workaround; it also worked for me. The only caveat is that I need to run docker-machine --native-ssh for every docker-machine command (like docker-machine env vm_name).

It's also worth noting that upgrading to Docker Toolbox 1.8.1c didn't fix this issue.
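Because --native-ssh is a global flag that goes before the subcommand (as in the create example above), a small wrapper can add it to every call so it isn't forgotten. This is a minimal Python sketch; `machine_argv` and `run_machine` are hypothetical helpers, and it assumes docker-machine is on PATH.

```python
import subprocess
from typing import List


def machine_argv(*args: str, native_ssh: bool = True) -> List[str]:
    """Build a docker-machine command line with --native-ssh as a global flag.

    The flag must come before the subcommand (e.g. create, env), so it is
    inserted directly after the program name.
    """
    argv = ["docker-machine"]
    if native_ssh:
        argv.append("--native-ssh")
    return argv + list(args)


def run_machine(*args: str) -> str:
    """Run docker-machine with --native-ssh and return its stdout."""
    result = subprocess.run(machine_argv(*args), check=True,
                            capture_output=True, text=True)
    return result.stdout
```

For instance, `machine_argv("env", "vm_name")` builds `docker-machine --native-ssh env vm_name`.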

francoiskha commented 9 years ago

--native-ssh workaround worked for me too

frankwmoyer commented 9 years ago

--native-ssh worked for me. :+1: @ecylmz

nathanleclaire commented 9 years ago

Just FYI everyone, if the reason you are encountering these issues is because of SSH multiplexing configuration settings, it should be fixed in the next release / on master.

garystafford commented 9 years ago

@nathanleclaire thank you for the update on fixing SSH multiplexing. Good news! I have tested --native-ssh on Linux (Ubuntu), and it does not fix the issue. Again, it's hit and miss, so having it work once doesn't mean anything. I can have it fail 10 times in a row, then suddenly work a few times, then fail 10 times.

phlegx commented 9 years ago

@garystafford I can confirm that --native-ssh does not work with the default driver (none) on Ubuntu.

saada commented 9 years ago

+1

wenchma commented 9 years ago

I ran $ docker-machine -D --native-ssh create -d virtualbox local, and it did not work. Error log:

executing: /usr/bin/VBoxManage startvm local --type headless
STDOUT: Waiting for VM "local" to power on...
VM "local" has been successfully started.
STDERR:
Starting VM...
Getting to WaitForSSH function...
Testing TCP connection to: localhost:49564
Using SSH client type: native
About to run SSH command:
exit 0
Error dialing TCP: ssh: handshake failed: read tcp 127.0.0.1:49564: connection reset by peer
Error dialing TCP: ssh: handshake failed: read tcp 127.0.0.1:49564: connection reset by peer

garystafford commented 9 years ago

@nathanleclaire, I just cloned and built the latest docker-machine from the master branch on GitHub for Ubuntu and Fedora. I am still seeing no improvement in the SSH errors on either:

gstafford@gstafford-X555LA:$ docker-machine -v
docker-machine version 0.5.0-dev (fe5a722)

garystafford commented 9 years ago

@nathanleclaire I went from 80%+ failures creating machines and/or getting IP address conflicts to 100% success by deleting those extra host-only network adapters you mentioned. On VirtualBox 5.0.3, I went to VirtualBox -> Preferences -> Network -> Host-only Networks, and removed them. After that I created a 5-cluster swarm and added weave with no obvious errors or issues. Thank you. FYI, I am still running the docker-machine version 0.5.0-dev (fe5a722) version I cloned and built.
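The same clean-up can be done from the command line: `VBoxManage list hostonlyifs` prints one block per host-only interface, each starting with a `Name:` line, and `VBoxManage hostonlyif remove <name>` deletes one. The Python sketch below parses the listing and builds, but does not run, the removal commands. It assumes the `Key: value` layout of VirtualBox 4.3/5.0 output, the helper names are hypothetical, and deciding which adapters are "extra" is left to you, since removing one that a VM still uses will break that VM's networking.

```python
from typing import Dict, List


def parse_hostonlyifs(listing: str) -> List[Dict[str, str]]:
    """Parse `VBoxManage list hostonlyifs` output into one dict per interface."""
    interfaces: List[Dict[str, str]] = []
    for line in listing.splitlines():
        # Split on the first colon only, so MAC/IPv6 values keep their colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between interface blocks
        key, value = key.strip(), value.strip()
        if key == "Name":
            interfaces.append({})  # a Name: line starts a new block
        if interfaces:
            interfaces[-1][key] = value
    return interfaces


def removal_commands(names: List[str]) -> List[List[str]]:
    """Build `VBoxManage hostonlyif remove <name>` commands for review."""
    return [["VBoxManage", "hostonlyif", "remove", name] for name in names]
```

Printing each interface's `Name` and `IPAddress` first makes it easier to see which adapters overlap the 192.168.99.x range docker-machine uses, before deciding what to remove.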

nathanleclaire commented 9 years ago

@garystafford Good to hear it's been cleaned up for you. I'd definitely like to put more effort into detecting wonky networking configurations and suggesting solutions to save the sort of trouble that you had to go through.

samalb commented 9 years ago

It appears my problem was twofold: not only was it necessary to remove the adapters vmnet0 and vmnet1, but all docker-machine commands also required sudo.

smileyan commented 8 years ago

In my environment, this is a VirtualBox network issue. I had to:

  1. Change the Host-only Adapter to a Bridged Adapter (en0 Wi-Fi). Then I could ssh docker@...
  2. Run docker-machine regenerate-certs. Then 'docker-machine config' works.

djaed commented 8 years ago

I'd been getting the exact same error, but managed to solve my case by enabling virtualization in the BIOS.

Troubleshooting involved:

  1. Using the debug flag: docker-machine --debug create --driver virtualbox dev
  2. Checking the "dev" VM's network settings in VirtualBox Manager, assuming it was an adapter / port-forwarding issue
  3. By chance I double-clicked the running "dev" VM in VirtualBox Manager, which opens a window into the "dev" VM. That window showed the actual problem: "vt x amd-v not available on your system". Because of this, Docker's VM was stuck with the error "requires x86-64 but only detected i686 cpu".

I just wanted to share my troubleshooting, especially the part about double-clicking the running VM in VirtualBox Manager. Try it; it might give you a hint about the underlying cause of your hang.
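On a Linux host, a quick pre-check for this cause is to look for the `vmx` (Intel VT-x) or `svm` (AMD-V) flag in /proc/cpuinfo. The sketch below parses that file's text; the helper names are hypothetical. It only reports what the CPU advertises: whether the flag disappears when the feature is disabled in the BIOS varies by platform, so opening the VM window as described above remains the more direct check. macOS and Windows hosts have no /proc/cpuinfo.

```python
from typing import Optional, Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect the union of all `flags` entries from /proc/cpuinfo text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def virtualization_extension(cpuinfo_text: str) -> Optional[str]:
    """Return 'VT-x', 'AMD-V', or None if neither flag is advertised."""
    flags = cpu_flags(cpuinfo_text)
    if "vmx" in flags:
        return "VT-x"
    if "svm" in flags:
        return "AMD-V"
    return None
```

Usage on a real host would be `virtualization_extension(open("/proc/cpuinfo").read())`.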

trentm commented 8 years ago

As a workaround, if you still want Host * ... ControlMaster auto in your "~/.ssh/config", I found I could do this:

# Docker: docker-machine (at least for virtualbox) breaks if ControlMaster
# is used. See:
#     https://github.com/docker/machine/issues/1591#issuecomment-126169020
# This block needs to be before any global "Host *" using ControlMaster.
Host localhost
    ControlMaster no

Host *
    ControlMaster auto
    ControlPath ~/.ssh/socket-%r@%h:%p
    ControlPersist yes

# ...
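This works because ssh_config is first-match-wins: for each keyword, ssh keeps the first value it finds among the Host blocks that match, in file order, so `Host localhost` has to come before `Host *`. The pattern also has to match the name docker-machine actually connects to; the logs in this thread show both localhost and 127.0.0.1. Below is a toy Python model of the rule for one keyword, using fnmatch-style patterns; `resolve` is a hypothetical helper that ignores negated patterns, Match blocks and Include, so it is an illustration, not a parser.

```python
import fnmatch
from typing import Dict, List, Optional, Tuple

# Each block: (Host patterns, {keyword: value}), kept in file order.
Block = Tuple[List[str], Dict[str, str]]


def resolve(blocks: List[Block], host: str, keyword: str) -> Optional[str]:
    """Return the first value of `keyword` among blocks whose Host matches."""
    for patterns, options in blocks:
        if any(fnmatch.fnmatchcase(host.lower(), p.lower()) for p in patterns):
            if keyword in options:
                return options[keyword]  # first obtained value wins
    return None


# The configuration from the comment above.
CONFIG: List[Block] = [
    (["localhost"], {"ControlMaster": "no"}),
    (["*"], {"ControlMaster": "auto",
             "ControlPath": "~/.ssh/socket-%r@%h:%p",
             "ControlPersist": "yes"}),
]
```

With this order, `resolve(CONFIG, "localhost", "ControlMaster")` gives "no" while other hosts get "auto"; reversing the two blocks makes localhost pick up "auto" again, which is why the comment insists on the ordering.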
nathanleclaire commented 8 years ago

The latest RCs should work fine with ControlMaster options set in SSH config

joeylin commented 8 years ago

$ docker-machine -v
docker-machine version 0.4.1 (e2c88d6)

I also have the same issue; the workaround is to add --native-ssh to every docker-machine command.

olmeca commented 8 years ago

I also have this issue on MacOS 10.11.1. Using --native-ssh helps indeed.

ragavendra commented 8 years ago

I had a corrupt ~/.ssh/config. Once I corrected it, I was able to create machines as before. One way to check for this is to SSH to some other server and see whether that works, to make sure docker-machine isn't failing because of your SSH setup.
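Another check, available since OpenSSH 6.8, is `ssh -G <host>`: it evaluates your config files and prints the resulting settings as "keyword value" lines, then exits without connecting, so a config that fails to parse shows up as an error without touching any server. A minimal Python sketch follows; the helper names are hypothetical, and treating a non-zero exit as a config problem is an assumption (other argument errors exit non-zero too).

```python
import subprocess
from typing import Dict


def parse_ssh_g(output: str) -> Dict[str, str]:
    """Turn `ssh -G` output ("keyword value" per line) into a dict.

    Keywords that can repeat (e.g. identityfile) keep only their last value here.
    """
    settings: Dict[str, str] = {}
    for line in output.splitlines():
        key, _, value = line.strip().partition(" ")
        if key:
            settings[key] = value
    return settings


def effective_ssh_config(host: str) -> Dict[str, str]:
    """Run `ssh -G host` and return the evaluated settings."""
    result = subprocess.run(["ssh", "-G", host], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"ssh -G failed: {result.stderr.strip()}")
    return parse_ssh_g(result.stdout)
```

Looking at the `controlmaster` and `controlpath` entries returned for localhost also shows whether the multiplexing settings discussed earlier in this thread apply to docker-machine's connections.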

MilindAtGithub commented 8 years ago

Same issue, and nothing worked. For me, the only workaround is to add --native-ssh for each machine.

My env details:
OS: OS X El Capitan Version 10.11
docker-machine version 0.5.0 (04cfa58)
VBoxManage --version: 5.0.8r103449

timhtheos commented 8 years ago

@tehmaspc https://github.com/docker/machine/issues/1591#issuecomment-126169020 works for me, with some workarounds suggested by @trentm.

jocull commented 8 years ago

I ran into this on Windows 10 today after a small upgrade in my Docker install. I tried everything: removing .docker, rebuilding the VBox VMs, fiddling with my environment vars, hacking on config.json... In the end I uninstalled everything (including VirtualBox) and let it all reinstall. That worked!

It seemed like there must be an issue with the VBox networks that arises during the upgrade process or something, as if the expected IP address was taken and everything blew up after that.

nadworny commented 8 years ago

I had the same issue. Apparently I had a separate OpenSSH installed, and docker-machine was picking it up while creating the Docker VM, which led to the ControlMaster error. I uninstalled it, added Git\bin to the PATH, and it worked like a charm.

chrisbenson commented 8 years ago

I'm having this same problem, and the proposed workarounds on this page have not had any effect. Any ideas?

Here is my environment:

Docker version 1.10.3, build 20f81dd
docker-machine version 0.6.0, build e27fb87
5.0.16r105871
ProductName:    Mac OS X
ProductVersion: 10.11.4
BuildVersion:   15E65

When I issue this command:

docker-machine --debug create -d virtualbox default

...it always eventually hangs on this error, which is printed to the terminal repeatedly until I manually interrupt it:

(default) DBG | Getting to WaitForSSH function...
(default) DBG | Using SSH client type: external
(default) DBG | {[-o BatchMode=yes -o PasswordAuthentication=no -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none docker@127.0.0.1 -o IdentitiesOnly=yes -i /Users/cbenson/.docker/machine/machines/default/id_rsa -p 53332] /usr/local/bin/ssh}
(default) DBG | About to run SSH command:
(default) DBG | exit 0
(default) DBG | SSH cmd err, output: exit status 255:
(default) DBG | Error getting ssh command 'exit 0' : Something went wrong running an SSH command!
(default) DBG | command : exit 0
(default) DBG | err : exit status 255
(default) DBG | output :

chrisbenson commented 8 years ago

@nathanleclaire can you take a look at my comment above? I'm on Mac El Capitan, and everything I'm using (Docker, Docker Machine, VirtualBox, OS updates) is the very latest version. Nothing I've seen as a potential fix in this or related issue pages has worked. I think it's something about my configuration, because it affects two similarly configured Macs I have. I use Homebrew for the latest Docker, Docker Machine, OpenSSH, and OpenSSL. I installed VirtualBox from its own binary, but I've previously tried installing it via Homebrew as well (not currently). Any ideas? Thanks!

tehmaspc commented 8 years ago

I don't have El Capitan yet, so I cannot share my experience w/r/t that OS.

An alternative is to get into the Docker for Mac beta which eliminates Virtualbox requirements altogether. Of course it might have other issues :)

https://blog.docker.com/2016/03/docker-for-mac-windows-beta/

nathanleclaire commented 8 years ago

@chrisbenson What's your ~/.ssh/config file like?

What's the output of docker-machine ssh default -vvv?