AliyunContainerService / docker-machine-driver-aliyunecs

Aliyun (Alibaba Cloud) ECS Driver of Docker Machine
Apache License 2.0

Failed to create aliyunecs instance #8

Closed twang2218 closed 8 years ago

twang2218 commented 8 years ago

I tried to create an ECS instance with the aliyunecs driver, but without success.

I set these environment variables first:

export ECS_ACCESS_KEY_ID=xxxx
export ECS_ACCESS_KEY_SECRET=xxxx
export ECS_REGION=cn-beijing
export ECS_SSH_PASSWORD=xxxx
export ECS_INTERNET_MAX_BANDWIDTH=100
export ECS_SECURITY_GROUP=fully-open
export MACHINE_DOCKER_INSTALL_URL=http://acs-public-mirror.oss-cn-hangzhou.aliyuncs.com/docker-engine/internet

The security group, fully-open, has a fully open policy: it allows any incoming connection.

Then I ran the following command to create the Docker host:

docker-machine create -d aliyunecs --engine-registry-mirror https://xxxx.mirror.aliyuncs.com node-1

However, it stalled at "Uploading SSH keypair" every time. Here is the output:

Running pre-create checks...
Creating machine...
(node-1) node-1 | Creating key pair for instance ...
(node-1) node-1 | Configuring security groups instance ...
(node-1) node-1 | Creating instance with image ubuntu1404_64_20G_aliaegis_20150325.vhd ...
(node-1) node-1 | Create instance i-25d5tjo9b successfully
(node-1) node-1 | Allocate publice IP address 101.200.126.206 for instance i-25d5tjo9b successfully
(node-1) node-1 | Starting instance i-25d5tjo9b ...
(node-1) node-1 | Start instance i-25d5tjo9b successfully
(node-1) node-1 | Waiting SSH service 101.200.126.206:22 is ready to connect ...
(node-1) node-1 | Uploading SSH keypair to 101.200.126.206:22 ...

It just stays there forever; I even waited several hours with no luck. There is no error message, so I don't know what's going on.

I tried to SSH into the instance manually with the password set in the environment variables above, and that works fine. So the instance was created correctly and the SSH connection is OK.

My Docker version is 1.12.0:

$ docker version
Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 21:04:48 2016
 OS/Arch:      darwin/amd64
 Experimental: true

Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 21:04:48 2016
 OS/Arch:      linux/amd64
 Experimental: true

My Docker Machine version is 0.8.0:

$ docker-machine version
docker-machine version 0.8.0, build b85aac1
denverdino commented 8 years ago

I cannot reproduce the problem. Can you try again in debug mode (export DEBUG=true)? That should give us more information about where the problem is.
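To make a debug run easy to attach to the issue, the output can be captured to a file while it is still shown on the terminal. The following is only a sketch: `run_logged` is a hypothetical helper name, not part of docker-machine, and the log file name in the example is made up. The commented example also passes docker-machine's global `--debug` flag in addition to the `DEBUG` variable.

```shell
# run_logged LOG_FILE COMMAND [ARGS...]
# Runs COMMAND with DEBUG=true in its environment and writes everything it
# prints (stdout and stderr) both to the terminal and to LOG_FILE.
# Note: the exit status is tee's, not COMMAND's.
run_logged() {
    log_file=$1
    shift
    # 2>&1 folds stderr into stdout so that tee captures both streams.
    DEBUG=true "$@" 2>&1 | tee "$log_file"
}

# Example (hypothetical log file name, not run here):
#   run_logged node-1-debug.log docker-machine --debug create -d aliyunecs node-1
```

Driver error messages can contain the full request URL, including the access key ID and password parameters, so the log should be scrubbed before it is posted.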

Thank you

twang2218 commented 8 years ago

I'm not sure whether this is a GFW-related problem. I tried to reproduce it this morning with export DEBUG=true and got two failures and one partial failure, each with a different error.

Here is the first attempt:

$ docker-machine create -d aliyunecs --engine-registry-mirror https://xxxx.mirror.aliyuncs.com node-1
Running pre-create checks...
Creating machine...
(node-1) node-1 | Creating key pair for instance ...
(node-1) node-1 | Configuring security groups instance ...
(node-1) node-1 | Creating instance with image ubuntu1404_64_20G_aliaegis_20150325.vhd ...
Error creating machine: Error in driver during machine creation: node-1 | Failed to create instance: Aliyun API Error: RequestId:  Status Code: -1 Code: AliyunGoClientFailure Message: Get https://ecs.aliyuncs.com?AccessKeyId=xxx&Action=CreateInstance&ClientToken=elMy_418cuCMsr09FvQY_6fZgwaNpp6o&Format=JSON&ImageId=ubuntu1404_64_20G_aliaegis_20150325.vhd&InstanceName=node-1&InstanceType=ecs.t1.small&InternetChargeType=PayByTraffic&InternetMaxBandwidthOut=100&IoOptimized=none&Password=xxx&RegionId=cn-beijing&SecurityGroupId=sg-25qocfpse&SignatureMethod=HMAC-SHA1&SignatureNonce=8BJvDbrq_qZlc44BXplTueUpd3Xk994T&SignatureVersion=1.0&Timestamp=2016-08-10T01%3A28%3A15Z&Version=2014-05-26&Signature=6DvfsB76ZmH904CM55ti08uo38E%3D: net/http: TLS handshake timeout

I checked the Aliyun Control Panel: the first attempt did not create any instance on Aliyun ECS.

Here is the second one:

$ docker-machine create -d aliyunecs --engine-registry-mirror https://xxxx.mirror.aliyuncs.com node-2
Running pre-create checks...
Creating machine...
(node-2) node-2 | Creating key pair for instance ...
(node-2) node-2 | Configuring security groups instance ...
(node-2) node-2 | Creating instance with image ubuntu1404_64_20G_aliaegis_20150325.vhd ...
Error creating machine: Error in driver during machine creation: node-2 | Failed to create instance: Aliyun API Error: RequestId:  Status Code: -1 Code: AliyunGoClientFailure Message: Get https://ecs.aliyuncs.com?AccessKeyId=xxx&Action=CreateInstance&ClientToken=rydzLrhiG07nZ1iaitA11O_rIftB3VIM&Format=JSON&ImageId=ubuntu1404_64_20G_aliaegis_20150325.vhd&InstanceName=node-2&InstanceType=ecs.t1.small&InternetChargeType=PayByTraffic&InternetMaxBandwidthOut=100&IoOptimized=none&Password=xxx&RegionId=cn-beijing&SecurityGroupId=sg-25jybstpe&SignatureMethod=HMAC-SHA1&SignatureNonce=PVw5PIqW9RAl4QD7GWmfXRKS8bBYhRnW&SignatureVersion=1.0&Timestamp=2016-08-10T01%3A29%3A52Z&Version=2014-05-26&Signature=c%2FxO%2Fz0hi8Wqks2DSniABDFizKQ%3D: read tcp 10.0.1.50:49398->140.205.135.111:443: read: operation timed out

The second attempt actually created an Aliyun ECS instance, but its status is not running.

Here is the third one:

$ docker-machine create -d aliyunecs  node-3
Running pre-create checks...
Creating machine...
(node-3) node-3 | Creating key pair for instance ...
(node-3) node-3 | Configuring security groups instance ...
(node-3) node-3 | Creating instance with image ubuntu1404_64_20G_aliaegis_20150325.vhd ...
(node-3) node-3 | Create instance i-25f8i5y2t successfully
(node-3) node-3 | Allocate publice IP address 123.56.98.206 for instance i-25f8i5y2t successfully
(node-3) node-3 | Starting instance i-25f8i5y2t ...
(node-3) node-3 | Start instance i-25f8i5y2t successfully
(node-3) node-3 | Waiting SSH service 123.56.98.206:22 is ready to connect ...
(node-3) node-3 | Uploading SSH keypair to 123.56.98.206:22 ...
(node-3) node-3 | Created instance i-25f8i5y2t successfully with public IP address 123.56.98.206 and private IP address 10.170.187.219
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(upstart)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Error creating machine: Error checking the host: Error checking and/or regenerating the certs: There was an error validating certificates for host "123.56.98.206:2376": tls: DialWithDialer timed out
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.

The third attempt actually created a working Docker host: I can run docker-machine ssh node-3 and operate Docker from there.
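Since SSH (port 22) works but certificate validation against port 2376 times out, one quick check is whether the Docker TLS port is reachable at all from the client side. A minimal sketch, assuming bash and the coreutils `timeout` command are installed (stock macOS does not ship `timeout`); `port_open` is a hypothetical helper name:

```shell
# port_open HOST PORT [TIMEOUT_SECONDS]
# Exits 0 if a TCP connection to HOST:PORT can be opened within the timeout
# (default 5 seconds), non-zero if it is refused or times out.
# Uses bash's /dev/tcp pseudo-device; HOST and PORT are passed to the inner
# bash as $0 and $1 rather than being pasted into the script text.
port_open() {
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (not run here), using the address from the log above:
#   for port in 22 2376; do
#       if port_open 123.56.98.206 "$port"; then
#           echo "port $port reachable"
#       else
#           echo "port $port NOT reachable"
#       fi
#   done
```

This only tests that a TCP connection opens. A connection that opens but then stalls during the TLS handshake would still pass this check.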

A stalled SSH key upload, a TLS handshake timeout, a TLS dial timeout: these patterns look very much like what happens when the GFW interferes.

Besides my local network, I also tried the same thing from a Digital Ocean droplet, without any luck. Could you try the aliyunecs driver from a server outside mainland China? Thanks.
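Because the failures are intermittent, one workaround to experiment with is retrying the command with a growing delay. This is a generic sketch, not something the driver provides: `retry` is a hypothetical helper, and the attempt count and delay in the example are arbitrary.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# The delay doubles after every failed attempt.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
}

# Example (not run here):
#   retry 3 10 docker-machine create -d aliyunecs node-1
```

A failed create can leave a half-created machine or ECS instance behind, as the second attempt above did, so it is worth cleaning up (for example with docker-machine rm, and checking the Aliyun console) before reusing the same machine name.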

menglingwei commented 8 years ago

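@twang2218 I tested this from a network inside mainland China and have not found any problem so far, so my guess is that it is caused by the GFW. I'll find an environment outside China and try it there.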

menglingwei commented 8 years ago

@twang2218 You could try using a machine in the US West region.

menglingwei commented 8 years ago

@twang2218 Since you are outside China, you don't need to specify the Aliyun mirror and install URL. You can leave out MACHINE_DOCKER_INSTALL_URL and the --engine-registry-mirror parameter.

twang2218 commented 8 years ago

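I'm closing this issue. After many more tests, I found that when I create a Beijing-region Aliyun server from a Digital Ocean server in SFO (West Coast), the success rate is much higher. So this problem is very likely caused by the Great Wall.

@menglingwei The servers I'm creating are in Aliyun's Beijing region, so MACHINE_DOCKER_INSTALL_URL and --engine-registry-mirror really do have to be set.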
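The point here is that the mirror settings depend on where the ECS instance runs, not on where the docker-machine client runs. A minimal sketch of that decision follows. `needs_cn_mirror` is a hypothetical helper, and it assumes that mainland regions are exactly the IDs starting with `cn-` other than `cn-hongkong`, which should be checked against Aliyun's current region list.

```shell
# needs_cn_mirror REGION_ID
# Exits 0 if the target region is assumed to be in mainland China, where the
# Aliyun install URL and registry mirror are needed; exits 1 otherwise.
needs_cn_mirror() {
    case $1 in
        cn-hongkong) return 1 ;;  # assumption: treated as outside the mainland
        cn-*)        return 0 ;;  # e.g. cn-beijing
        *)           return 1 ;;
    esac
}

# Example (not run here), reusing the variables from the first comment:
#   if needs_cn_mirror "$ECS_REGION"; then
#       export MACHINE_DOCKER_INSTALL_URL=http://acs-public-mirror.oss-cn-hangzhou.aliyuncs.com/docker-engine/internet
#       mirror_args="--engine-registry-mirror https://xxxx.mirror.aliyuncs.com"
#   else
#       mirror_args=""
#   fi
#   docker-machine create -d aliyunecs $mirror_args node-1
```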

denverdino commented 8 years ago

Thanks for the update.