Closed yagweb closed 8 years ago
I am running into a similar problem. My bosh-init gets stuck at exactly the same point: "Waiting for the agent on VM xxx to be ready...". With bosh-init logging enabled, I see this in the log:
[sshTunnel] 2016/03/03 21:11:12 DEBUG - Dialing remote server at 10.152.5.48:22
[sshTunnel] 2016/03/03 21:11:12 DEBUG - Making attempt #0
[sshTunnel] 2016/03/03 21:26:59 DEBUG - Attempt failed #0: Dialing remote server: ssh: handshake failed: EOF
[sshTunnel] 2016/03/03 21:27:00 DEBUG - Making attempt #1
...
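To see how long each tunnel attempt ran before it failed, the timestamps in a log like the one above can be subtracted. This is a minimal Python sketch, not part of bosh-init; the function name `attempt_durations` is made up for illustration, and the regular expression assumes the exact `[sshTunnel] ... DEBUG - Making attempt #N` / `Attempt failed #N` wording shown in the excerpt.

```python
import re
from datetime import datetime

# Matches only the two message kinds that appear in the excerpt above.
LINE = re.compile(
    r"\[sshTunnel\] (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) DEBUG - "
    r"(Making attempt|Attempt failed) #(\d+)"
)

def attempt_durations(log_text):
    """Return {attempt number: seconds between 'Making attempt' and 'Attempt failed'}."""
    started = {}
    durations = {}
    for stamp, kind, num in LINE.findall(log_text):
        when = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
        if kind == "Making attempt":
            started[num] = when
        elif num in started:
            durations[int(num)] = (when - started[num]).total_seconds()
    return durations
```

For the excerpt above this gives 947 seconds for attempt #0, so the handshake took almost 16 minutes to fail with EOF rather than being refused immediately.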
I verified that SSH port 22 is open and accessible. I am not sure what the reason is.
You mentioned you were able to ssh into the VM. Did you use an ssh-key or a username/password to ssh into the VM? Which key or username/password did you use?
@ajay-aggarwal I used the ssh-key that was created under "Key Pairs" in the OpenStack project and specified for the instance through the "default_key_name" and "private_key" fields in the deployment manifest. It is the same key that bosh-init uses. In my case, SSH port 22 of the instance is opened, then closed, then opened again while the instance initializes. It seems that bosh-init aborted when port 22 closed. I can ssh into the VM after the port opens again.
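The open, close, reopen behaviour described above can be observed from the machine running bosh-init with a small polling loop. The following is a rough Python sketch, not what bosh-init itself does; the function name `wait_for_ssh_banner` and its parameters are invented for this example. It assumes the server sends its `SSH-` identification string first, as OpenSSH normally does, and it treats any socket error (refused, reset, timeout) as "not ready yet" instead of giving up.

```python
import socket
import time

def wait_for_ssh_banner(host, port=22, deadline_s=600.0,
                        interval_s=5.0, connect_timeout_s=5.0):
    """Poll host:port until an SSH identification string arrives.

    Returns the banner text, or None if the deadline passes first.
    """
    stop = time.monotonic() + deadline_s
    while True:
        try:
            with socket.create_connection((host, port),
                                          timeout=connect_timeout_s) as conn:
                conn.settimeout(connect_timeout_s)
                banner = conn.recv(256)
                if banner.startswith(b"SSH-"):
                    return banner.decode("ascii", "replace").strip()
        except OSError:
            # Port closed or sshd restarting: keep retrying until the deadline.
            pass
        if time.monotonic() >= stop:
            return None
        time.sleep(interval_s)
```

Calling `wait_for_ssh_banner("VM_IP")` while the instance boots should show whether sshd comes back after its restart within the chosen deadline.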
@yagweb looks like your env is running virtualization inside virtualization, and that's resulting in extremely slow execution times. not much we can do about that except suggest switching to an env that is faster. i doubt you'll be able to deploy anything to your existing env if deploying bosh itself is that slow.
@ajay-aggarwal i would recommend opening a different issue at https://github.com/cloudfoundry-incubator/bosh-openstack-cpi-release. seems to be just a misconfiguration in your openstack env.
@cppforlife I ran into a situation similar to @yagweb's.
===== 2016-12-22 13:36:38 UTC Running "bosh-init deploy /var/tempest/workspaces/default/deployments/bosh.yml"
Deployment manifest: '/var/tempest/workspaces/default/deployments/bosh.yml'
Deployment state: '/var/tempest/workspaces/default/deployments/bosh-state.json'
Started validating
Validating release 'bosh'... Finished (00:01:23)
Validating release 'bosh-openstack-cpi'... Finished (00:00:04)
Validating release 'uaa'... Finished (00:00:49)
Validating cpi release... Finished (00:00:00)
Validating deployment manifest... Finished (00:00:00)
Validating stemcell... Finished (00:00:50)
Finished validating (00:03:08)
Started installing CPI
Compiling package 'ruby_openstack_cpi/9485b5753d4609e92e1491ff991cb28fbde81445'... Finished (01:36:57)
Compiling package 'bosh_openstack_cpi/dd0bab98dbb820af3ec59b364badfed02ffe3f3b'... Finished (00:00:41)
Installing packages... Finished (00:00:07)
Rendering job templates... Finished (00:00:39)
Installing job 'openstack_cpi'... Finished (00:00:00)
Finished installing CPI (01:38:24)
Starting registry... Finished (00:00:00)
Uploading stemcell 'bosh-openstack-kvm-ubuntu-trusty-go_agent-raw/3312.9'... Finished (00:05:19)
Started deploying
Creating VM for instance 'bosh/0' from stemcell '7b886ad0-e67c-48e9-8f26-52762210acd9'... Finished (00:01:17)
Waiting for the agent on VM '3275a1eb-9d32-44e2-936a-fd74652919a4' to be ready... Failed (00:00:00)
Failed deploying (00:01:17)
Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)
Command 'deploy' failed:
Deploying:
Creating instance 'bosh/0':
Waiting until instance is ready:
Starting SSH tunnel:
Parsing private key file '/tmp/bosh_ec2_private_key.pem':
asn1: structure error: superfluous leading zeros in length
===== 2016-12-22 15:25:09 UTC Finished "bosh-init deploy /var/tempest/workspaces/default/deployments/bosh.yml"; Duration: 6510s; Exit Status: 1
Exited with 1.
I am able to ssh into the VM instance with the command ssh -i pcf.pem vcap@VM_IP.
I observe that the VM instance is not assigned a floating IP.
FYI, I use OpenStack Mitaka. When I run cf-openstack-validator, the validator fails to assign a floating IP to the VM. But I am able to assign a floating IP manually, which suggests that the OpenStack environment is good enough to deploy Cloud Foundry.
@yagweb If you find any workaround for this situation, can you help me with it? Thanks.
@yagweb The solution is described here. I solved this problem by importing a manually created SSH key into OpenStack.
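The "asn1: structure error: superfluous leading zeros in length" message in the log above is what Go's encoding/asn1 parser reports when a long-form DER length field begins with a zero byte, which strict DER forbids. Whether a given key file has that defect can be checked with a short Python sketch like the one below. The helper names `pem_to_der` and `leading_zero_lengths` are made up for this example, and the code assumes an unencrypted PEM key that uses only single-byte tags, as a plain RSA private key does.

```python
import base64

def pem_to_der(pem_text):
    """Drop the BEGIN/END armor (and any 'Header: value' lines), then base64-decode."""
    body = []
    for line in pem_text.splitlines():
        line = line.strip()
        if line and not line.startswith("-----") and ":" not in line:
            body.append(line)
    return base64.b64decode("".join(body))

def leading_zero_lengths(der, start=0, end=None):
    """Return offsets of long-form length fields whose first length byte is zero."""
    end = len(der) if end is None else end
    bad = []
    pos = start
    while pos + 2 <= end:
        tag = der[pos]
        first = der[pos + 1]
        pos += 2
        if first & 0x80:
            # Long form: the low 7 bits say how many length bytes follow.
            count = first & 0x7F
            raw = der[pos:pos + count]
            if raw and raw[0] == 0:
                bad.append(pos - 1)
            length = int.from_bytes(raw, "big")
            pos += count
        else:
            length = first
        if tag & 0x20:
            # Constructed element (e.g. SEQUENCE): check its children too.
            bad.extend(leading_zero_lengths(der, pos, min(pos + length, end)))
        pos += length
    return bad
```

A non-empty result from `leading_zero_lengths(pem_to_der(open("pcf.pem").read()))` would indicate a key that lenient tools may accept but a strict DER parser rejects, which would be consistent with a freshly generated, imported key fixing the problem.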
I have an OpenStack cloud installed on VirtualBox virtual machines. I tried to deploy BOSH to this OpenStack using bosh-init, following the manual. In the resource_pools section of the deployment manifest, I set the instance_type to m1.large, as shown below,
Everything is OK until bosh-init waits for the agent on the VM to be ready; at that point bosh-init fails with the error message shown below,
But after several minutes the VM is ready, and I can ssh into it.
At first I thought it might be caused by a timeout while waiting, because I tried several times and found that the VM was in a different initialization stage each time the problem happened. Working from this assumption, I found this post: https://github.com/cloudfoundry/bosh-init/issues/67. Following cppforlife's comments, I added the custom options below to the deployment manifest,
I don't know whether the x is the literal letter x or should be replaced by a number, so I tried both the letter x and the number 36000. The problem still occurred after these modifications. But in these attempts I found that the problem shows up immediately when the VM restarts the OpenSSH server, as shown below,
That is to say, when the VM is stopping the OpenSSH server, bosh-init returns the error message immediately, even though the VM starts the OpenSSH server again afterwards. I also tried several versions of the Ubuntu stemcells, and the same problem persisted. (I tried CentOS stemcells too and got a different error.)
After trying different parameter combinations, I found a workaround: changing the instance_type from m1.large to m1.small sometimes lets the deployment continue. But the deployment takes extremely long, about 11 hours, and some jobs may fail to run, as below,
Setting instance_type to m1.medium did not work either. It seems that when VCPUs and memory are limited, bosh-init sometimes misses the message about the OpenSSH server stopping.
My question is: did I miss some ssh_tunnel options, or is this a bug in the sshTunnel that prevents using an instance with the m1.large flavor? Or can bosh-init continue deploying BOSH on an existing VM?