cloudfoundry-attic / bosh-init

bosh-init is a tool used to create and update the Director VM
Apache License 2.0

Bosh-init registry server fail #86

Closed: mrageh closed this issue 7 years ago

mrageh commented 7 years ago

I've been following the tutorial to set up BOSH on AWS. First, the script ran for a very long time: around 5 hours and 53 minutes.

Deployment manifest: '/Users/Adam/programming/work/nird/deployments/bosh.yml'
Deployment state: '/Users/Adam/programming/work/nird/deployments/bosh-state.json'

Started validating
  Downloading release 'bosh'... Skipped [Found in local cache] (00:00:00)
  Validating release 'bosh'... Finished (00:00:00)
  Downloading release 'bosh-aws-cpi'... Skipped [Found in local cache] (00:00:00)
  Validating release 'bosh-aws-cpi'... Finished (00:00:00)
  Validating cpi release... Finished (00:00:00)
  Validating deployment manifest... Finished (00:00:00)
  Downloading stemcell... Skipped [Found in local cache] (00:00:00)
  Validating stemcell... Finished (00:00:00)
Finished validating (00:00:00)

Started installing CPI
  Compiling package 'ruby_aws_cpi/5e8696452d4676dd97010e91475e86b23b7e2042'... Finished (00:01:40)
  Compiling package 'bosh_aws_cpi/cec54b1e90f27b994625ef4f4f81cc11d9d4fc7f'... Finished (00:00:48)
  Installing packages... Finished (00:00:00)
  Rendering job templates... Finished (00:00:00)
  Installing job 'aws_cpi'... Finished (00:00:00)
Finished installing CPI (00:02:29)

Starting registry... Finished (00:00:00)
Uploading stemcell 'bosh-aws-xen-hvm-ubuntu-trusty-go_agent/3262.2'... Finished (00:00:06)

Started deploying
  Creating VM for instance 'bosh/0' from stemcell 'ami-613ef00c light'... Finished (00:00:37)
  Waiting for the agent on VM 'i-03063243fccf7c11c' to be ready... Finished (00:01:37)
  Creating disk... Finished (00:00:14)
  Attaching disk 'vol-0d3ffe02c6cb9152e' to VM 'i-03063243fccf7c11c'... Finished (00:00:19)
  Rendering job templates... Finished (00:00:05)
  Compiling package 'ruby/589d4b05b422ac6c92ee7094fc2a402db1f2d731'... Finished (00:46:09)
  Compiling package 'mysql/b7e73acc0bfe05f1c6cbfd97bf92d39b0d3155d5'... Finished (00:23:59)
  Compiling package 'libpq/09c8f60b87c9bd41b37b0f62159c9d77163f52b8'... Finished (00:28:17)
  Compiling package 'ruby_aws_cpi/5e8696452d4676dd97010e91475e86b23b7e2042'... Finished (00:35:59)
  Compiling package 's3cli/1c5a91f02feff8a0e3a506ac51c4a3140e86f049'... Finished (00:00:06)
  Compiling package 'health_monitor/62b635f04783c2d07b0e5a50eadcae5d48b2883c'... Finished (00:01:19)
  Compiling package 'genisoimage/008d332ba1471bccf9d9aeb64c258fdd4bf76201'... Finished (00:07:33)
  Compiling package 'nginx/21e909d27fa69b3b2be036cdf5b8b293c6800158'... Finished (00:34:58)
  Compiling package 'nats/0155cf6be0305c9f98ba2e9e2503cd72da7c05c3'... Finished (00:06:46)
  Compiling package 'registry/ed95450e34ff3eae080f27a576a19b3060b3f6c2'... Finished (01:23:26)
  Compiling package 'bosh_aws_cpi/cec54b1e90f27b994625ef4f4f81cc11d9d4fc7f'... Finished (01:04:58)
  Compiling package 'director/63d986882df2533324e1611ba544ee3b02c25133'... Failed (00:17:05)
Failed deploying (05:53:37) <<<<<<<<<<<<<<

Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)

Command 'deploy' failed:
    Building state for instance 'bosh/0':
      Compiling job package dependencies for instance 'bosh/0':
        Compiling job package dependencies:
          Adding release package archive '/Users/Adam/.bosh_init/installations/924920ae-200f-404a-7704-9ac3c6cf284e/tmp/bosh-init-release136085877/packages/director.tgz' to blobstore:
            Putting file '/Users/Adam/.bosh_init/installations/924920ae-200f-404a-7704-9ac3c6cf284e/tmp/bosh-init-release136085877/packages/director.tgz' into blobstore (via DAVClient) as blobID 'cdf30f7e-b66f-48e4-5a65-79cd2d3498e0':
              Putting dav blob cdf30f7e-b66f-48e4-5a65-79cd2d3498e0:
                Put read tcp> read: operation timed out
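To see where the 5 hours and 53 minutes went, the per-stage timings in output like the above can be ranked. The following is a minimal sketch, not part of bosh-init: the function name `slowest_stages` and the regular expression are assumptions based only on the line format visible in this paste, and the pattern allows for a status that has wrapped onto the next line, as terminal captures sometimes do.

```python
import re

# Matches stage lines in the format seen in the pasted output, e.g.
#   Compiling package 'ruby/589d...'... Finished (00:46:09)
# The status may sit on the same line or wrap onto the following one.
STAGE_RE = re.compile(
    r"^[ \t]*(?P<name>\S.*?)\.\.\.[ \t]*(?:\n[ \t]*)?"
    r"(?P<status>Finished|Failed|Skipped)[^\n]*?"
    r"\((?P<h>\d+):(?P<m>\d+):(?P<s>\d+)\)",
    re.MULTILINE,
)

# A few lines copied from the output above.
SAMPLE = """
  Compiling package 'ruby/589d4b05b422ac6c92ee7094fc2a402db1f2d731'... Finished (00:46:09)
  Compiling package 'registry/ed95450e34ff3eae080f27a576a19b3060b3f6c2'... Finished (01:23:26)
  Compiling package 'bosh_aws_cpi/cec54b1e90f27b994625ef4f4f81cc11d9d4fc7f'... Finished (01:04:58)
  Compiling package 'director/63d986882df2533324e1611ba544ee3b02c25133'... Failed (00:17:05)
Failed deploying (05:53:37) <<<<<<<<<<<<<<
"""


def slowest_stages(log_text, top=3):
    """Return up to `top` (seconds, stage name, status) tuples, slowest first."""
    stages = []
    for m in STAGE_RE.finditer(log_text):
        seconds = int(m["h"]) * 3600 + int(m["m"]) * 60 + int(m["s"])
        stages.append((seconds, m["name"].strip(), m["status"]))
    return sorted(stages, reverse=True)[:top]
```

Calling `slowest_stages(SAMPLE)` ranks the 'registry' and 'bosh_aws_cpi' compilations first. Summary lines such as "Failed deploying" are ignored because they contain no "..." separator.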

Judging from the logs generated with BOSH_INIT_LOG_LEVEL=debug, the problem seems to be caused by the following registry error: "Registry error occurred: accept tcp use of closed network connection".

The script also attempted to send the POST request below 88 times, which may have something to do with the registry.

[httpClient] 2016/07/14 22:25:39 DEBUG - Sending POST request with body {"method":"compile_package","arguments":["81978656-d82b-458e-711e-327da04cb7ad","47336b7cab3659e5fed9b5a2e50952abbe54a85b","ruby","589d4b05b422ac6c92ee7094fc2a402db1f2d731",{}],"reply_to":"e559f1b0-26de-4305-6dc5-ac7d416e028f"}, endpoint https://mbus:mbus-password@
[unlimitedRetryStrategy] 2016/07/14 22:25:39 DEBUG - Making attempt #0
[httpClient] 2016/07/14 22:25:39 DEBUG - Sending POST request with body {"method":"get_task","arguments":["25ceb7af-cefc-4551-713d-180535c04a02"],"reply_to":"e559f1b0-26de-4305-6dc5-ac7d416e028f"}, endpoint https://mbus:mbus-password@
[httpAgentClient] 2016/07/14 22:25:40 DEBUG - get_task response value: map[string]interface {}{"agent_task_id":"25ceb7af-cefc-4551-713d-180535c04a02", "state":"running"}
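A count like the "88 times" above can be broken down per agent method by parsing the debug log. This is a rough sketch under stated assumptions: the helper `count_agent_methods` is hypothetical, and the pattern relies only on the "Sending POST request with body {...}, endpoint" wording visible in the excerpt. In that excerpt, `compile_package` is followed by `get_task` with state "running", so a large count of `get_task` calls may simply be polling of a long-running task rather than retries of a failed call; that reading is an inference from these few lines, not a confirmed description of bosh-init.

```python
import json
import re
from collections import Counter

# Captures the JSON body of lines shaped like the httpClient entries above.
BODY_RE = re.compile(r"Sending POST request with body (\{.*\}), endpoint")

# Lines copied from the excerpt above (the endpoint is truncated in the source).
SAMPLE_LINES = [
    '[httpClient] 2016/07/14 22:25:39 DEBUG - Sending POST request with body '
    '{"method":"compile_package","arguments":["81978656-d82b-458e-711e-327da04cb7ad",'
    '"47336b7cab3659e5fed9b5a2e50952abbe54a85b","ruby",'
    '"589d4b05b422ac6c92ee7094fc2a402db1f2d731",{}],'
    '"reply_to":"e559f1b0-26de-4305-6dc5-ac7d416e028f"}, endpoint https://mbus:mbus-password@',
    '[unlimitedRetryStrategy] 2016/07/14 22:25:39 DEBUG - Making attempt #0',
    '[httpClient] 2016/07/14 22:25:39 DEBUG - Sending POST request with body '
    '{"method":"get_task","arguments":["25ceb7af-cefc-4551-713d-180535c04a02"],'
    '"reply_to":"e559f1b0-26de-4305-6dc5-ac7d416e028f"}, endpoint https://mbus:mbus-password@',
]


def count_agent_methods(log_lines):
    """Count POST requests per agent method; lines without a parsable body are skipped."""
    counts = Counter()
    for line in log_lines:
        m = BODY_RE.search(line)
        if not m:
            continue
        try:
            body = json.loads(m.group(1))
        except json.JSONDecodeError:
            continue
        counts[body.get("method", "<unknown>")] += 1
    return counts
```

To run it over a full log, pass an open file object, since iterating a file yields its lines: `count_agent_methods(open(path))`, where `path` is wherever the debug output was saved.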

Here is a link to a gist file with the last 1300 lines from the log file.

I'd appreciate any help getting this issue resolved so I can set up BOSH on AWS.

dpb587-pivotal commented 7 years ago

5+ hours is an incredibly long time. A couple of ideas:

Did this happen multiple times? Do you have a stable network connection to the region? Are you using the most recent bosh-init version (currently 0.0.95)? Some connectivity-related improvements were made in newer versions which may improve the situation.
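One rough way to answer the network stability question is to time repeated TCP connections to the target and look at the failure rate and spread. The sketch below is an illustration only: `probe` and `summarize` are hypothetical helpers, the port and attempt counts are arbitrary defaults, and the host to test (for example the new VM's address or an AWS endpoint in the region) is left to the reader because the thread does not name one.

```python
import socket
import statistics
import time


def probe(host, port=443, attempts=20, timeout=3.0, pause=0.5):
    """Time repeated TCP connects; a failed or timed-out attempt is recorded as None."""
    samples = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                samples.append(time.monotonic() - start)
        except OSError:  # covers timeouts, refused connections, and DNS failures
            samples.append(None)
        time.sleep(pause)
    return samples


def summarize(samples):
    """Reduce connect timings to a failure rate plus median and worst latency."""
    ok = [s for s in samples if s is not None]
    failed = len(samples) - len(ok)
    return {
        "attempts": len(samples),
        "failure_rate": failed / len(samples) if samples else 0.0,
        "median_s": statistics.median(ok) if ok else None,
        "max_s": max(ok) if ok else None,
    }
```

A call such as `summarize(probe(host))` with a non-zero failure rate, or a worst-case latency far above the median, would point toward the kind of unstable connection discussed here. A clean result does not rule it out, since a short probe can miss intermittent drops.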

mrageh commented 7 years ago

@dpb587-pivotal It turns out that my network connection was not stable. When I re-ran the script on a stable connection, it completed successfully in a fraction of the time.

I think it's safe to close this issue. Thanks for the help.