cloudfoundry / bosh-alicloud-cpi-release

BOSH Alibaba CPI
Apache License 2.0
32 stars 20 forks source link

Unable to deploy BOSH in US Silicon Valley region using `BOSH_ALL_PROXY=ssh+sock5://...` through jumpbox #37

Closed Amit-PivotalLabs closed 6 years ago

Amit-PivotalLabs commented 6 years ago

Hi,

I am unable to successfully run bosh create-env to deploy a Director on the Alibaba Cloud, and am not sure how to debug further.

Failure

The following appears to hang forever:

$ export BOSH_LOG_LEVEL=debug
$ export BOSH_LOG_PATH=/tmp/bosh.log
$ export BOSH_ALL_PROXY=ssh+socks5://root@<JUMPBOX_EIP>:22?private-key=/Users/amitgupta/envs/pipernet/jumpbox-2.pem
$ bosh create-env ~/workspace/bosh-deployment/bosh.yml \
  --state=/Users/amitgupta/envs/pipernet/state.json \
  --vars-store=~/envs/pipernet/creds.yml \
  -o ~/workspace/bosh-deployment/alicloud/cpi.yml \
  -o ~/workspace/bosh-deployment/jumpbox-user.yml \
  -o ~/workspace/bosh-deployment/misc/powerdns.yml \
  -v dns_recursor_ip=8.8.8.8 \
  -v director_name=my-bosh \
  -v internal_cidr=192.168.0.0/24 \
  -v internal_gw=192.168.0.1 \
  -v internal_ip=192.168.0.13 \
  -v vswitch_id=vsw-rj9j17xuu1sed9izn1koz \
  -v security_group_id=sg-rj97moasncf1pba4g8e4 \
  -v access_key_id=<REDACTED> \
  -v access_key_secret=<REDACTED> \
  -v region='us-west-1' \
  -v zone='us-west-1a' \
  -v key_pair_name=bosh \
  -v private_key=~/envs/pipernet/bosh.pem

Deployment manifest: '/Users/amitgupta/workspace/bosh-deployment/bosh.yml'
Deployment state: '/Users/amitgupta/envs/pipernet/state.json'

Started validating
  Validating release 'bosh'... Finished (00:00:00)
  Validating release 'bosh-alicloud-cpi'... Finished (00:00:00)
  Validating release 'os-conf'... Finished (00:00:00)
  Validating cpi release... Finished (00:00:00)
  Validating deployment manifest... Finished (00:00:00)
  Validating stemcell... Finished (00:00:00)
Finished validating (00:00:01)

Started installing CPI
  Compiling package 'golang/03d27673d09c82e7a4183d651e6c293b4a501de7'... Finished (00:00:00)
  Compiling package 'bosh-alicloud-cpi/1e2589c853f94f052badab821d4014777d06ec96'... Finished (00:00:00)
  Installing packages... Finished (00:00:00)
  Rendering job templates... Finished (00:00:00)
  Installing job 'alicloud_cpi'... Finished (00:00:00)
Finished installing CPI (00:00:00)

Starting registry... Finished (00:00:00)
Uploading stemcell 'bosh-alicloud-kvm-ubuntu-trusty-go_agent/1018'... Skipped [Stemcell already uploaded] (00:00:00)

Started deploying
  Waiting for the agent on VM 'i-rj91w36qswr31k7nx57l'...

The debug logs seems to be stuck at the following:

$ tail -n1 /tmp/bosh.log
[httpClient] 2018/05/23 22:59:12 DEBUG - Sending POST request to endpoint 'https://mbus:<REDACTED>@192.168.0.13:6868/agent'

Versions and Modifications

$ bosh --version
version 3.0.1-712bfd7-2018-03-13T23:26:42Z

$ git remote -v # aliyun fork of bosh-deployment
origin  https://github.com/aliyun/bosh-deployment.git (fetch)
origin  git@github.com:aliyun/bosh-deployment.git (push)

$ git rev-parse HEAD
f04eaeda98944fd96b7bb417a48fa6484040e273

$ git diff --name-only
alicloud/cpi.yml
bosh.yml
jumpbox-user.yml

I downloaded os-conf release, bosh release, and the stemcell, and replaced the URLs with file:///... paths since the downloads were very slow. Furthermore, I had to modify the bosh-alicloud-cpi release as per #36 and built a dev release, so I removed the version and sha1 for that release and replaced the url with also a file:///... path:

$ git status # in bosh-alicloud-cpi-release directory
On branch release18
Your branch is up to date with 'origin/release18'.

...

    modified:   packages/bosh-alicloud-cpi/packaging

$ git diff
diff --git a/packages/bosh-alicloud-cpi/packaging b/packages/bosh-alicloud-cpi/packaging
index 715cf9b..c07465b 100644
--- a/packages/bosh-alicloud-cpi/packaging
+++ b/packages/bosh-alicloud-cpi/packaging
@@ -10,10 +10,10 @@ PLATFORM=`uname | tr '[:upper:]' '[:lower:]'`
 if [ $PLATFORM = "linux" ]; then
     export GOROOT=$(cd "${PACKAGES_DIR}/golang" && pwd -P)
     export PATH=${GOROOT}/bin:${PATH}
-else
-    echo "Mac user set GOROOT and PATH=" $PATH
-    export GOROOT=/usr/local/go
-    export PATH=$GOROOT/bin:${PATH}
+# else
+#    echo "Mac user set GOROOT and PATH=" $PATH
+#    export GOROOT=/usr/local/go
+#    export PATH=$GOROOT/bin:${PATH}
 fi

 # Build BOSH Alicloud CPI package

Additional Logs

I can SCP bosh.pem onto my jumpbox, SSH to my jumpbox, and from there I can SSH onto the machine created by bosh create-env, and see that in /var/vcap/bosh/log/current there's a lot of the following:

2018-05-24_14:58:51.46733 [clientRetryable] 2018/05/24 14:58:51 DEBUG - [requestID=dffb3282-da00-4e3f-4a43-026d79c94b1f] Requesting (attempt=4): Request{ Method: 'GET', URL: 'http://registry:REDACTED@127.0.0.1:6901/instances/i-rj9axdluwy9ixxptd07t/settings' }
2018-05-24_14:58:51.51012 [unlimitedRetryStrategy] 2018/05/24 14:58:51 DEBUG - Making attempt #31
2018-05-24_14:58:51.51016 [DelayedAuditLogger] 2018/05/24 14:58:51 ERROR - Unix syslog delivery error
2018-05-24_14:58:51.61051 [unlimitedRetryStrategy] 2018/05/24 14:58:51 DEBUG - Making attempt #32
2018-05-24_14:58:51.61053 [DelayedAuditLogger] 2018/05/24 14:58:51 ERROR - Unix syslog delivery error
2018-05-24_14:58:51.71087 [unlimitedRetryStrategy] 2018/05/24 14:58:51 DEBUG - Making attempt #33
2018-05-24_14:58:51.71089 [DelayedAuditLogger] 2018/05/24 14:58:51 ERROR - Unix syslog delivery error
2018-05-24_14:58:51.81123 [unlimitedRetryStrategy] 2018/05/24 14:58:51 DEBUG - Making attempt #34
2018-05-24_14:58:51.81125 [DelayedAuditLogger] 2018/05/24 14:58:51 ERROR - Unix syslog delivery error
2018-05-24_14:58:51.91176 [unlimitedRetryStrategy] 2018/05/24 14:58:51 DEBUG - Making attempt #35
2018-05-24_14:58:51.91179 [DelayedAuditLogger] 2018/05/24 14:58:51 ERROR - Unix syslog delivery error
2018-05-24_14:58:52.01215 [unlimitedRetryStrategy] 2018/05/24 14:58:52 DEBUG - Making attempt #36
2018-05-24_14:58:52.01217 [DelayedAuditLogger] 2018/05/24 14:58:52 ERROR - Unix syslog delivery error
2018-05-24_14:58:52.11254 [unlimitedRetryStrategy] 2018/05/24 14:58:52 DEBUG - Making attempt #37
2018-05-24_14:58:52.11258 [DelayedAuditLogger] 2018/05/24 14:58:52 ERROR - Unix syslog delivery error
2018-05-24_14:58:52.21298 [unlimitedRetryStrategy] 2018/05/24 14:58:52 DEBUG - Making attempt #38
2018-05-24_14:58:52.21300 [DelayedAuditLogger] 2018/05/24 14:58:52 ERROR - Unix syslog delivery error
2018-05-24_14:58:52.31336 [unlimitedRetryStrategy] 2018/05/24 14:58:52 DEBUG - Making attempt #39
2018-05-24_14:58:52.31338 [DelayedAuditLogger] 2018/05/24 14:58:52 ERROR - Unix syslog delivery error
2018-05-24_14:58:52.41372 [unlimitedRetryStrategy] 2018/05/24 14:58:52 DEBUG - Making attempt #40
2018-05-24_14:58:52.41374 [DelayedAuditLogger] 2018/05/24 14:58:52 ERROR - Unix syslog delivery error
2018-05-24_14:58:52.46801 [attemptRetryStrategy] 2018/05/24 14:58:52 DEBUG - Making attempt #4 for *httpclient.RequestRetryable

If I run curl from that VM, I see:

$ curl http://registry:REDACTED@127.0.0.1:6901/instances/i-rj9axdluwy9ixxptd07t/settings
curl: (7) Failed to connect to 127.0.0.1 port 6901: Connection refused

However, if I run it locally, I see:

$ curl -s http://registry:REDACTED@127.0.0.1:6901/instances/i-rj9axdluwy9ixxptd07t/settings | jq .status
"ok"

IaaS Setup

I have a VPC in US Silicon Valley with CIDR 192.168.0.0/16. I have an Ubuntu 16.04 VM with security enabled which I spun up manually through the Aliyun console to serve as my jumpbox. I gave it an Elastic IP, and it has a security group that allows me to SSH into it. I created another security group which allows all inbound TCP traffic from 192.168.0.0/16 with the idea of applying this SG to the Director so it can receive connections proxied through the jumpbox.

Other Variations

I've tried a handful of variations with no real improvement in progress:

Any suggestions on how to get further?

/cc @cppforlife @evanfarrar

Amit-PivotalLabs commented 6 years ago

This may be due to an issue with the BOSH CLI (https://github.com/cloudfoundry/bosh-cli/issues/432), hopefully nothing to do with the CPI or the IaaS setup.

jamesjoshuahill commented 6 years ago

Hi @Amit-PivotalLabs, maybe. In https://github.com/cloudfoundry/bosh-cli/issues/432 with debug log level, we saw the bosh cli stall on the first http request to the bosh director.

Amit-PivotalLabs commented 6 years ago

It appears this was an issue in the BOSH CLI which has been fixed as of v4.0.1 of the CLI.