cloudfoundry-community-attic / bosh-bootstrap

From zero to a running micro BOSH in one command line
MIT License

firstbosh on AWS breaks after instance loses ephemeral storage #255

Closed mrdavidlaing closed 11 years ago

mrdavidlaing commented 11 years ago

I'm not really sure whether this is a bug or a feature, but it did come as a surprise to me.

  • bosh-bootstrap deploy on AWS creates firstbosh.
  • firstbosh has /var/vcap/data mapped to the AWS instance's ephemeral drive.
  • If you stop and then restart the AWS instance (say, to save money over the weekend), the BOSH director on firstbosh fails to start up, since /var/vcap/data is now empty.

This error manifests itself when you try to connect to your firstbosh from your inception-server and get:

Target already set to `firstbosh'
ubuntu@ip-10-138-137-188:~$ bosh status
Config
             /home/ubuntu/.bosh_config

Director
[WARNING] cannot access director, trying 4 more times...
  timed out fetching director status

Deployment
  Manifest   /home/ubuntu/deployments/cf/cf-1383343324.yml

Is it possible to either

(a) map /var/vcap/data to somewhere persistent, or
(b) document how you can "redeploy" / "resurrect" your firstbosh after it has had its ephemeral storage wiped?
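
For anyone reproducing this, a quick way to confirm that /var/vcap/data really is on the instance-store volume (device names vary by instance type, so treat this as a sketch):

    # on firstbosh: see which device backs /var/vcap/data
    df -h /var/vcap/data
    # ephemeral instance-store volumes typically appear as /dev/xvdb or similar
    grep /var/vcap/data /proc/mounts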

drnic commented 11 years ago

Dave, can we make this a bosh ticket? Or you think it's just bosh-bootstrap config?

mrdavidlaing commented 11 years ago

Is there a way to get verbose logging of the bosh-bootstrap process? Hopefully that will give me a clue as to where that setting is coming from.

drnic commented 11 years ago

It's an aspect of how microbosh is built: "bosh micro deploy" starts a registry on your local machine and then opens an SSH tunnel from the microbosh back to your machine; it then sends the "apply" message to the agent via its HTTP mode. These things can't currently be reproduced on a restart of the VM, though perhaps they could be. It's a feature request on the bosh project afaik. /cc @tsaleh
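
In the meantime, the only recovery path I know of is to rebuild the microbosh from the inception server. Roughly (the stemcell argument is a placeholder for whatever AMI/stemcell you originally deployed with):

    cd ~/deployments
    bosh micro deployment firstbosh   # point the micro plugin at the existing deployment
    bosh micro delete                 # destructive: tears down the broken VM (and, I believe, its disks)
    bosh micro deploy <stemcell>      # rebuild from scratch; state in the director's database is lost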

mrdavidlaing commented 11 years ago

Looks like the question of microbosh storing data on ephemeral disk has been asked before on the mailing list. https://groups.google.com/a/cloudfoundry.org/forum/m/#!topic/bosh-dev/cVNVQEASALQ

Asked, but not really answered :)

drnic commented 11 years ago

To implement it, I guess the "bosh micro deploy" code also needs to exist on the microbosh VM and to kick off the reboot sequence locally. I guess that means keeping a clone of the local deployments/ folder within the microbosh.
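
Something like this from the inception server could keep that clone current (illustrative only; /var/vcap/store is the standard persistent-disk mount, and the user/host are whatever your setup uses):

    # hypothetical: mirror the local deployments/ folder onto the microbosh's persistent disk
    rsync -av --delete ~/deployments/ vcap@firstbosh:/var/vcap/store/deployments/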

mrdavidlaing commented 11 years ago

Just for completeness, here is a list of what is missing after losing the ephemeral disk mounted at /var/vcap/data:

/var/vcap
├── jobs
│   ├── blobstore -> /var/vcap/data/jobs/blobstore/3.1-dev
│   ├── director -> /var/vcap/data/jobs/director/11.1-dev
│   ├── health_monitor -> /var/vcap/data/jobs/health_monitor/5.1-dev
│   ├── nats -> /var/vcap/data/jobs/nats/5
│   ├── postgres -> /var/vcap/data/jobs/postgres/4.1-dev
│   ├── powerdns -> /var/vcap/data/jobs/powerdns/3.1-dev
│   ├── redis -> /var/vcap/data/jobs/redis/3
│   └── registry -> /var/vcap/data/jobs/registry/0.1-dev
├── monit
│   ├── job
│   │   ├── 0000_micro_aws.health_monitor.monitrc -> /var/vcap/data/jobs/health_monitor/5.1-dev/0000_micro_aws.health_monitor.monitrc
│   │   ├── 0001_micro_aws.registry.monitrc -> /var/vcap/data/jobs/registry/0.1-dev/0001_micro_aws.registry.monitrc
│   │   ├── 0002_micro_aws.director.monitrc -> /var/vcap/data/jobs/director/11.1-dev/0002_micro_aws.director.monitrc
│   │   ├── 0003_micro_aws.blobstore.monitrc -> /var/vcap/data/jobs/blobstore/3.1-dev/0003_micro_aws.blobstore.monitrc
│   │   ├── 0004_micro_aws.powerdns.monitrc -> /var/vcap/data/jobs/powerdns/3.1-dev/0004_micro_aws.powerdns.monitrc
│   │   ├── 0005_micro_aws.postgres.monitrc -> /var/vcap/data/jobs/postgres/4.1-dev/0005_micro_aws.postgres.monitrc
│   │   ├── 0006_micro_aws.redis.monitrc -> /var/vcap/data/jobs/redis/3/0006_micro_aws.redis.monitrc
│   │   └── 0007_micro_aws.nats.monitrc -> /var/vcap/data/jobs/nats/5/0007_micro_aws.nats.monitrc
├── packages
│   ├── director -> /var/vcap/data/packages/director/11.28-dev
│   ├── genisoimage -> /var/vcap/data/packages/genisoimage/2.1-dev
│   ├── health_monitor -> /var/vcap/data/packages/health_monitor/5.27-dev
│   ├── libpq -> /var/vcap/data/packages/libpq/2
│   ├── mysql -> /var/vcap/data/packages/mysql/0.1-dev
│   ├── nats -> /var/vcap/data/packages/nats/3
│   ├── nginx -> /var/vcap/data/packages/nginx/2.1-dev
│   ├── postgres -> /var/vcap/data/packages/postgres/2
│   ├── powerdns -> /var/vcap/data/packages/powerdns/2.1-dev
│   ├── redis -> /var/vcap/data/packages/redis/3
│   ├── registry -> /var/vcap/data/packages/registry/0.28-dev
│   └── ruby -> /var/vcap/data/packages/ruby/3
└── sys -> /var/vcap/data/sys
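
Everything above is now a dangling symlink, which you can confirm on the box (monit lives at the standard BOSH stemcell path, so this should hold, but treat it as a sketch):

    # the symlink exists but its target under /var/vcap/data is gone
    ls -l /var/vcap/jobs/director
    # monit has no job definitions left to load, so nothing gets started
    sudo /var/vcap/bosh/bin/monit summary
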
mrdavidlaing commented 11 years ago

I think this is more a documentation deficiency than a bug with bosh-bootstrap.

Accordingly, I'm moving the discussion over to the Backup and disaster recovery section on the community wiki.