cloudfoundry / cf-deployment

The canonical open source deployment manifest for Cloud Foundry
Apache License 2.0
294 stars 306 forks

diego-api fails to update when deploying cf #766

Closed tnweiss closed 5 years ago

tnweiss commented 5 years ago

What is this issue about?

I'm trying to deploy Cloud Foundry into my AWS environment using BOSH. When I run the bosh deploy command, the instances are created, then nats and adapter are updated. When BOSH then tries to update diego-api, it hangs and eventually prints the following error:

                   L Error: 'diego-api/8f88690c-10b5-430d-859c-bb4f509d2cc1 (0)' is not running after update. Review logs for failed jobs: bbs
Task 82 | 22:50:30 | Error: 'diego-api/8f88690c-10b5-430d-859c-bb4f509d2cc1 (0)' is not running after update. Review logs for failed jobs: bbs

What version of cf-deployment are you using?

v7.11.0

Please include the bosh deploy... command, including all the operations files (plus any experimental operation files you're using):

bosh -n -e ${BOSH_ENV} -d ${CF_ENV} deploy cf-deployment/cf-deployment.yml \
  --vars-store ../secrets.yml \
  -l ../varsfiles/env-cloud.yml \
  -l ../varsfiles/external-db-cloud.yml \
  -o cf-deployment/operations/rename-network-and-deployment.yml \
  -o cf-deployment/operations/use-external-dbs.yml \
  -o cf-deployment/operations/use-external-blobstore.yml \
  -o cf-deployment/operations/use-s3-blobstore.yml \
  -o cf-deployment/operations/override-app-domains.yml \
  -o cf-deployment/operations/configure-default-router-group.yml

Please provide output that helps describe the issue:

Below are tailed logs from the diego-api instance:

/var/vcap/sys/log/bbs/bbs.stderr.log

  code.cloudfoundry.org/lager.(*logger).Fatal(0xc000076300, 0xcddbc3, 0x1e, 0xdc5120, 0x13a57e0, 0x0, 0x0, 0x0)
          /var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/logger.go:162 +0x58c
  main.main()
          /var/vcap/data/compile/bbs/src/code.cloudfoundry.org/bbs/cmd/bbs/main.go:239 +0x41d4
  panic: context deadline exceeded

/var/vcap/sys/log/bbs/bbs.stdout.log

  {"timestamp":"2019-04-30T21:30:31.597698340Z","level":"info","source":"bbs","message":"bbs.starting","data":{}}
  {"timestamp":"2019-04-30T21:30:41.623819241Z","level":"fatal","source":"bbs","message":"bbs.failed-to-create-locket-client","data":{"error":"context deadline exceeded","trace":"goroutine 1 [running]:\ncode.cloudfoundry.org/lager.(*logger).Fatal(0xc000076840, 0xcddbc3, 0x1e, 0xdc5120, 0x13a57e0, 0x0, 0x0, 0x0)\n\t/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/logger.go:138 +0xc6\nmain.main()\n\t/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/bbs/cmd/bbs/main.go:239 +0x41d4\n"}}
  {"timestamp":"2019-04-30T21:30:42.590331719Z","level":"info","source":"bbs","message":"bbs.starting","data":{}}
  {"timestamp":"2019-04-30T21:30:52.634610923Z","level":"fatal","source":"bbs","message":"bbs.failed-to-create-locket-client","data":{"error":"context deadline exceeded","trace":"goroutine 1 [running]:\ncode.cloudfoundry.org/lager.(*logger).Fatal(0xc000124300, 0xcddbc3, 0x1e, 0xdc5120, 0x13a57e0, 0x0, 0x0, 0x0)\n\t/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/logger.go:138 +0xc6\nmain.main()\n\t/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/bbs/cmd/bbs/main.go:239 +0x41d4\n"}}
  {"timestamp":"2019-04-30T21:30:53.607736050Z","level":"info","source":"bbs","message":"bbs.starting","data":{}}
  {"timestamp":"2019-04-30T21:31:03.633848467Z","level":"fatal","source":"bbs","message":"bbs.failed-to-create-locket-client","data":{"error":"context deadline exceeded","trace":"goroutine 1 [running]:\ncode.cloudfoundry.org/lager.(*logger).Fatal(0xc000076780, 0xcddbc3, 0x1e, 0xdc5120, 0x13a57e0, 0x0, 0x0, 0x0)\n\t/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/logger.go:138 +0xc6\nmain.main()\n\t/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/bbs/cmd/bbs/main.go:239 +0x41d4\n"}}
  {"timestamp":"2019-04-30T21:31:04.617484385Z","level":"info","source":"bbs","message":"bbs.starting","data":{}}
  {"timestamp":"2019-04-30T21:31:14.642454448Z","level":"fatal","source":"bbs","message":"bbs.failed-to-create-locket-client","data":{"error":"context deadline exceeded","trace":"goroutine 1 [running]:\ncode.cloudfoundry.org/lager.(*logger).Fatal(0xc000124360, 0xcddbc3, 0x1e, 0xdc5120, 0x13a57e0, 0x0, 0x0, 0x0)\n\t/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/logger.go:138 +0xc6\nmain.main()\n\t/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/bbs/cmd/bbs/main.go:239 +0x41d4\n"}}
  {"timestamp":"2019-04-30T21:31:15.638249199Z","level":"info","source":"bbs","message":"bbs.starting","data":{}}

/var/vcap/sys/log/bbs/bpm.log

  {"timestamp":"2019-04-30T21:32:10.553031174Z","level":"info","source":"bpm","message":"bpm.start.start-process.building-spec","data":{"job":"bbs","process":"bbs","session":"1.2"}}
  {"timestamp":"2019-04-30T21:32:10.553290399Z","level":"info","source":"bpm","message":"bpm.start.start-process.creating-bundle","data":{"job":"bbs","process":"bbs","session":"1.2"}}
  {"timestamp":"2019-04-30T21:32:10.557327035Z","level":"info","source":"bpm","message":"bpm.start.start-process.running-container","data":{"job":"bbs","process":"bbs","session":"1.2"}}
  {"timestamp":"2019-04-30T21:32:10.670730657Z","level":"info","source":"bpm","message":"bpm.start.start-process.complete","data":{"job":"bbs","process":"bbs","session":"1.2"}}
  {"timestamp":"2019-04-30T21:32:10.670803554Z","level":"info","source":"bpm","message":"bpm.start.complete","data":{"job":"bbs","process":"bbs","session":"1"}}
  {"timestamp":"2019-04-30T21:32:10.670824157Z","level":"info","source":"bpm","message":"bpm.start.releasing-lifecycle-lock.starting","data":{"job":"bbs","process":"bbs","session":"1.3"}}
  {"timestamp":"2019-04-30T21:32:10.670849773Z","level":"info","source":"bpm","message":"bpm.start.releasing-lifecycle-lock.complete","data":{"job":"bbs","process":"bbs","session":"1.3"}}

/var/vcap/sys/log/locket/locket.stderr.log

  {"timestamp":"2019-04-30T20:54:07.126921537Z","level":"error","source":"locket","message":"locket.failed-to-initialize-metron-client","data":{"error":"context deadline exceeded"}}
  {"timestamp":"2019-04-30T20:54:11.568937838Z","level":"info","source":"locket","message":"locket.grpc-server.started","data":{"session":"1"}}
  {"timestamp":"2019-04-30T20:54:11.570790285Z","level":"info","source":"locket","message":"locket.burglar.started","data":{"session":"2"}}
  {"timestamp":"2019-04-30T20:54:11.573896592Z","level":"info","source":"locket","message":"locket.lock-metrics-notifier.starting","data":{"interval":60000000000,"session":"3"}}
  {"timestamp":"2019-04-30T20:54:11.573971124Z","level":"info","source":"locket","message":"locket.metrics-notifier.starting","data":{"interval":60000000000,"session":"4"}}
  {"timestamp":"2019-04-30T20:54:11.574028991Z","level":"info","source":"locket","message":"locket.request-metrics-notifier.starting","data":{"interval":60000000000,"session":"5"}}
  {"timestamp":"2019-04-30T20:54:11.574088857Z","level":"info","source":"locket","message":"locket.started","data":{}}

What IaaS is this issue occurring on?

AWS

Is there anything else unique or special about your setup?

I have recently deployed version 1.38.0 successfully. I made sure to clear the external RDS tables before redeploying.

Appreciate the help!

cf-gitbot commented 5 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/165735204

The labels on this github issue will be updated when the story is started.

tnweiss commented 5 years ago

I found the issue: I forgot to upload my secrets.yml to the bosh instance.

bosh-cloud update-runtime-config ${BOSH_DEPLOYMENT_DIR}/runtime-configs/dns.yml --vars-store ${DEPLOYMENT_DIR}/secrets.yml --name dns

addisflava commented 5 years ago

@tnweiss I have the same issue. I don't know what you did there. I don't use the bosh-cloud command you used, and I couldn't find the dns.yml file anywhere. What is in it?

sunjayBhatia commented 5 years ago

@addisflava Unfortunately, that error is a general one: it means the bbs component cannot talk to locket to grab a lock and establish the master instance.

It could come from a wide range of connection issues, but a common one is expired certs.
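
One way to rule out the expired-cert case is to read a certificate's expiry date and compare it with the current time. The sketch below is only an illustration under stated assumptions, not a Cloud Foundry tool: `days_until_expiry` is a hypothetical helper, and it expects the `notAfter=...` line that `openssl x509 -enddate -noout -in <cert file>` prints. Which certificate files to check depends on the deployment and is not shown in this thread.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return days left before the certificate expires (negative if already expired).

    `not_after` is the expiry time in the format OpenSSL prints, for example
    "notAfter=Jan  5 09:34:43 2018 GMT" (the "notAfter=" prefix is optional).
    """
    not_after = not_after.strip()
    if not_after.startswith("notAfter="):
        not_after = not_after[len("notAfter="):]
    # ssl.cert_time_to_seconds converts the certificate time string to epoch seconds.
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400


if __name__ == "__main__":
    # Example value only; substitute the line printed for your own certificate.
    remaining = days_until_expiry("notAfter=Jan  5 09:34:43 2018 GMT")
    print("expired" if remaining < 0 else f"{remaining:.0f} days left")
```

A negative result means the certificate has expired. A positive result does not prove the certs are fine, since the same error can come from other connection problems.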

addisflava commented 5 years ago

This is the error I'm getting:

{"timestamp":"2019-07-23T20:52:37.376033400Z","level":"info","source":"bbs","message":"bbs.starting","data":{}}
{"timestamp":"2019-07-23T20:52:52.430802403Z","level":"fatal","source":"bbs","message":"bbs.failed-to-create-locket-client","data":{"error":"context deadline exceeded","trace":"goroutine 1 [running]:\ncode.cloudfoundry.org/lager.(*logger).Fatal(0xc000072a20, 0xce4d70, 0x1e, 0xdcd460, 0x13b0860, 0x0, 0x0, 0x0)\n\t/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/logger.go:138 +0xc6\nmain.main()\n\t/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/bbs/cmd/bbs/main.go:240 +0x41f0\n"}}
panic: context deadline exceeded

goroutine 1 [running]:
code.cloudfoundry.org/lager.(*logger).Fatal(0xc000072a20, 0xce4d70, 0x1e, 0xdcd460, 0x13b0860, 0x0, 0x0, 0x0)
        /var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/logger.go:162 +0x58c
main.main()
        /var/vcap/data/compile/bbs/src/code.cloudfoundry.org/bbs/cmd/bbs/main.go:240 +0x41f0
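
The bbs logs in this thread are JSON lines, so the time between each `bbs.starting` entry and the fatal entry that follows it can be measured directly. The sketch below is a hypothetical triage helper (the names `parse_ts`, `startup_failures`, and `SAMPLE` are made up for illustration). The sample entries are copied from the original report, with the long `trace` field left out for brevity.

```python
import json
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    # The timestamps carry nanoseconds; keep microseconds so strptime accepts them.
    head, _, frac = ts.rstrip("Z").partition(".")
    return datetime.strptime(f"{head}.{(frac or '0')[:6]}", "%Y-%m-%dT%H:%M:%S.%f")


def startup_failures(lines):
    """Yield (seconds_until_fatal, message, error) for each start that ended in a fatal entry."""
    started = None
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip panic text and stack-trace lines
        entry = json.loads(line)
        when = parse_ts(entry["timestamp"])
        if entry["message"].endswith(".starting"):
            started = when
        elif entry["level"] == "fatal" and started is not None:
            error = entry.get("data", {}).get("error", "")
            yield (when - started).total_seconds(), entry["message"], error
            started = None


# Two entries from bbs.stdout.log in the original report (trace field omitted).
SAMPLE = [
    '{"timestamp":"2019-04-30T21:30:31.597698340Z","level":"info","source":"bbs","message":"bbs.starting","data":{}}',
    '{"timestamp":"2019-04-30T21:30:41.623819241Z","level":"fatal","source":"bbs","message":"bbs.failed-to-create-locket-client","data":{"error":"context deadline exceeded"}}',
    "panic: context deadline exceeded",
]

for seconds, message, error in startup_failures(SAMPLE):
    print(f"{message}: {error} after {seconds:.1f}s")
```

On the sample this reports a failure about 10 seconds after start. The entries in the later comment are about 15 seconds apart. In both cases the process gives up after a fixed wait, which fits a connection that never completes rather than one that is actively refused.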