cloudfoundry / bosh

Cloud Foundry BOSH is an open source tool chain for release engineering, deployment and lifecycle management of large scale distributed services.
https://bosh.io
Apache License 2.0

unable to bosh cck an unresponsive vm (very high cpu load) #2531

Open poblin-orange opened 1 month ago

poblin-orange commented 1 month ago

Describe the bug

On an overloaded VM, we ran into the following issue:

To Reproduce

Steps to reproduce the behavior (example):

  1. Deploy a bosh director on with
  2. Upload and
  3. Deploy
  4. bosh ssh to a specific instance
  5. Run on the vm to see the behavior

Expected behavior

A clear and concise description of what you expected to happen.

Logs

Logs are always helpful! Add logs to help explain your problem.

Versions (please complete the following information):

Deployment info:

If possible, share your (redacted) manifest and any ops files used to deploy BOSH or any other releases on top of BOSH.

If you used any deployment strategy it'd be helpful to point it out and share as much about it as possible (e.g. bosh-deployment, PCF, genesis, spiff, etc)

Additional context

Add any other context about the problem here.

beyhan commented 1 month ago

Could you please try with the bosh recovery option documented in https://bosh.io/docs/recover/?

gberche-orange commented 1 month ago

Thanks @beyhan, we'll test.

BTW, the same symptom was previously fixed for bosh vms and bosh deploy in https://github.com/cloudfoundry/bosh/commit/c9cc3ff80774d00a52be1be3f9d8bad6c6674591, but presumably not yet for bosh cck. See https://github.com/cloudfoundry/bosh/issues/1754, associated with https://www.pivotaltracker.com/n/projects/1456570/stories/151434806 (see screenshot).