ogrand closed this issue 1 year ago
Which director version are you running on?
We are running bosh/273.0.0, with bosh-vsphere-esxi-ubuntu-bionic-go_agent/1.97
Since bumping to the bosh-deployment that uses NATS 2.0 (first with Bionic and now with Jammy), we are regularly seeing errors during compilation:
Error: Timed out sending 'compile_package' to instance: 'compilation-xxx', agent-id: 'yyy' after 45 seconds
If we retry a few times, it eventually seems to work.
I'm thinking it genuinely is a timeout, and would really like to be able to try extending it past the 45 seconds.
Would love to see some properties for this, particularly for the agent used on the compilation VMs.
(in case it's related, the VMs we are dealing with don't have outbound internet connectivity)
We're going to try this pre-start script on our director deploys and see if it helps with the next round of stemcell updates...
- type: replace
  path: /instance_groups/name=bosh/jobs/-
  value:
    name: pre-start-script
    release: os-conf
    properties:
      script: |
        #!/usr/bin/env bash
        set -euo pipefail
        # Bump agent timeout
        sed -i -e "s/\@timeout = options\[\:timeout\] || 45/\@timeout = options\[\:timeout\] || 600/g" /var/vcap/packages/director/gem_home/ruby/3.1.0/gems/bosh-director-0.0.0/lib/bosh/director/agent_client.rb
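One caveat with the sed one-liner above: if the Ruby version in the path or the default line in agent_client.rb changes in a later director release, sed -i will not match anything and still exit successfully, so the patch silently stops applying. A minimal sketch of a guarded variant follows, assuming GNU grep and sed (as on the Ubuntu stemcells). The function name patch_agent_timeout is hypothetical and not part of BOSH or os-conf, and the only thing assumed about the file is the line the sed pattern above already targets.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of BOSH): replace the hard-coded 45-second
# default in FILE with NEW_TIMEOUT, and fail loudly instead of silently
# doing nothing when the expected line is missing.
patch_agent_timeout() {
  local file="$1" new_timeout="$2"
  # ERE matching the literal Ruby text "@timeout = options[:timeout] || "
  local prefix='@timeout = options\[:timeout\] \|\| '
  # End of line or any non-digit, so that 45 does not also match 450
  local tail='([^0-9]|$)'

  # Already patched (e.g. the pre-start script ran before): nothing to do
  if grep -qE -- "${prefix}${new_timeout}${tail}" "$file"; then
    echo "already patched: $file"
    return 0
  fi

  # Neither the new nor the old value is present: the code has changed
  if ! grep -qE -- "${prefix}45${tail}" "$file"; then
    echo "expected default (45) not found in $file; refusing to patch" >&2
    return 1
  fi

  # Keep the prefix (\1) and whatever followed the number (\2)
  sed -i -E "s/(${prefix})45${tail}/\1${new_timeout}\2/" "$file"
}
```

In the pre-start script this would be called as patch_agent_timeout /var/vcap/packages/director/gem_home/ruby/3.1.0/gems/bosh-director-0.0.0/lib/bosh/director/agent_client.rb 600. Because of set -e, a missing pattern then fails the pre-start step, which makes a stale patch visible at deploy time rather than at the next compilation timeout.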
This PR has just been merged: https://github.com/cloudfoundry/bosh/pull/2406
Is your feature request related to a problem? Please describe.
When I use the NATS BOSH director 3-step rotation for some deployments, the agent restart takes too long with arping on public IPs, and the deployment eventually fails.
To work around this problem, I patch the agent_client.rb Ruby script to change the agent start timeout and retries values:
Describe the solution you'd like
It would be nice if we could use 2 new BOSH properties to customize the agent timeout instead of a dirty patch on the BOSH release:
Thanks