Closed beyhan closed 3 years ago
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/172245015
The labels on this GitHub issue will be updated when the story is started.
One option that might help is to set the HealthFilter/HealthWatcher work pool size in the job spec. When adjusting the value, it is important that the work pool can get through all requests before remote_health_interval elapses; otherwise a backlog would accumulate.
https://github.com/cloudfoundry/bosh-dns-release/blob/master/src/bosh-dns/dns/main.go#L145
https://github.com/cloudfoundry/bosh-dns-release/blob/master/src/bosh-dns/dns/server/records/health_filter.go#L40
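To make that sizing constraint concrete, here is a rough Go sketch that computes the smallest pool size able to clear every check within one interval. The function name minWorkers, the per-check duration, and the example numbers are assumptions for illustration and are not taken from bosh-dns; it also assumes every check takes the same worst-case time.

```go
package main

import (
	"fmt"
	"time"
)

// minWorkers is a hypothetical helper (not part of bosh-dns). It returns the
// smallest worker count that can finish `checks` health checks within one
// `interval`, assuming each check takes at most `perCheck`.
// The bool is false when no pool size can keep up, i.e. a single check
// already takes longer than the interval.
func minWorkers(checks int, perCheck, interval time.Duration) (int, bool) {
	if checks <= 0 {
		return 0, true // nothing to check, no workers needed
	}
	if perCheck <= 0 || perCheck > interval {
		return 0, false
	}
	// Number of checks one worker can complete sequentially per interval.
	perWorker := int(interval / perCheck)
	// Ceiling division: every check must be assigned to some worker.
	return (checks + perWorker - 1) / perWorker, true
}

func main() {
	// Assumed numbers: 1000 checks, 50ms worst case each, 20s interval.
	n, ok := minWorkers(1000, 50*time.Millisecond, 20*time.Second)
	fmt.Println(n, ok) // prints "3 true"
}
```

Any pool size below this bound would leave checks unfinished when the next interval starts, which is the backlog described above. A larger pool finishes sooner but concentrates more work at the start of each interval.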
Triggering health checks at different times (instead of triggering everything on one timer) could work but will likely involve more complexity.
I don't have access to large environments to verify the reduction in CPU load that these changes would hopefully offer, so I'd appreciate it if you could prove that out.
This issue was marked as Stale because it has been open for 21 days without any activity. If no activity takes place in the coming 7 days, it will automatically be closed. To prevent this from happening, remove the Stale label or comment below.
This issue was closed because it has been labeled Stale for 7 days without subsequent activity. Feel free to re-open this issue at any time by commenting below.
We observe that the bosh-dns server runs all currently active health checks at once. As a result, the bosh-dns process produces CPU spikes, which can impact other processes on the same VM. It would be better to distribute the checks over the remote_health_interval.
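A minimal Go sketch of what distributing the checks could look like: each check gets a fixed start offset inside the interval and then repeats once per interval from that offset. The name staggerOffsets and the example values are hypothetical, and this is not how bosh-dns currently schedules its checks.

```go
package main

import (
	"fmt"
	"time"
)

// staggerOffsets is a hypothetical helper (not part of bosh-dns). It spreads
// n checks evenly across one interval: check i starts at i*interval/n, so no
// two checks share a start time as long as interval/n is non-zero.
func staggerOffsets(n int, interval time.Duration) []time.Duration {
	if n <= 0 {
		return nil
	}
	offsets := make([]time.Duration, n)
	for i := range offsets {
		offsets[i] = time.Duration(i) * interval / time.Duration(n)
	}
	return offsets
}

func main() {
	// Assumed numbers: 4 health checks and a 20s interval.
	fmt.Println(staggerOffsets(4, 20*time.Second)) // prints "[0s 5s 10s 15s]"
}
```

With even spacing, the work per unit of time stays roughly constant instead of peaking once per interval. The trade-off is that the set of checks changes over time, so offsets would have to be recomputed or assigned incrementally as records come and go.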