cloudfoundry / diego-release

BOSH Release for Diego
Apache License 2.0

Check BBS port is available #878

Closed mariash closed 8 months ago

mariash commented 8 months ago

BBS only starts listening on its port once it acquires the lock. During a rolling deploy, BBS instances are restarted and fail to acquire the lock, so they never try to listen. If another process has claimed the port, operators get no feedback that BBS cannot start until all BBS instances have been rolled, which leaves the deployment without any working BBS instance. We want faster feedback: the deploy should fail on the first BBS instance if another BOSH job has claimed the BBS port.

We explored listening on the port and returning 503s, but HTTP clients treat a 503 differently from a connection-refused error. On connection refused, clients go down the BOSH DNS list for bbs.service.cf.internal and try a different BBS instance, eventually finding the one that holds the active lock. With a 503, they simply return that response to the caller.
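
To illustrate the difference, here is a minimal sketch of the client behaviour described above. It is not taken from any real BBS client: `request_with_failover` and the `send` callable are hypothetical names, and `send` is assumed to either return an HTTP status code or raise `ConnectionRefusedError`.

```python
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def request_with_failover(
    addresses: Iterable[Address],
    send: Callable[[Address], int],
) -> Tuple[Address, int]:
    """Try each address in order, skipping only those that refuse the connection.

    `send` is a hypothetical callable: it returns an HTTP status code, or
    raises ConnectionRefusedError when nothing is listening at the address.
    """
    last_error: Optional[Exception] = None
    for address in addresses:
        try:
            # Any HTTP response ends the loop, including a 503.
            return address, send(address)
        except ConnectionRefusedError as err:
            # Nothing is listening here, so move on to the next instance.
            last_error = err
    raise ConnectionError("no address accepted the connection") from last_error
```

With this logic, an instance that refuses connections is skipped, while an instance that answers 503 stops the search and its 503 is what the caller sees.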

In this solution, the BBS post-start script validates that either BBS itself is listening on the port or nothing is listening on it.
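
The check can be sketched roughly as follows. This is only an illustration in Python, not the actual post-start script: `is_listening` and `port_state_ok` are hypothetical helpers, and looking up which process owns the port (for example with a tool such as `lsof`) is left out and passed in as the `owner` argument.

```python
import socket
from typing import Optional


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def port_state_ok(
    listening: bool,
    owner: Optional[str],
    expected_owner: str = "bbs",
) -> bool:
    """Accept the two valid states: the port is free, or BBS owns it.

    `owner` is the name of the process holding the port, as found by some
    external lookup that this sketch does not implement.
    """
    if not listening:
        # Nobody is listening: BBS can bind once it acquires the lock.
        return True
    # Somebody is listening: that is fine only if it is BBS itself.
    return owner == expected_owner
```

A post-start check built on this would exit non-zero when `port_state_ok` returns `False`, so the deploy stops at the first BBS instance whose port is held by another job.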