buchgr / bazel-remote

A remote cache for Bazel
https://bazel.build
Apache License 2.0
587 stars 153 forks

Reload config file without restart #352

Open ulrfa opened 3 years ago

ulrfa commented 3 years ago

Sometimes I would like to change bazel-remote’s configuration, but restarting bazel-remote would interfere with ongoing builds.

It would be nice if a configuration file could be re-loaded by a SIGHUP signal.

Allowing any configuration parameter to change at any time might be too complicated, but perhaps bazel-remote could support reloading a few specific parameters, and log a warning if other parameters were also changed in the configuration file?
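As a rough illustration of the idea above, here is a minimal Go sketch of a SIGHUP handler that applies only a whitelist of reloadable fields and warns about the rest. The `Config` struct, its field names, and the split between fixed and reloadable fields are hypothetical and not taken from bazel-remote; `load` and `apply` are callbacks the caller would supply.

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

// Config is a hypothetical, heavily reduced settings struct.
// The field names are illustrative, not bazel-remote's real options.
type Config struct {
	Dir          string // assumed fixed for the process lifetime
	MaxSizeGiB   int    // assumed fixed for the process lifetime
	ProxyURL     string // assumed reloadable
	MetricLabels string // assumed reloadable
}

// applyReload returns the config to run with after a reload: reloadable
// fields are taken from next, all other fields are kept from current.
// It also returns the names of fixed fields that differ in next.
func applyReload(current, next Config) (Config, []string) {
	var ignored []string
	if next.Dir != current.Dir {
		ignored = append(ignored, "Dir")
	}
	if next.MaxSizeGiB != current.MaxSizeGiB {
		ignored = append(ignored, "MaxSizeGiB")
	}
	merged := current
	merged.ProxyURL = next.ProxyURL
	merged.MetricLabels = next.MetricLabels
	return merged, ignored
}

// watchSIGHUP blocks, calling load on every SIGHUP and passing the
// merged result to apply. A failed load keeps the old config.
func watchSIGHUP(current Config, load func() (Config, error), apply func(Config)) {
	sighup := make(chan os.Signal, 1)
	signal.Notify(sighup, syscall.SIGHUP)
	for range sighup {
		next, err := load()
		if err != nil {
			log.Printf("reload failed, keeping old config: %v", err)
			continue
		}
		var ignored []string
		current, ignored = applyReload(current, next)
		for _, name := range ignored {
			log.Printf("warning: change to %s needs a restart, ignored", name)
		}
		apply(current)
	}
}
```

A server would typically start `watchSIGHUP` in its own goroutine after the initial config load; how `apply` hands the new values to running components is left open here.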

Use cases:

mostynb commented 3 years ago

We could implement this naively, but that would mean adding more mutexes around every use of settings that may change at any time (or using sync/atomic to access a struct).
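For reference, the sync/atomic variant could look roughly like the sketch below, which swaps a pointer to an immutable snapshot instead of locking individual fields. It assumes Go 1.19 or later for `atomic.Pointer`; the `Settings` type and the helper names are hypothetical.

```go
package main

import "sync/atomic"

// Settings is a hypothetical immutable snapshot of reloadable options.
// A snapshot is never modified after it has been stored.
type Settings struct {
	ProxyURL string
}

// current holds the active snapshot; its zero value holds nil.
var current atomic.Pointer[Settings]

// loadSettings returns the active snapshot without taking a lock.
// It returns nil until storeSettings has been called once.
func loadSettings() *Settings {
	return current.Load()
}

// storeSettings publishes a new snapshot. Readers that already hold
// the old pointer keep using it until they call loadSettings again.
func storeSettings(s *Settings) {
	current.Store(s)
}
```

The cost on the read path is a single atomic load per access, but each request handler should load the snapshot once and reuse it, so that one request never sees a mix of old and new values.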

I wonder if there's a way to avoid that overhead at the cost of some slight downtime when reloading, for example by restarting the http/grpc servers? If that's too difficult then maybe we can implement a fast-reload option: stop accepting requests, dump the index to disk, restart bazel-remote and import the index.

ulrfa commented 3 years ago

Thanks Mostyn,

I'm thinking of, for example, allowing cache.Proxy instances to be created and replaced at runtime, with a mutex protecting the reference to the current proxy instance in disk.go.
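A minimal sketch of that pattern, assuming a read-write mutex around the reference: the `Proxy` interface below is a one-method stand-in (the real cache.Proxy interface is not reproduced here), and `diskCache`, `currentProxy`, and `setProxy` are hypothetical names.

```go
package main

import "sync"

// Proxy is a stand-in for a backend interface. The real cache.Proxy
// interface in bazel-remote is not reproduced here.
type Proxy interface {
	Name() string
}

// diskCache is a hypothetical holder of the current proxy reference.
type diskCache struct {
	mu    sync.RWMutex
	proxy Proxy
}

// currentProxy returns the proxy to use for one request. Callers keep
// the returned value for the whole request, so an in-flight request
// finishes on the instance it started with.
func (c *diskCache) currentProxy() Proxy {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.proxy
}

// setProxy installs p and returns the previous instance, so the caller
// can shut it down once its in-flight requests have drained.
func (c *diskCache) setProxy(p Proxy) Proxy {
	c.mu.Lock()
	defer c.mu.Unlock()
	old := c.proxy
	c.proxy = p
	return old
}

// namedProxy is a trivial Proxy used only for illustration.
type namedProxy string

func (n namedProxy) Name() string { return string(n) }
```

The lock is held only while copying the reference, never during a proxy call, so a slow backend does not block a reload.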

And perhaps, in a similar way, creating and replacing implementations of the metrics.Metrics interface: (https://github.com/buchgr/bazel-remote/blob/25e244e035a7364a4022187bb7a131e8c4b41c6f/utils/metrics/metrics.go#L75-L78)

As long as the replaced parts have well-defined interfaces and not too many dependencies, I think they could be replaced at runtime without too much added complexity.

A slight downtime when reloading would not be OK for me, since it could cause ongoing remote execution builds to fail. (Downtime could also cause failed builds in pure cache scenarios for those using “builds-without-the-bytes”, unless https://github.com/bazelbuild/bazel/issues/10880 is resolved.)

I will not have time to implement any of the above now, but I wanted to raise this as background to the discussion in https://github.com/buchgr/bazel-remote/pull/350 about whether the Prometheus label configuration should be in a separate configuration file.

mostynb commented 3 years ago

I haven't thought this through, but if you want to avoid downtime, could a specialized proxy work? That is, one that receives requests from clients and forwards them on to bazel-remote, retrying if bazel-remote temporarily stops accepting requests.
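The retry part of such a proxy could be sketched as below. This is only the retry loop, not a working HTTP or gRPC proxy; `forwardWithRetry` and `errUnavailable` are hypothetical names, and `forward` stands for whatever function actually sends one request to bazel-remote.

```go
package main

import (
	"errors"
	"time"
)

// errUnavailable is a hypothetical error that forward returns while
// the backend is not accepting requests.
var errUnavailable = errors.New("backend unavailable")

// forwardWithRetry calls forward up to attempts times, sleeping delay
// between tries. It retries only on errUnavailable; success or any
// other error is returned immediately.
func forwardWithRetry(forward func() error, attempts int, delay time.Duration) error {
	if attempts < 1 {
		attempts = 1 // always try at least once
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = forward()
		if !errors.Is(err, errUnavailable) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return err
}
```

A real proxy would also need to buffer or replay request bodies before retrying, which is likely the harder part for large uploads.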