envoyproxy / xds-relay

Caching, aggregation, and relaying for xDS compliant clients and origin servers
Apache License 2.0

Create monitor for alarming on cache drift #167

Open jyotimahapatra opened 4 years ago

jyotimahapatra commented 4 years ago

While deploying xds-relay in a staging environment for a few critical services, we noticed that the EDS cache had drifted. The drift was probably caused by a bug. We need a way for xds-relay to compare cache entries periodically and alarm when they don't match. At a minimum, operators need to be able to restart the server and reset the cache to mitigate the problem when it happens. There can always be bugs during the initial launch, and a mechanism like this keeps the system safe. The feature should be driven by bootstrap config so that it is enabled only when configured. It also needs a way to map xds-relay keys to upstream control plane keys.
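A minimal Go sketch of what such a monitor could look like follows. It is not taken from the xds-relay codebase: `DriftConfig`, `KeyTransform`, `FindDrift`, and `Monitor` are hypothetical names. It also assumes that both the cache and the upstream control plane can be summarized as a map from key to version string, and that drift means comparing against the upstream; the issue specifies neither.

```go
package main

import (
	"context"
	"sort"
	"time"
)

// DriftConfig is a hypothetical bootstrap stanza; it is not part of the
// real xds-relay bootstrap schema.
type DriftConfig struct {
	Enabled  bool          // the monitor does nothing unless explicitly enabled
	Interval time.Duration // how often to compare cache and upstream
}

// KeyTransform maps an xds-relay cache key to the key the upstream
// control plane uses for the same resource (hypothetical).
type KeyTransform func(relayKey string) string

// FindDrift returns the sorted relay keys whose cached version is missing
// upstream or differs from the upstream version. Versions are compared as
// opaque strings, which is an assumption of this sketch.
func FindDrift(cfg DriftConfig, cache, upstream map[string]string, transform KeyTransform) []string {
	if !cfg.Enabled {
		return nil
	}
	var drifted []string
	for key, cachedVersion := range cache {
		upstreamVersion, ok := upstream[transform(key)]
		if !ok || upstreamVersion != cachedVersion {
			drifted = append(drifted, key)
		}
	}
	sort.Strings(drifted)
	return drifted
}

// Monitor calls check every cfg.Interval until ctx is cancelled and passes
// any drifted keys to alarm. It returns immediately if the monitor is
// disabled or the interval is not positive.
func Monitor(ctx context.Context, cfg DriftConfig, check func() []string, alarm func([]string)) {
	if !cfg.Enabled || cfg.Interval <= 0 {
		return
	}
	ticker := time.NewTicker(cfg.Interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if drifted := check(); len(drifted) > 0 {
				alarm(drifted)
			}
		}
	}
}
```

In this sketch, `check` would wrap `FindDrift` with snapshots of the cache and of the upstream state, and `alarm` could emit a metric or log line that operators alert on before deciding whether to restart the server and reset the cache.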

jyotimahapatra commented 4 years ago

@jessicayuen We need some kind of monitor that alarms when the cache drifts. We can either build such a mechanism in-house or design this feature in the open from the outset, since it could help operators roll out the service safely. wdyt?