GCP custom monitoring container shouldn't be run as a Swarm service

atsign-foundation / at_server

The software implementation of Atsign's core technology

BSD 3-Clause "New" or "Revised" License


Describe the bug

At present, the GCP custom monitoring container runs as a global Swarm service to ensure that there is one container on every node.

This works fine when everything is working properly, but it means that the monitoring container isn't running while a node is drained, because draining a node shuts down the service tasks on it.

We've also found that it can take substantial time for the monitoring container to (re)start when a node is brought back to active, particularly if the manager is busy (as can happen when rebalancing a cluster).
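
To make the gap visible, a check along the following lines might help. It is only a rough sketch, not something that exists in this repo: it assumes the text was produced by `docker node ls --format '{{.Hostname}} {{.Availability}}'` and `docker service ps <service> --format '{{.Node}} {{.CurrentState}}'`, and the node names and sample output are made up for illustration.

```python
def parse_node_availability(node_ls_output: str) -> dict[str, str]:
    """Parse '<hostname> <availability>' lines into {hostname: availability}."""
    nodes = {}
    for line in node_ls_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        hostname, availability = line.split()
        nodes[hostname] = availability.lower()
    return nodes


def nodes_with_running_task(service_ps_output: str) -> set[str]:
    """Return the nodes whose task state starts with 'Running'."""
    running = set()
    for line in service_ps_output.splitlines():
        if not line.strip():
            continue
        node, _, state = line.strip().partition(" ")
        if state.strip().lower().startswith("running"):
            running.add(node)
    return running


def unmonitored_nodes(node_ls_output: str, service_ps_output: str) -> dict[str, str]:
    """Return {hostname: availability} for nodes with no running monitoring task."""
    nodes = parse_node_availability(node_ls_output)
    running = nodes_with_running_task(service_ps_output)
    return {host: avail for host, avail in nodes.items() if host not in running}


# Hypothetical sample output: node-2 has been drained, so its task was shut down.
sample_node_ls = """\
node-1 Active
node-2 Drain
node-3 Active
"""
sample_service_ps = """\
node-1 Running 3 hours ago
node-2 Shutdown 5 minutes ago
node-3 Running 3 hours ago
"""

print(unmonitored_nodes(sample_node_ls, sample_service_ps))
```

With the sample data this reports `node-2` as unmonitored, which is the drained-node case described above. A node that has been set back to active but whose task hasn't started yet would show up the same way.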

Expected behavior

We should have monitoring in place at all times.


GCP custom monitoring container shouldn't be run as a Swarm service #651