Open dhzhuo opened 6 years ago
@DonghuiZhuo We have most of this already.
OMS keeps track of Kernel Servers.
OMS today only keeps a list of Kernel Servers; it does not actually track them. It does not know whether a Kernel Server is alive or dead, and it does not remove dead Kernel Servers from its registry.
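To make the gap concrete, here is a rough sketch of what heartbeat-based tracking could look like. This is not existing Sapphire code: `KernelServerRegistry`, `heartbeat`, `isLive`, `evictDead`, and the timeout parameter are all hypothetical names, and it assumes Kernel Servers would periodically report in to OMS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; none of these names exist in Sapphire.
class KernelServerRegistry {
    private final long timeoutMillis;
    // Kernel Server id -> timestamp (ms) of the last heartbeat received.
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    KernelServerRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Registration and heartbeat are the same operation here:
    // record that the server was seen at nowMillis.
    void heartbeat(String serverId, long nowMillis) {
        lastHeartbeat.put(serverId, nowMillis);
    }

    // A server is live if it is registered and its last heartbeat is recent enough.
    boolean isLive(String serverId, long nowMillis) {
        Long last = lastHeartbeat.get(serverId);
        return last != null && nowMillis - last <= timeoutMillis;
    }

    // Removes every server whose last heartbeat is older than the timeout
    // and returns the ids that were removed.
    List<String> evictDead(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            boolean expired = nowMillis - e.getValue() > timeoutMillis;
            // Conditional remove: skip the entry if a newer heartbeat arrived meanwhile.
            if (expired && lastHeartbeat.remove(e.getKey(), e.getValue())) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }

    int size() {
        return lastHeartbeat.size();
    }
}
```

Timestamps are passed in rather than read from the clock so the eviction logic can be tested deterministically. In practice `evictDead` would presumably run on a timer inside OMS.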
OMS keeps track of SO's and their locations.
Same as above. We need a mechanism to monitor the health of SOs. We also need a mechanism that can bring up a new SO instance when an existing one dies. We need something like a Replica Set at the SO level.
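As a sketch of the Replica Set idea, the core could be a reconcile step: drop replicas that fail a health check, then create replacements until the desired count is reached. Everything below is hypothetical (`ReplicaReconciler`, the `isHealthy` predicate, the `spawn` supplier); how Sapphire would actually probe or create SO instances is not decided in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch only; R stands in for whatever handle identifies an SO replica.
class ReplicaReconciler<R> {
    private final int desiredReplicas;
    private final Predicate<R> isHealthy; // assumed health probe
    private final Supplier<R> spawn;      // assumed way to create a new replica

    ReplicaReconciler(int desiredReplicas, Predicate<R> isHealthy, Supplier<R> spawn) {
        this.desiredReplicas = desiredReplicas;
        this.isHealthy = isHealthy;
        this.spawn = spawn;
    }

    // Returns the replica list after one reconcile pass:
    // healthy replicas are kept in order, then new ones are appended
    // until the list holds desiredReplicas entries.
    List<R> reconcile(List<R> current) {
        List<R> next = new ArrayList<>();
        for (R replica : current) {
            if (isHealthy.test(replica)) {
                next.add(replica);
            }
        }
        while (next.size() < desiredReplicas) {
            next.add(spawn.get());
        }
        return next;
    }
}
```

This pass never scales down; if more healthy replicas exist than desired, they are all kept. Whether that is the right behavior would depend on the policy.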
Health checks I'm not so sure of, but I think we should add this to OMS.
Are you referring to the health check of Kernel Server or health check of SO?
Publishing events on Kernel Server registration. That's not there yet. I'm curious what use case you have in mind?
Just a thought to share. I think Kernel Server registration/deregistration events should be broadcast to all group policies so that group policies have a chance to relocate SO instances.
Some thoughts on monitoring: https://docs.google.com/document/d/1g5SnzsnyGXzdZVDF_uj9MQJomQpHS-PMpfwnYn4RNDU/edit#heading=h.j9vsjm8kyruk
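One way this could be wired up is a plain listener interface that group policies implement and OMS calls on every registration change. The names below (`KernelServerListener`, `KernelServerEventBus`, `RecordingGroupPolicy`) are made up for illustration and are not part of Sapphire; a real group policy would decide whether to relocate SO instances instead of just logging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback interface a group policy would implement.
interface KernelServerListener {
    void onRegistered(String serverId);
    void onDeregistered(String serverId);
}

// Hypothetical publisher that OMS would own.
class KernelServerEventBus {
    // CopyOnWriteArrayList lets listeners subscribe while an event is being delivered.
    private final List<KernelServerListener> listeners = new CopyOnWriteArrayList<>();

    void subscribe(KernelServerListener listener) {
        listeners.add(listener);
    }

    // Delivers the registration event to every subscribed listener, in subscription order.
    void publishRegistered(String serverId) {
        for (KernelServerListener l : listeners) {
            l.onRegistered(serverId);
        }
    }

    // Delivers the deregistration event to every subscribed listener, in subscription order.
    void publishDeregistered(String serverId) {
        for (KernelServerListener l : listeners) {
            l.onDeregistered(serverId);
        }
    }
}

// Stand-in for a group policy: it only records the events it receives.
class RecordingGroupPolicy implements KernelServerListener {
    final List<String> log = new ArrayList<>();

    @Override
    public void onRegistered(String serverId) {
        log.add("registered:" + serverId);
    }

    @Override
    public void onDeregistered(String serverId) {
        log.add("deregistered:" + serverId);
    }
}
```

Delivery here is synchronous and in-process. Since group policies run on Kernel Servers, a real version would need remote delivery and some answer for listeners that are themselves unreachable.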
See also #195
Status update: Some progress made in design document referenced in https://github.com/Huawei-PaaS/DCAP-Sapphire/issues/459#issuecomment-447521716
Subtasks for this issue are being worked on.
Unassigning myself. Reassign to me if there is anything you need me to do on this.
Related to #459 and #344. Moving this to backlog as all of it is not needed for Barcelona KubeCon.
[Quinton] This description is out of date? See https://github.com/Huawei-PaaS/DCAP-Sapphire/issues/78#issuecomment-377374687 instead.
We can use this as a master issue to track all the individual tasks, that are approximately:
#344 Implement object health checks in base sapphire server policy
health metrics, storing them in the local kernel server.
Consider deleting the original text below.
==========================
From Donghui: We need a membership management mechanism in Sapphire core: