Implement a heartbeat event for each agent and log when the agent was last seen. How much time can pass between heartbeats before an agent is no longer considered alive? Once that time passes, what do we do with the agent and its associated resources?
The library should have a routine that keeps track of agents' heartbeats
It should verify whether agents are timing out
Should this be a service internal to the Sunfish core or an external one?
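To make the tracking routine concrete, here is a minimal sketch of what it could look like. Everything in it is an assumption for discussion, not existing Sunfish code: the `HeartbeatTracker` class, its method names, and the 30-second timeout used in the usage note are hypothetical. The clock is injectable so the routine can be tested without sleeping, and it would work the same way whether it ends up inside the core or in an external service.

```python
import time
from typing import Callable, Dict, List


class HeartbeatTracker:
    """Hypothetical helper: records when each agent was last seen."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        # timeout_s is the still-undecided "how long is too long" value.
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def record_heartbeat(self, agent_id: str) -> None:
        """Call this from the heartbeat event handler."""
        self._last_seen[agent_id] = self._clock()

    def timed_out_agents(self) -> List[str]:
        """Return the agents whose last heartbeat is older than timeout_s."""
        now = self._clock()
        return sorted(
            agent_id
            for agent_id, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )
```

A periodic task could call `timed_out_agents()` on a tracker created with, say, `HeartbeatTracker(timeout_s=30.0)` and hand the result to whatever failure policy we settle on.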
We should also figure out what to do with the resources of an allegedly failed agent (missed heartbeat)
Heartbeat messages could be lost, so a request for proof of agent liveness is required, perhaps in the form of an event sent to the agent.
Resources owned by a failed agent could still be functional. One approach would be to leave attached resources untouched, unless other clients report failure, and to stop allocating resources from the failing agent until it is operational again.
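The approach above could be sketched as a small state machine. This is only an illustration under assumptions: `AgentState`, `AgentHealth`, and the `send_probe` callback are hypothetical names, and how the probe event is actually delivered is left open. A missed heartbeat only marks the agent as suspected and triggers a liveness probe; the agent is marked failed only if the probe also goes unanswered. Attached resources are never touched here, and only new allocations are blocked.

```python
from enum import Enum
from typing import Callable


class AgentState(Enum):
    ALIVE = "alive"
    SUSPECTED = "suspected"  # heartbeat missed, liveness probe pending
    FAILED = "failed"        # liveness probe went unanswered as well


class AgentHealth:
    """Hypothetical per-agent health record."""

    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self.state = AgentState.ALIVE

    def on_heartbeat_timeout(self, send_probe: Callable[[str], None]) -> None:
        # The heartbeat may simply have been lost, so ask for proof first.
        if self.state is AgentState.ALIVE:
            self.state = AgentState.SUSPECTED
            send_probe(self.agent_id)

    def on_probe_timeout(self) -> None:
        if self.state is AgentState.SUSPECTED:
            self.state = AgentState.FAILED

    def on_agent_response(self) -> None:
        # Any heartbeat or probe reply makes the agent operational again.
        self.state = AgentState.ALIVE

    def can_allocate(self) -> bool:
        # Existing attachments are left alone; only new allocations stop.
        return self.state is AgentState.ALIVE
```

Failure reports from other clients are not modelled here; they could be added as another transition into `FAILED` that also triggers cleanup of the affected resources.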