elfenpiff opened this issue 2 years ago.
@elfenpiff This is related to both #611 and #620. We should follow RAII for the application's resources. I suppose a hierarchical structure, as sketched in the .puml, would make the resources in shared memory easier to handle.
@mossmaurice Only loosely related. The problem here is not the handling of shared memory resources.
RouDi wrongly assumes that an application has died because the high CPU load prevented that application from sending its heartbeat within the required time frame.
@elfenpiff I think we could use a pipe or a stream socket. AFAIK, when the writing end of a pipe or stream socket is closed, the process holding the receiving end gets a POLLHUP via poll. This would also cover crashes, since the kernel closes all file descriptors of a process when it terminates.
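A minimal sketch of the pipe idea, assuming the monitoring side holds the read end and the application holds the only write end. The helper names peerHasHungUp and detectApplicationExit are hypothetical; here the "application" is simulated with fork(), whereas how a real application would obtain its end (for example a named FIFO or a Unix domain socket) is left open. Because some platforms report EOF as POLLIN instead of POLLHUP, the helper also accepts a read() that returns 0.

```cpp
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical monitoring-side helper: waits up to timeoutMs on the read end
// of a pipe and reports whether every writer has gone away.
bool peerHasHungUp(int readFd, int timeoutMs)
{
    pollfd pfd{};
    pfd.fd = readFd;
    pfd.events = POLLIN;  // POLLHUP need not be requested; it is always reported
    if (poll(&pfd, 1, timeoutMs) <= 0) {
        return false;  // timeout or error: no hang-up observed
    }
    if (pfd.revents & POLLHUP) {
        return true;
    }
    if (pfd.revents & POLLIN) {
        // Some platforms signal EOF only as POLLIN; read() == 0 means EOF.
        char byte;
        return read(readFd, &byte, 1) == 0;
    }
    return false;
}

// Simulates an application process that owns the write end and then exits.
// The kernel closes its descriptors on exit, and would do so after a crash too.
bool detectApplicationExit()
{
    int fds[2];
    if (pipe(fds) != 0) return false;
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        close(fds[0]);
        // ... application work; it never writes to the pipe ...
        _exit(0);  // fds[1] is closed implicitly here
    }
    close(fds[1]);  // the monitor must not keep a write end open itself
    bool exited = peerHasHungUp(fds[0], 5000);
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return exited;
}
```

The check costs no periodic messages: the monitor simply polls and is woken by the kernel. One caveat is that any process which inherits the write end through fork() keeps it open, so the hang-up is only reported once every holder has closed it.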
Some information from my side about monitor mode: when the CPU load is high, there is a good chance that the "keepalivemsg" cannot be sent to RouDi within PROCESS_KEEP_ALIVE_TIMEOUT. We use "posix::FileLock::create(runtimeName);" to check whether the process has really died.
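The lock-file check mentioned above can be sketched with plain flock(2). This is not iceoryx's posix::FileLock API; acquireRuntimeLock, isRuntimeAlive and the lock-file path are hypothetical. The assumption is that the application holds an exclusive lock on a per-runtime file for its whole lifetime; the kernel releases that lock when the process exits or crashes, so RouDi can probe it after a missed keep-alive message.

```cpp
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>
#include <string>

// Application side (hypothetical helper): create the lock file and hold an
// exclusive lock for the lifetime of the process. Returns the fd or -1.
int acquireRuntimeLock(const std::string& path)
{
    int fd = open(path.c_str(), O_CREAT | O_RDWR | O_CLOEXEC, 0666);
    if (fd < 0) return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;  // another process with this runtime name holds the lock
    }
    return fd;  // kept open on purpose; the kernel releases the lock on exit
}

// RouDi side (hypothetical helper): called after a missed keep-alive.
// Returns true if some process still holds the lock, i.e. the app is alive.
bool isRuntimeAlive(const std::string& path)
{
    int fd = open(path.c_str(), O_RDONLY | O_CLOEXEC);
    if (fd < 0) return false;  // no lock file: the app never registered
    bool alive = true;
    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        flock(fd, LOCK_UN);  // we got the lock, so nobody else holds it
        alive = false;
    }
    // If flock() failed, the lock is usually held (EWOULDBLOCK); any other
    // error is also treated as "alive" so that nothing is cleaned up early.
    close(fd);
    return alive;
}
```

For example, while the fd returned by acquireRuntimeLock(path) is open, isRuntimeAlive(path) reports true; once that fd is closed, as happens when the process dies, it reports false. flock locks belong to the open file description, so the probe works even from within the same process.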
Brief feature description
Under high CPU load, the heartbeat thread may not send its heartbeat within the given time frame. RouDi then cleans up all resources of the application that missed the heartbeat, even though the application may still be running and go on to use resources that have already been deleted.
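One way to avoid this premature cleanup is to treat a missed heartbeat only as a reason to suspect an application, and to confirm its death with an independent check before freeing anything. The class below is a hypothetical sketch, not RouDi's implementation: HeartbeatMonitor, onHeartbeat, collectDead and the injected isAlive probe are made-up names, and the timeout stands in for PROCESS_KEEP_ALIVE_TIMEOUT.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical sketch: a late heartbeat only makes an application a suspect;
// it is reported as dead only if the liveness probe confirms it is gone.
class HeartbeatMonitor
{
  public:
    HeartbeatMonitor(Clock::duration timeout, std::function<bool(const std::string&)> isAlive)
        : m_timeout(timeout), m_isAlive(std::move(isAlive))
    {
    }

    void onHeartbeat(const std::string& app, Clock::time_point now) { m_lastSeen[app] = now; }

    // Returns the applications whose resources may safely be cleaned up.
    std::vector<std::string> collectDead(Clock::time_point now)
    {
        std::vector<std::string> dead;
        for (auto it = m_lastSeen.begin(); it != m_lastSeen.end();) {
            bool missed = now - it->second > m_timeout;
            if (missed && !m_isAlive(it->first)) {
                dead.push_back(it->first);
                it = m_lastSeen.erase(it);
            } else {
                ++it;  // heartbeat on time, or late but the process still exists
            }
        }
        return dead;
    }

  private:
    Clock::duration m_timeout;
    std::function<bool(const std::string&)> m_isAlive;
    std::unordered_map<std::string, Clock::time_point> m_lastSeen;
};
```

Passing time points in explicitly instead of reading the clock inside keeps the logic testable. An application that is late but still alive is simply kept and checked again in the next round.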
The solution should be as efficient as possible and should ideally avoid context switches and sending messages. One approach could be to use getpgid, which returns the process group ID of a given pid and fails if no process with that pid exists. Since pids can be reused, a pid alone is not enough: if we could couple it with the process's runtime or creation time, we could identify a process uniquely and check whether it is still alive.
Relates to #1380
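The getpgid idea from the feature description could look roughly like this on Linux. It is a sketch under assumptions: the creation time is taken from the starttime field (field 22) of /proc/<pid>/stat, which is Linux-specific, and ProcessIdentity, readStartTime and isSameProcessAlive are hypothetical names. getpgid serves as the cheap existence check; comparing the start time detects a reused pid.

```cpp
#include <cerrno>
#include <fstream>
#include <sstream>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical identity of a process: the pid plus its start time in clock
// ticks since boot, as recorded when the application registered.
struct ProcessIdentity
{
    pid_t pid;
    unsigned long long startTime;
};

// Reads field 22 (starttime) of /proc/<pid>/stat; returns false on failure.
bool readStartTime(pid_t pid, unsigned long long& startTime)
{
    std::ifstream in("/proc/" + std::to_string(pid) + "/stat");
    std::string content;
    if (!std::getline(in, content)) return false;
    // Field 2 (the command name) is in parentheses and may contain spaces,
    // so parsing starts after the last ')'.
    auto pos = content.rfind(')');
    if (pos == std::string::npos) return false;
    std::istringstream fields(content.substr(pos + 1));
    std::string skipped;
    for (int field = 3; field < 22; ++field) {  // skip fields 3..21
        if (!(fields >> skipped)) return false;
    }
    return static_cast<bool>(fields >> startTime);
}

bool isSameProcessAlive(const ProcessIdentity& id)
{
    // getpgid() fails with ESRCH if no process with this pid exists.
    if (getpgid(id.pid) == -1 && errno == ESRCH) return false;
    unsigned long long current = 0;
    if (!readStartTime(id.pid, current)) return false;
    return current == id.startTime;  // differs if the pid was reused
}
```

This needs neither messages nor a dedicated thread, only two system calls per check. Note that a zombie process still exists, and is therefore reported as alive, until its parent reaps it.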