Closed craig8 closed 8 years ago
I agree with Craig that we should be able to run this agent either on a standalone VOLTTRON instance or on VOLTTRON Central. Either way, we should have a way to limit the number of alerts an agent generates when it detects an error.
Say memory usage breaks the threshold, drops, then breaks it again. Only one alert should be sent: "Check memory usage on platform X". This could be controlled with a cooldown (wait X amount of time before sending the alert again) and/or an acknowledgement (don't send again until the last alert has been acknowledged or reset).
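To make the cooldown and acknowledgement ideas above concrete, here is a minimal sketch in plain Python. `AlertGate` and its method names are hypothetical and not part of VOLTTRON; how the alert is actually sent is left out.

```python
import time


class AlertGate:
    """Hypothetical helper: decides whether an alert for a key may be sent."""

    def __init__(self, cooldown_seconds=300.0, require_ack=False,
                 clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.require_ack = require_ack
        self._clock = clock          # injectable so the gate can be tested
        self._last_sent = {}         # key -> time the last alert was sent
        self._unacked = set()        # keys with an unacknowledged alert

    def should_send(self, key):
        """Return True (and record the send) if an alert is allowed now."""
        now = self._clock()
        # Acknowledgement rule: stay quiet until the last alert is acked.
        if self.require_ack and key in self._unacked:
            return False
        # Cooldown rule: stay quiet until enough time has passed.
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False
        self._last_sent[key] = now
        if self.require_ack:
            self._unacked.add(key)
        return True

    def acknowledge(self, key):
        """Mark the outstanding alert for this key as acknowledged."""
        self._unacked.discard(key)
```

With `require_ack=True` and a cooldown, both conditions must pass before a repeat alert goes out, which covers the "breaks, drops, breaks again" case with a single message.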
Items remaining:
I should note that ThresholdDetectionAgent is very simple. It listens to topics and raises an alert when a topic's value exceeds a configured threshold. It does not keep state regarding the number of alerts sent, how recently they were sent, whether an alert has been acknowledged, etc. We might want to build more logic into this agent, or we could filter the alerts on the receiving end (e.g., VolttronCentral).
This agent will monitor the local machine's resources (memory and disk space). The thresholds should be settable via a protected RPC call. This information is already being published to the machine's bus by the platform.agent using psutil. The agent should determine whether an alert should be raised to its platform agent. The platform agent will communicate its status to VOLTTRON Central.
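A rough sketch of the threshold check described above, assuming readings arrive as percentages. `ResourceMonitor`, the threshold names, and the 90% defaults are all made up for illustration; exposing `set_thresholds` as a protected RPC call and subscribing to the bus are omitted. `read_local_usage` uses `psutil.virtual_memory()` and `psutil.disk_usage()` only as a stand-in for the values the platform agent already publishes.

```python
class ResourceMonitor:
    """Hypothetical threshold checker for memory and disk usage (percent)."""

    def __init__(self, thresholds=None):
        # Assumed defaults; the issue does not specify any values.
        self.thresholds = {"memory_percent": 90.0, "disk_percent": 90.0}
        if thresholds:
            self.set_thresholds(**thresholds)

    def set_thresholds(self, **updates):
        """Update thresholds; in a real agent this would sit behind RPC."""
        for name, value in updates.items():
            if name not in self.thresholds:
                raise KeyError("unknown threshold: %s" % name)
            if not 0 <= value <= 100:
                raise ValueError("threshold must be between 0 and 100")
            self.thresholds[name] = float(value)

    def check(self, readings):
        """Return the sorted names of readings that exceed their threshold."""
        return [
            name for name in sorted(readings)
            if name in self.thresholds
            and readings[name] > self.thresholds[name]
        ]


def read_local_usage(path="/"):
    """Read current usage directly with psutil (third-party package)."""
    import psutil  # imported here so the class above has no dependency
    return {
        "memory_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage(path).percent,
    }
```

For example, `ResourceMonitor().check(read_local_usage())` returns the list of resources that would warrant an alert to the platform agent.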
Note that this agent could also run on VOLTTRON Central, monitoring the incoming messages from the different nodes and reacting accordingly.