This tool parses log data and lets you define analysis pipelines for anomaly detection. It was designed to run the analysis with limited resources and the lowest possible permissions, which makes it suitable for use on production servers.
If the system whose logs are consumed and investigated changes, values that were learned once (e.g., IP addresses, URLs, user names, process names, IDs) remain in the persistency forever. The AMiner then becomes too permissive, because values that should no longer occur are still allowed. For example, with authentication logs: if a user is deactivated and no longer used, the user name remains in the persistency, and if an attacker reuses it months or years later, it is still treated as a perfectly fine and allowed value.
A possible solution is to add parameters that record when each learned value last occurred, and to delete values that have not occurred for a configurable period of time. The default could be set to 0 or -1, which keeps the current behavior (values remain in the persistency forever) and is therefore backward-compatible.
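The deletion step could look roughly like the sketch below. It assumes the persistency is a plain dict that maps each learned value to the timestamp of its last occurrence. The names `prune_expired` and `max_age` are made up for illustration and are not part of the AMiner code base.

```python
def prune_expired(last_seen, now, max_age=0):
    """Return a copy of last_seen without values older than max_age seconds.

    last_seen: dict mapping learned value -> timestamp of its last occurrence
    now:       reference time in seconds (e.g., time of the newest log event)
    max_age:   hypothetical aging parameter; 0 or a negative number disables
               aging, so every value is kept (current behavior)
    """
    if max_age <= 0:
        # Backward-compatible default: nothing ever ages out.
        return dict(last_seen)
    # Keep only values that were observed within the last max_age seconds.
    return {value: ts for value, ts in last_seen.items() if now - ts <= max_age}
```

With `max_age=0` the function returns the model unchanged, so existing configurations would not behave differently.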
An aging feature would also make it possible to run the AMiner continuously in learning mode: the first occurrence of any anomaly would still raise an alert, and the value would then become part of the persistency. Since the value ages out anyway, the next occurrence of a real anomaly would be reported as an anomaly again. If the "anomaly" occurs more frequently than the configured aging period, it is actually normal behavior. A real-world use case could be user logins: continuously monitor which users log in, alert if there are new users, AND alert if a user logs in again after not having logged in for a defined period of time (i.e., a user that has aged out). This is a pretty standard SIEM case. The same could be used for programs used on a laptop (names of created processes). Those that occur on an almost daily basis become part of the persistency and remain there. Also, after every major OS and application update, new values need to be learned quickly. If a program is no longer used, it ages out, and we do not have to maintain thousands of dead entries (e.g., from old versions of the same program) in the persistency.
My suggestion would be to add a parameter "aging: " to all detectors that can implement this functionality. Note that some detectors already use a form of aging, e.g., the FrequencyDetector only uses the num_windows most recent time windows to compute the expected event frequencies for the upcoming time window. For detectors such as the ValueDetector, aging could be implemented by storing the timestamp of the last observation of every value in the model/persistency. The timestamps of the log events, not the wall-clock time, should be used so that this functionality works in both forensic and online operation.
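To make the idea more concrete, here is a minimal sketch of a value detector that stores the last observation time per value and always stays in learning mode. This is only an illustration under assumptions: the class `AgingValueDetector`, its methods `receive` and `purge`, and the unit of `aging` (seconds of log time) are hypothetical and do not reflect the actual ValueDetector implementation.

```python
class AgingValueDetector:
    """Hypothetical illustration of aging; not the AMiner ValueDetector API."""

    def __init__(self, aging=0):
        self.aging = aging    # seconds of log time; 0 or negative disables aging
        self.last_seen = {}   # learned value -> log timestamp of last observation

    def receive(self, value, event_time):
        """Return True if value is new or has aged out at event_time.

        event_time is the timestamp parsed from the log event, not the wall
        clock, so replaying old logs gives the same result as online operation.
        """
        previous = self.last_seen.get(value)
        aged_out = (
            previous is not None
            and self.aging > 0
            and event_time - previous > self.aging
        )
        is_anomaly = previous is None or aged_out
        # Continuous learning: always remember the newest observation time.
        if previous is None or event_time > previous:
            self.last_seen[value] = event_time
        return is_anomaly

    def purge(self, event_time):
        """Delete values not seen within the aging period; return them."""
        if self.aging <= 0:
            return []
        expired = [v for v, ts in self.last_seen.items()
                   if event_time - ts > self.aging]
        for v in expired:
            del self.last_seen[v]
        return expired


# Usage: user logins with an aging period of 30 days.
DAY = 86400
detector = AgingValueDetector(aging=30 * DAY)
first = detector.receive("alice", 0)            # True: new user
second = detector.receive("alice", 1 * DAY)     # False: seen one day ago
third = detector.receive("alice", 40 * DAY)     # True: aged out after 39 days
```

Checking the age on every received value covers the alerting part, while `purge` could be called periodically to keep dead entries out of the persistency. With the default `aging=0` neither step changes anything.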