aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0

using forecasting for predictor-corrector in addition to score #384

Closed sudiptoguha closed 1 year ago

sudiptoguha commented 1 year ago

Description of changes: ThresholdedRandomCutForest uses a score-based predictor-corrector to set the thresholds. Since it was introduced, there has been progress in using RCFs to forecast (namely RCFCaster). Therefore it behooves us to use the same capability internally to improve the efficiency of the thresholding. Note that the forecast is not used to determine a candidate anomaly; it is used only as an additional corrector. Previously the corrector considered the shingling behavior and the difference of successive scores. The results are significantly less noisy. However, these improvements rely on shingleSize > 1 (which is necessary for any forecasting), so ThresholdedRandomCutForest now uses internal shingling by default, which should be simpler to use as well. If the shingling was set explicitly, then there should be no change. The examples are updated as well. A low-period, long-term example with a non-zero trend is added to test the various transformation capabilities (NORMALIZE, NORMALIZE_DIFFERENCE, etc.).
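For readers unfamiliar with the terms, here is a minimal toy sketch in Python of the two ideas above: internal shingling (the model keeps the sliding window itself, so the caller passes one value at a time) and a forecast acting purely as a corrector on a score-based candidate. This is not the library's implementation. The names `InternalShingler` and `corrected_flag`, and the `threshold` and `tolerance` parameters, are hypothetical and exist only for illustration.

```python
from collections import deque


class InternalShingler:
    """Toy illustration: keeps the last `shingle_size` inputs internally,
    so the caller supplies one value per step instead of a full shingle."""

    def __init__(self, shingle_size):
        self.window = deque(maxlen=shingle_size)

    def update(self, value):
        """Add one value; return the current shingle, or None until it is full."""
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return None
        return list(self.window)


def corrected_flag(score, threshold, observed, forecast, tolerance):
    """Toy predictor-corrector step (hypothetical rule, not the library's).

    The score alone decides whether a point is a candidate anomaly.
    The forecast is consulted only afterwards, as a corrector: a candidate
    is dropped when the observed value is within `tolerance` of the forecast.
    """
    is_candidate = score > threshold
    if not is_candidate:
        return False
    return abs(observed - forecast) > tolerance


if __name__ == "__main__":
    shingler = InternalShingler(shingle_size=3)
    for v in [1.0, 2.0, 3.0, 4.0]:
        print(v, shingler.update(v))
```

Note that `corrected_flag` never promotes a low-scoring point to an anomaly, even when the forecast is far off; that mirrors the statement that the forecast is not used to determine candidates.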

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.