An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
Description of changes:
In the calculation of gapLow[y] and gapHigh[y], the ratio-based thresholds incorrectly used Math.abs(a), where a = scale[y] * point[startPosition + y]. Since point[startPosition + y] is the normalized value (x - mean) / std and scale[y] is std, a equals (x - mean), which is the offset from the mean rather than the value itself.
To compute the thresholds from the actual value x, the mean (shiftBase) must be added back: (a + shiftBase) = (x - mean) + mean = x.
The corrected code now uses Math.abs(a + shiftBase). See the changes in PredictorCorrector for details.
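The effect of the fix can be illustrated with a minimal standalone sketch. This is not the code from PredictorCorrector: the class name GapThresholdSketch, the helper methods, and the ratio parameter (including the assumption that the threshold is ratio times the absolute value) are hypothetical and only show how the two expressions diverge when the mean is far from zero.

```java
public class GapThresholdSketch {

    // Undo the scaling of a normalized value: std * ((x - mean) / std) = x - mean.
    static double denormalizedOffset(double normalized, double scale) {
        return scale * normalized;
    }

    // Previous expression: magnitude of the offset from the mean, |x - mean|.
    // The multiplication by `ratio` is an assumed form of the ratio-based threshold.
    static double ratioThresholdOld(double a, double ratio) {
        return ratio * Math.abs(a);
    }

    // Corrected expression: magnitude of the actual value, |(x - mean) + mean| = |x|.
    static double ratioThresholdNew(double a, double shiftBase, double ratio) {
        return ratio * Math.abs(a + shiftBase);
    }

    public static void main(String[] args) {
        double mean = 100.0;   // plays the role of shiftBase
        double std = 2.0;      // plays the role of scale[y]
        double x = 101.0;      // actual observed value
        double ratio = 0.5;    // hypothetical ratio setting

        double normalized = (x - mean) / std;            // 0.5
        double a = denormalizedOffset(normalized, std);  // 1.0, i.e. x - mean

        System.out.println(ratioThresholdOld(a, ratio));        // 0.5
        System.out.println(ratioThresholdNew(a, mean, ratio));  // 50.5
    }
}
```

With a mean of 100 and a value of 101, the old expression scales the threshold by 1 (the deviation), while the corrected one scales it by 101 (the value), which is what a threshold defined relative to the actual value requires.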
Testing done:
Added an integration test (IT).
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Issue #, if available: