aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0

Remove restrictions from outputAfter setting #387

Closed. sudiptoguha closed this issue 1 year ago.

sudiptoguha commented 1 year ago

The outputAfter() setting was introduced primarily to let RCF output (noisy) scores even when its trees were not yet full. However, there are cases where it may be desirable not to see any non-trivial output until a significant amount of data has been seen. That can be achieved in the code that invokes RCF, but removing the restriction and handling it within RCF would be simpler. Also, the outputAfter computation currently depends on the trees reaching a certain size; it would be simpler to have it depend only on the number of updates seen by the forest.
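A minimal Java sketch of the proposed rule follows, assuming a plain scoring function stands in for the forest. Output stays trivial (0.0) until the forest has seen outputAfter updates, and outputAfter is not capped by any tree size. The class and its members (`UpdateGatedScorer`, `update`, `isOutputReady`, `score`, `totalUpdates`) are hypothetical illustrations, not the RCF API, and the actual change in #389 may be implemented differently.

```java
import java.util.function.ToDoubleFunction;

/**
 * Hypothetical sketch (not the RCF API): suppresses output until the forest
 * has absorbed outputAfter updates, rather than waiting for trees to fill.
 */
class UpdateGatedScorer {
    private final long outputAfter;                 // updates required before non-trivial scores
    private final ToDoubleFunction<double[]> model; // assumed stand-in for the forest's scorer
    private long totalUpdates = 0;                  // updates seen by the whole forest

    UpdateGatedScorer(long outputAfter, ToDoubleFunction<double[]> model) {
        // Only a lower bound is checked; no upper bound tied to sample or tree size.
        if (outputAfter < 0) {
            throw new IllegalArgumentException("outputAfter must be non-negative");
        }
        this.outputAfter = outputAfter;
        this.model = model;
    }

    /** Counts one update; a real forest would also insert the point into its trees. */
    void update(double[] point) {
        totalUpdates++;
    }

    /** Readiness depends only on the number of updates seen. */
    boolean isOutputReady() {
        return totalUpdates >= outputAfter;
    }

    /** Returns a trivial 0.0 until ready, then delegates to the model. */
    double score(double[] point) {
        return isOutputReady() ? model.applyAsDouble(point) : 0.0;
    }

    // Usage: score-then-update loop; with outputAfter = 3 the first three scores are 0.0.
    public static void main(String[] args) {
        UpdateGatedScorer scorer = new UpdateGatedScorer(3, p -> Math.abs(p[0]));
        for (int i = 0; i < 5; i++) {
            double[] point = {i - 10.0};
            System.out.println(i + ": " + scorer.score(point));
            scorer.update(point);
        }
    }
}
```

Counting updates keeps the readiness check independent of how full each tree is, so a caller could request a long warm-up (for example, many times the sample size) without the library rejecting the setting.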

sudiptoguha commented 1 year ago

closed via #389