An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
The outputAfter() setting was introduced primarily so that RCF could output (noisy) scores even before its trees were full. However, in some cases it may be desirable to see no non-trivial output until a significant amount of data has been seen. That can be achieved in the code that invokes RCF, but it would simplify the calling code to remove the restriction and handle this within RCF. Also, the outputAfter computation currently depends on the trees reaching a certain size; it would be simpler to have it depend only on the number of updates seen by the forest.
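A minimal sketch of the proposed behavior, for illustration only: readiness is decided by a count of forest updates rather than by tree sizes. The class `UpdateCountGate` and its methods are hypothetical and are not part of the RCF API. Returning 0.0 as the "trivial" output before the threshold is also an assumption.

```java
import java.util.function.DoubleSupplier;

// Hypothetical helper, NOT part of the RCF library: gates output on the
// number of updates seen, independent of how full the trees are.
final class UpdateCountGate {
    private final long outputAfter; // updates required before real scores are returned
    private long totalUpdates;      // updates seen by the forest so far

    UpdateCountGate(long outputAfter) {
        if (outputAfter < 0) {
            throw new IllegalArgumentException("outputAfter must be non-negative");
        }
        this.outputAfter = outputAfter;
    }

    // Call once for every point the forest is updated with.
    void recordUpdate() {
        totalUpdates++;
    }

    // True once at least outputAfter updates have been recorded.
    boolean isOutputReady() {
        return totalUpdates >= outputAfter;
    }

    // Returns the real score once ready; otherwise a trivial 0.0 (assumed
    // convention). The supplier is not evaluated before the threshold.
    double score(DoubleSupplier rawScore) {
        return isOutputReady() ? rawScore.getAsDouble() : 0.0;
    }
}
```

Because the gate only compares two counters, the threshold can be set to any non-negative value, including one much larger than the size of a tree.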