An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
Description of changes: RCF enables parallelism through a specific thread pool implementation. There have been questions about when to use it, for example the parameter ranges in which parallelism helps. Across the long series of changes from V1.0 to now, parallelEnabled appears to help almost always, over a large range of parameters, for a single model. However, when there are many models (as in high-cardinality anomaly detection), it appears to be measurably better to turn off parallelism within each model and instead use multiple threads to process different models. These conclusions can be tested for different settings of the boundingboxcache parameter.
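The many-models arrangement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this change: `StreamModel`, `ToyModel`, and `scoreAll` are hypothetical names, and `ToyModel` is a trivial stand-in for a forest whose internal parallelism is turned off. Only the threading pattern is the point: each model is touched by exactly one task, and a shared fixed-size pool spreads the models across threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for one per-entity model; this is NOT the RCF API.
interface StreamModel {
    double scoreAndUpdate(double[] point);
}

final class ModelLevelParallelism {

    // Toy single-threaded model (assumption for illustration): scores a point by
    // its L1 distance from a running mean, then folds the point into that mean.
    static final class ToyModel implements StreamModel {
        private final double[] mean;
        private long count;

        ToyModel(int dimensions) {
            this.mean = new double[dimensions];
        }

        @Override
        public double scoreAndUpdate(double[] point) {
            double score = 0.0;
            for (int i = 0; i < mean.length; i++) {
                score += Math.abs(point[i] - mean[i]);
            }
            count++;
            for (int i = 0; i < mean.length; i++) {
                mean[i] += (point[i] - mean[i]) / count;
            }
            return score;
        }
    }

    // Processes points.get(i) with models.get(i). Each model runs inside a single
    // task, so no model is shared between threads; the pool parallelizes across models.
    static double[] scoreAll(List<? extends StreamModel> models, List<double[]> points, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (int i = 0; i < models.size(); i++) {
                final StreamModel model = models.get(i);
                final double[] point = points.get(i);
                Callable<Double> task = () -> model.scoreAndUpdate(point);
                futures.add(pool.submit(task));
            }
            double[] scores = new double[models.size()];
            for (int i = 0; i < scores.length; i++) {
                scores[i] = futures.get(i).get(); // waits for that model's task
            }
            return scores;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scoring", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a model task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<ToyModel> models = new ArrayList<>();
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            models.add(new ToyModel(2));
            points.add(new double[] { i, 2.0 * i });
        }
        int threads = Runtime.getRuntime().availableProcessors();
        double[] scores = scoreAll(models, points, threads);
        System.out.println("scored " + scores.length + " models on " + threads + " threads");
    }
}
```

A long-running service would normally create the pool once and reuse it across batches rather than per call. To compare this against in-model parallelism, time both arrangements on the same workload, repeating the run for each boundingboxcache setting of interest.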