An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
Description of changes: This PR introduces summarization based on a multi-centroid representation of clusters and exposes it for point stores. Such a summarization provides a view into the aggregate set of points stored by an RCF. As a consequence, RCFs can provide dynamic summarization as data is updated. In addition, the summarization is rewritten over generics and may be useful in scenarios where only a distance function over generic objects is available.
Several examples are added to randomcutforest-examples to illustrate use.
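To illustrate the "only a distance function over generic objects" scenario, here is a minimal sketch of summarizing a list of arbitrary objects into a few weighted representatives. It is not the code in this PR and does not use the library's API: the class `GenericSummarySketch`, the `Representative` type, and the `summarize` method are hypothetical names, and the algorithm is a plain greedy farthest-point selection with one representative per cluster, swapped in for the multi-centroid approach for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical illustration only; not part of randomcutforest.
final class GenericSummarySketch {

    /** A representative object and the number of input objects assigned to it. */
    static final class Representative<T> {
        final T point;
        final int weight;

        Representative(T point, int weight) {
            this.point = point;
            this.weight = weight;
        }
    }

    /**
     * Picks up to k representatives using greedy farthest-point selection,
     * then weights each one by how many inputs are closest to it.
     * Only the supplied distance function is used; T needs no coordinates.
     */
    static <T> List<Representative<T>> summarize(List<T> points, int k,
            ToDoubleBiFunction<T, T> distance) {
        List<Representative<T>> result = new ArrayList<>();
        if (points.isEmpty() || k <= 0) {
            return result;
        }
        int n = points.size();
        double[] nearest = new double[n]; // distance to the closest chosen center
        int[] owner = new int[n];         // index of the closest chosen center
        Arrays.fill(nearest, Double.POSITIVE_INFINITY);

        List<T> centers = new ArrayList<>();
        int next = 0; // assumption: start from the first point
        while (centers.size() < Math.min(k, n)) {
            T center = points.get(next);
            centers.add(center);
            int centerIndex = centers.size() - 1;
            int farthest = 0;
            double farthestDistance = -1.0;
            for (int i = 0; i < n; i++) {
                double d = distance.applyAsDouble(points.get(i), center);
                if (d < nearest[i]) {
                    nearest[i] = d;
                    owner[i] = centerIndex;
                }
                if (nearest[i] > farthestDistance) {
                    farthestDistance = nearest[i];
                    farthest = i;
                }
            }
            if (farthestDistance == 0.0) {
                break; // every point coincides with some chosen center
            }
            next = farthest;
        }

        int[] counts = new int[centers.size()];
        for (int i = 0; i < n; i++) {
            counts[owner[i]]++;
        }
        for (int c = 0; c < centers.size(); c++) {
            result.add(new Representative<>(centers.get(c), counts[c]));
        }
        return result;
    }
}
```

For example, calling `summarize` on a list of strings with the absolute difference of lengths as the distance groups the strings by length. Re-running it as the underlying set of objects changes gives a crude form of the dynamic summarization described above.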
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Issue #, if available: 345 and 346