aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0

dynamic multi summarization #347

Closed: sudiptoguha closed this 1 year ago

sudiptoguha commented 1 year ago

Issue #, if available: #345 and #346

Description of changes: This PR introduces summarization based on a multi-centroid representation of clusters and exposes this summarization for point stores. Such a summarization provides a view into the aggregate set of points stored by RCF. As a consequence, RCFs can be used to provide dynamic summarization as data is updated. In addition, the summarization has been rewritten using generics and may be useful in scenarios where only a distance function over generic objects is available.
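The idea of a multi-centroid summary that needs nothing but a distance function can be illustrated with a small sketch. This is a hypothetical example, not the RCF library's API or the PR's actual algorithm: the class `MultiCentroidSketch`, the method `summarize`, and the parameters `k` and `maxReps` are invented here. It assumes a CURE-style scheme: clusters are merged agglomeratively, each cluster keeps up to `maxReps` well-spread representatives chosen by farthest-point selection, and the distance between two clusters is the minimum distance between their representatives. Because it never averages points, it works for any type `T` given a `BiFunction<T, T, Double>` distance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/**
 * Hypothetical sketch (not the RCF library API): agglomerative clustering over
 * generic objects, where each cluster is summarized by up to maxReps
 * representatives, using only a distance function.
 */
public class MultiCentroidSketch {

    // A cluster holds all of its members plus a small set of representatives.
    static final class Cluster<T> {
        final List<T> members = new ArrayList<>();
        final List<T> reps = new ArrayList<>();
    }

    // Farthest-point selection: start from the first member, then repeatedly
    // add the member farthest from the representatives chosen so far.
    static <T> List<T> pickReps(List<T> members, BiFunction<T, T, Double> dist, int maxReps) {
        List<T> reps = new ArrayList<>();
        reps.add(members.get(0));
        int target = Math.min(Math.max(1, maxReps), members.size());
        while (reps.size() < target) {
            T best = members.get(0);
            double bestD = -1.0;
            for (T m : members) {
                double d = Double.MAX_VALUE;
                for (T r : reps) d = Math.min(d, dist.apply(m, r));
                if (d > bestD) { bestD = d; best = m; }
            }
            reps.add(best);
        }
        return reps;
    }

    // Cluster distance = closest pair of representatives (multi-centroid link).
    static <T> double clusterDistance(Cluster<T> a, Cluster<T> b, BiFunction<T, T, Double> dist) {
        double d = Double.MAX_VALUE;
        for (T x : a.reps) for (T y : b.reps) d = Math.min(d, dist.apply(x, y));
        return d;
    }

    /** Returns the representatives of each of (at most) k clusters. */
    public static <T> List<List<T>> summarize(List<T> points, BiFunction<T, T, Double> dist,
                                              int k, int maxReps) {
        int target = Math.max(1, k);
        List<Cluster<T>> clusters = new ArrayList<>();
        for (T p : points) {                 // start with one singleton cluster per point
            Cluster<T> c = new Cluster<>();
            c.members.add(p);
            c.reps.add(p);
            clusters.add(c);
        }
        while (clusters.size() > target) {
            int bi = 0, bj = 1;              // indices of the closest pair of clusters
            double best = Double.MAX_VALUE;
            for (int i = 0; i < clusters.size(); i++)
                for (int j = i + 1; j < clusters.size(); j++) {
                    double d = clusterDistance(clusters.get(i), clusters.get(j), dist);
                    if (d < best) { best = d; bi = i; bj = j; }
                }
            Cluster<T> merged = new Cluster<>();
            merged.members.addAll(clusters.get(bi).members);
            merged.members.addAll(clusters.get(bj).members);
            merged.reps.addAll(pickReps(merged.members, dist, maxReps));
            clusters.remove(bj);             // remove the higher index first
            clusters.remove(bi);
            clusters.add(merged);
        }
        List<List<T>> result = new ArrayList<>();
        for (Cluster<T> c : clusters) result.add(c.reps);
        return result;
    }
}
```

For example, `summarize(List.of(0.0, 0.1, 0.2, 10.0, 10.1, 10.2), (a, b) -> Math.abs(a - b), 2, 2)` yields two clusters with two representatives each, one near 0 and one near 10. The same call works unchanged for strings with an edit distance, which is the point of writing it over generics. The merge loop is cubic in the number of points, so a sketch like this only suits small inputs such as a sample of points.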

Several examples have been added to randomcutforest-examples to illustrate usage.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.