aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0

Expose summarization of PointStore #346

Closed by sudiptoguha 1 year ago

sudiptoguha commented 1 year ago

The points sampled by RCF have been stored in the point store since V2.0. Now that summarization is available, it may be helpful to use it to explore the contents of the point store. Since RCFs are dynamic sketches of data (https://opensearch.org/blog/odfe-updates/2019/11/random-cut-forests/), point store summarization would also provide a dynamic clustering primitive.
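To make the idea concrete, here is a rough sketch of what summarizing a set of sampled points could look like. It is not the library's summarization API: `summarize`, `radius`, and `sampled` are hypothetical names, and the greedy one-pass clustering is only a stand-in for whatever summarization the library actually uses.

```python
from math import dist

def summarize(points, radius):
    """Greedy one-pass summary: return (centroid, weight) pairs.

    Hypothetical helper, not part of random-cut-forest-by-aws.
    """
    clusters = []  # each entry is [coordinate_sums, count]
    for p in points:
        best, best_d = None, radius
        for c in clusters:
            centroid = [s / c[1] for s in c[0]]
            d = dist(p, centroid)
            if d <= best_d:  # nearest cluster within the radius wins
                best, best_d = c, d
        if best is None:
            clusters.append([list(p), 1])  # start a new cluster at p
        else:
            best[0] = [s + x for s, x in zip(best[0], p)]
            best[1] += 1
    return [(tuple(s / n for s in sums), n) for sums, n in clusters]

# Stand-in for the points currently held in the point store (assumed data).
sampled = [(0.0, 0.0), (0.2, 0.0), (10.0, 10.0), (10.0, 10.2), (0.0, 0.2)]
summary = summarize(sampled, radius=1.0)
for centroid, weight in summary:
    print(centroid, weight)
```

Because the sample changes as the stream evolves, re-running such a summary over the current sample would track the clusters over time, which is the sense in which this would be a dynamic clustering primitive.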

sudiptoguha commented 1 year ago

PR 347