aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0

Expose Summarize as a generic #345

Closed sudiptoguha closed 1 year ago

sudiptoguha commented 1 year ago

The summarization function is useful for missing-value interpolation and forecasting over multiple variables, since it accounts for scenarios. It is a useful summarization primitive because it does not need to know the target number of clusters/scenarios, and it may be used elsewhere. The summarization already accepts arbitrary distance functions. The goal is to generalize it to a generic type parameter instead of being fixed to <float[]>.
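
To illustrate what a generic signature could look like, here is a minimal sketch. It is not the library's API or its algorithm: the class name `GenericSummarizeSketch`, the `Cluster` type, the `summarize` method, and the `radius` parameter are all hypothetical, and a simple radius-based greedy clustering stands in for the actual summarization. It only shows the two properties discussed above: the point type is a type parameter rather than `float[]`, and the caller supplies the distance function without stating a target number of clusters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch, not part of random-cut-forest-by-aws.
final class GenericSummarizeSketch {

    /** A representative point plus the number of inputs assigned to it. */
    static final class Cluster<T> {
        public final T representative;
        public int weight;

        Cluster(T representative) {
            this.representative = representative;
            this.weight = 1;
        }
    }

    /**
     * Greedy single-pass clustering (a stand-in technique, assumed for illustration).
     * A point joins the nearest existing cluster if it lies within `radius`
     * of that cluster's representative; otherwise it starts a new cluster.
     * The number of clusters is therefore an output, not an input.
     */
    static <T> List<Cluster<T>> summarize(List<T> points,
                                          ToDoubleBiFunction<T, T> distance,
                                          double radius) {
        List<Cluster<T>> clusters = new ArrayList<>();
        for (T point : points) {
            Cluster<T> nearest = null;
            double best = Double.POSITIVE_INFINITY;
            for (Cluster<T> candidate : clusters) {
                double d = distance.applyAsDouble(point, candidate.representative);
                if (d < best) {
                    best = d;
                    nearest = candidate;
                }
            }
            if (nearest != null && best <= radius) {
                nearest.weight++;                  // absorb into the nearest cluster
            } else {
                clusters.add(new Cluster<>(point)); // open a new cluster
            }
        }
        return clusters;
    }
}
```

With this shape, the same method can be called on a `List<float[]>` with an L1 or L2 distance, or on a `List<String>` with any string distance, without changing the summarization code.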

sudiptoguha commented 1 year ago

PR 347