aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0

cleanup/refactor of summarization #324

Closed: sudiptoguha closed this 2 years ago

sudiptoguha commented 2 years ago

Description of changes: Before rc2, we performed imputation differently for single and multiple dimensions. This was resolved in rc2; however, the summarization-based method used in rc1 has been exposed via conditional fields. This PR refactors that monolithic implementation and adds documentation and tests, in the hope that the summarization can be used in a standalone manner. The behavior of the existing functions in the library is not affected. We aim to perform similar targeted cleanup as we prepare to release 3.0.