aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0

improving precision #393

Closed sudiptoguha closed 1 year ago

sudiptoguha commented 1 year ago

Issue #, if available: 390

Description of changes: The main change in this PR improves precision. ThresholdedRCF could already apply different streaming normalizations/transformations; these transformations are now standardized and smoothed. As a consequence, it is feasible to use transformation A to determine significance even when the goal is to use transformation B. This extends the multi-mode capability and, by default, improves the precision of the results significantly.
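
To illustrate the idea of judging significance under one transformation while working in another, here is a minimal, self-contained Python sketch. It is not the ThresholdedRCF API and not the code in this PR. The class `DecayedStats`, the function `detect`, the decay rate, the warm-up length, and the threshold are all hypothetical choices made for illustration. The sketch assumes transformation B is first differencing and transformation A is normalization of the raw values, each smoothed with exponentially decayed statistics. A point is reported only when B proposes it and A confirms it.

```python
import math


class DecayedStats:
    """Exponentially decayed mean/variance (hypothetical smoothed normalizer)."""

    def __init__(self, decay=0.05, warmup=10):
        self.decay = decay
        self.warmup = warmup
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def update(self, x):
        self.count += 1
        if self.count == 1:
            self.mean = x
            return
        delta = x - self.mean
        self.mean += self.decay * delta
        self.var = (1.0 - self.decay) * (self.var + self.decay * delta * delta)

    def zscore(self, x):
        # Report 0 during warm-up or when no spread has been observed yet.
        if self.count < self.warmup or self.var == 0.0:
            return 0.0
        return (x - self.mean) / math.sqrt(self.var)


def detect(stream, threshold=3.0):
    """Return indices proposed by the differenced view (B) and confirmed
    by the normalized view (A). Assumed logic, for illustration only."""
    level_stats = DecayedStats()  # transformation A: normalize raw values
    diff_stats = DecayedStats()   # transformation B: normalize first differences
    prev = None
    flagged = []
    for i, x in enumerate(stream):
        diff = 0.0 if prev is None else x - prev
        candidate = abs(diff_stats.zscore(diff)) > threshold    # B proposes
        significant = abs(level_stats.zscore(x)) > threshold    # A confirms
        if candidate and significant:
            flagged.append(i)
        # Score first, then update, so a point is never compared to itself.
        level_stats.update(x)
        diff_stats.update(diff)
        prev = x
    return flagged


if __name__ == "__main__":
    stream = [10.0, 10.5, 9.5, 10.2, 9.8] * 10
    stream[30] = 50.0  # inject a single spike
    print(detect(stream))
```

In this toy stream the spike at index 30 is large in both views, so it is reported. The return to normal at index 31 also looks like a large jump in the differenced view, but the normalized view does not confirm it, so it is suppressed. Requiring agreement between views is one plausible way such a scheme could reduce false positives.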

In addition, the PR removes unused code (unlikely to be used in the future) left over from versions 1.0 and 2.0. It also adds more tests for branch coverage, especially for RandomCutTree.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.