aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0

rust: enabling dynamic directional density #326

Closed: sudiptoguha closed this 2 years ago

sudiptoguha commented 2 years ago

Description of changes: enables dynamic density estimation (available in Java since v1.0) in Rust. This brings the Rust version to (almost) feature parity with Java RCF 2.0, while keeping the memory layout and space savings of RCF 3.0.

The calibration/scaling differs from the Java version. This Rust version is calibrated so that, in "high"-density regions, the density estimate depends less on the sample size. Density on its own is often less informative than a comparison of densities (frequently of their logarithms). The underlying functions closely resemble the displacement scoring described in the original paper, so displacement_score() is enabled as well. The displacement score is calibrated to lie in the range [0, 1] and, for potential anomalies, is less sensitive to the sample size.
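Since the description recommends comparing densities, often through their logarithms, rather than reading a single density value, here is a minimal sketch of such a comparison. It assumes only that two positive density estimates are already available as `f64` values. The helper `log_density_ratio` and the sample values are hypothetical and are not part of the crate's API.

```rust
/// Hypothetical helper (NOT part of the RCF crate's API): compares two
/// density estimates via the natural log of their ratio.
/// A positive result means `a` lies in a denser region than `b`.
/// Returns None if either value is not a positive, finite number,
/// because the logarithm is undefined or meaningless there.
fn log_density_ratio(a: f64, b: f64) -> Option<f64> {
    if a > 0.0 && b > 0.0 && a.is_finite() && b.is_finite() {
        // ln(a) - ln(b) avoids the overflow/underflow that a / b
        // could hit when the two densities differ by many orders of magnitude.
        Some(a.ln() - b.ln())
    } else {
        None
    }
}

fn main() {
    // Made-up density values, for illustration only.
    let dense = 40.0;
    let sparse = 0.5;
    match log_density_ratio(dense, sparse) {
        Some(r) => println!("log(dense / sparse) = {:.3}", r),
        None => println!("cannot compare these estimates"),
    }
}
```

Working in log space makes the comparison symmetric (swapping the arguments only flips the sign) and keeps the numbers on a readable scale, which fits the note that log-density comparisons are usually more useful than raw densities.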