aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0

Rust : introduces BasicTRCF #344

Closed: sudiptoguha closed this 1 year ago

sudiptoguha commented 2 years ago

Description of changes: ThresholdedRCF (TRCF) was introduced in Java/parkservices to make anomaly scores easier to threshold and use. This PR ports the most basic TRCF (without transformations, time augmentation, or impute-on-the-fly) to Rust. To try it out, run "cargo test --release --test basictrcftest -- --nocapture".
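To illustrate what thresholding a stream of anomaly scores can look like, here is a minimal sketch. It is not the BasicTRCF API from this PR: the struct `ScoreThresholder`, its fields, and the rule "flag when the score exceeds the running mean by `z` standard deviations" are all assumptions made for illustration, and the actual TRCF logic may differ considerably.

```rust
/// Hypothetical running threshold over anomaly scores.
/// Illustrative only; this is NOT the BasicTRCF implementation.
struct ScoreThresholder {
    count: u64,  // number of scores seen so far
    mean: f64,   // running mean of the scores
    m2: f64,     // running sum of squared deviations (Welford's method)
    z: f64,      // standard deviations above the mean needed to flag (must be > 0)
    warmup: u64, // scores to observe before any grade can be non-zero
}

impl ScoreThresholder {
    fn new(z: f64, warmup: u64) -> Self {
        ScoreThresholder { count: 0, mean: 0.0, m2: 0.0, z, warmup }
    }

    /// Consumes one score and returns a grade in [0, 1].
    /// A grade of 0 means "not flagged"; larger grades mean the score
    /// sits further above the current threshold.
    fn update(&mut self, score: f64) -> f64 {
        let grade = if self.count >= self.warmup {
            let std = (self.m2 / self.count as f64).sqrt();
            let threshold = self.mean + self.z * std;
            if std > 0.0 && score > threshold {
                // Scale the excess over the threshold into (0, 1].
                ((score - threshold) / (self.z * std)).min(1.0)
            } else {
                0.0
            }
        } else {
            0.0
        };

        // Welford update; in this sketch flagged scores also update the statistics.
        self.count += 1;
        let delta = score - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (score - self.mean);

        grade
    }
}

fn main() {
    let mut thresholder = ScoreThresholder::new(3.0, 10);

    // Fifty unremarkable scores that alternate around 1.0.
    for i in 0..50 {
        let score = if i % 2 == 0 { 0.9 } else { 1.1 };
        thresholder.update(score);
    }

    // A score far above the recent history receives a positive grade.
    let grade = thresholder.update(2.0);
    println!("spike grade = {}", grade);
}
```

The point of the sketch is the separation of concerns: the forest produces a raw score, and a small stateful component turns that score into a grade that is easier to act on.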

The first test shows how to use 5-dimensional (multivariate) anomaly detection with synthetically injected changes. Note that predicting the expected value typically requires more observations than flagging that something is wrong (a potential anomaly). The second test, run with randomly chosen seeds, gives some indication of scale and aggregate properties.

running 2 tests
choosing RCF_Tiny
choosing RCF_Tiny
timestamp 112 1 step ago, score 1.0213785, grade 0.45268464
timestamp 136 INJECT [ 48.95336, 0, 0, 32.165417, 38.253647]
timestamp 138 2 steps ago, score 1.0372964, grade 0.31086907
timestamp 143 INJECT [ 0, 0, 0, 35.515686, 0]
timestamp 146 INJECT [ 32.560028, 0, 49.118023, 0, -40.676575]
timestamp 146 score 1.1554488, grade 0.36780968
timestamp 220 INJECT [ 0, 0, -45.649166, 0, -35.209305]
timestamp 220 DETECT [ 0, 0, -44.42599, 0, -38.404728] score 1.0871096, grade 0.12089944
timestamp 452 INJECT [ 0, -36.279896, 48.58423, 0, 29.362373]
timestamp 452 DETECT [ 0, -37.53372, 49.176823, 0, 25.316154] score 1.1157002, grade 0.9562319
timestamp 562 INJECT [ -37.948364, 0, -33.879665, 0, 0]
timestamp 564 2 steps ago, DETECT [ -39.00741, 0, -35.184696, 0, 0] score 1.0166736, grade 0.26120678
timestamp 677 INJECT [ 0, 0, 39.818527, 0, 0]
timestamp 678 1 step ago, DETECT [ 0, 0, 41.49842, 0, 0] score 1.0293725, grade 0.42454648
timestamp 812 INJECT [ 0, 0, -44.44441, -38.252617, 0]
timestamp 812 DETECT [ 0, 0, -43.69323, -36.040146, 0] score 1.0507951, grade 0.6446359
timestamp 901 DETECT [ -2.0587616, 1.2649002, 0.038499832, -2.28574, -2.6345863] score 1.0408309, grade 0.49460888
test test_basic_trcf ... ok
829 anomalies injected in 100000 points 693 detected, precision 0.93362194, recall 0.7804584
test test_basic_trcf_scale ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 5.25s
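
As a sanity check on the aggregate figures from the second test, the arithmetic can be sketched as follows. The output reports 829 injected anomalies and 693 detections but not the number of detections that matched an injection. The value 647 used below is inferred from the reported ratios, assuming the standard definitions of precision (true positives over detections) and recall (true positives over injected anomalies). The helper names `precision` and `recall` are hypothetical, not functions from the crate.

```rust
/// Fraction of detections that correspond to a real (injected) anomaly.
fn precision(true_positives: u64, detected: u64) -> f64 {
    true_positives as f64 / detected as f64
}

/// Fraction of injected anomalies that were detected.
fn recall(true_positives: u64, injected: u64) -> f64 {
    true_positives as f64 / injected as f64
}

fn main() {
    let injected: u64 = 829; // reported by the test
    let detected: u64 = 693; // reported by the test
    // Assumption: inferred from the reported ratios, not printed by the test.
    let true_positives: u64 = 647;

    println!("precision = {:.6}", precision(true_positives, detected));
    println!("recall    = {:.6}", recall(true_positives, injected));
}
```

Under that assumption, roughly 46 of the 693 detections did not line up with an injected anomaly, and 182 of the 829 injected anomalies went undetected.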