aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0
210 stars 33 forks

sampler changes for initialFraction #275

Closed: sudiptoguha closed 2 years ago

sudiptoguha commented 3 years ago

Description of changes: Currently, samplers admit every point until they are full. This can make all the samplers correlated, and it causes an abrupt change in behavior once sampleSize points have been seen. The behavior is now smoothed: samplers can (potentially) diverge after initialFraction * sampleSize points, and they no longer all store every one of the first sampleSize points.
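To illustrate the idea, here is a minimal Python sketch of the fill phase only. It is not the code in this pull request: the function names, the linear ramp used for the admission probability, and the parameter values (256, 0.5, the seeds) are all assumptions made for illustration. It also ignores what happens once a sampler is full.

```python
import random


def admit_probability(size, sample_size, initial_fraction):
    """Hypothetical admission probability for a sampler holding `size` points.

    Assumption: a linear ramp from 1.0 down to 0.0 between
    initial_fraction * sample_size and sample_size stored points.
    """
    fill = size / sample_size
    if fill < initial_fraction:
        return 1.0  # early phase: admit every point
    if fill >= 1.0:
        return 0.0  # full: eviction is outside the scope of this sketch
    # Here initial_fraction <= fill < 1.0, so the denominator is positive.
    return 1.0 - (fill - initial_fraction) / (1.0 - initial_fraction)


def fill_sampler(stream, sample_size, initial_fraction, seed):
    """Feed points to one sampler until it is full; return what it stored."""
    rng = random.Random(seed)
    stored = []
    for point in stream:
        if len(stored) >= sample_size:
            break
        p = admit_probability(len(stored), sample_size, initial_fraction)
        if rng.random() < p:
            stored.append(point)
    return stored


stream = range(10_000)

# Old behavior (initial_fraction = 1.0): every sampler stores the same points.
old_a = fill_sampler(stream, 256, 1.0, seed=1)
old_b = fill_sampler(stream, 256, 1.0, seed=2)

# Smoothed behavior: samplers agree on the first 128 points, then may diverge.
new_a = fill_sampler(stream, 256, 0.5, seed=1)
new_b = fill_sampler(stream, 256, 0.5, seed=2)
```

With initial_fraction set to 1.0 in this sketch, the two samplers end up with identical contents regardless of seed, which is the correlation described above. With 0.5, they share only the first 128 points, and the remaining slots fill gradually with points chosen independently per sampler.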

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.