aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0

upgrading thresholding and fixing RCFCaster errors for initial values #385

Closed: sudiptoguha closed this pull request 1 year ago

sudiptoguha commented 1 year ago

Issue #, if available: #365

Description of changes: The main purpose of ThresholdedRandomCutForest has been to manage the thresholding of RCF scores. A later improvement allowed different transformation methods. This PR adjusts the predictor-corrector for each of those transformations, taking into account the new forecasting-based predictor-corrector. In addition, the PR fixes how errors are shown for the initial segment of values when calibration has not been turned on.
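As a rough illustration of what "thresholding RCF scores" means, here is a minimal Python sketch under a deliberately simple assumption: a score is flagged once it exceeds the running mean plus `z_factor` standard deviations of the scores seen so far, and no decision is made during an initial warm-up segment. The class `RunningThreshold` and its parameters are hypothetical. The library itself is Java, and its actual thresholder also accounts for the transformation methods and the predictor-corrector described above, which this sketch omits.

```python
import math


class RunningThreshold:
    """Hypothetical score thresholder (not ThresholdedRandomCutForest's logic).

    Flags a score when it exceeds mean + z_factor * stddev of earlier scores,
    using Welford's online algorithm for the running mean and variance.
    """

    def __init__(self, z_factor=3.0, min_samples=10):
        self.z_factor = z_factor
        self.min_samples = min_samples  # warm-up: no flags until this many scores seen
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def threshold(self):
        """Current cutoff; infinite until a sample stddev can be computed."""
        if self.count < 2:
            return math.inf
        std = math.sqrt(self.m2 / (self.count - 1))
        return self.mean + self.z_factor * std

    def process(self, score):
        """Decide on `score` using past scores only, then fold it into the stats."""
        flagged = self.count >= self.min_samples and score > self.threshold()
        # Welford update of count, mean and m2.
        self.count += 1
        delta = score - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (score - self.mean)
        return flagged
```

For example, with ten scores near 1.0 followed by a spike:

```python
scores = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 1.01, 0.99, 5.0, 1.0]
t = RunningThreshold(z_factor=3.0, min_samples=10)
flags = [t.process(s) for s in scores]
print(flags[10], flags[11])  # True False
```

The warm-up segment is where a naive thresholder misbehaves, since the statistics are not yet meaningful; this sketch simply suppresses decisions there.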

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

kaituo commented 1 year ago

I still have questions on the following:

https://github.com/aws/random-cut-forest-by-aws/pull/385#discussion_r1195580513
https://github.com/aws/random-cut-forest-by-aws/pull/385#discussion_r1192994058

Otherwise, the PR looks good.