aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0

Fix confidence adjustment when all input values are missing #405

Closed kaituo closed 1 month ago

kaituo commented 1 month ago

Issue #, if available:

Description of changes:

This commit fixes a bug where confidence was not adjusted correctly when every input value for the current timestamp was missing. Confidence should decrease after imputation and increase once actual values are observed. The bug arose because the input was marked as not fully imputed even when it was, in fact, fully imputed.

Additionally, this commit ensures that the numberOfImputed counter is decremented when a new timestamp is encountered and the current numberOfImputed is greater than zero. This change guarantees that confidence increases after actual values are observed.
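The intended behavior can be illustrated with a minimal sketch. This is not the library's Preprocessor: the class ImputationTracker, the windowSize parameter, and the linear confidence formula are all assumptions made for illustration. It only shows the counter logic described above: a fully missing input increments numberOfImputed, and an observed input decrements it when it is greater than zero.

```java
// Hypothetical sketch; not the actual Preprocessor in random-cut-forest-by-aws.
final class ImputationTracker {
    private final int windowSize;  // assumed: number of recent timestamps that affect confidence
    private int numberOfImputed;   // how many of those timestamps were fully imputed

    ImputationTracker(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    /** Records one timestamp. A NaN entry marks a missing value (an assumption of this sketch). */
    void update(double[] values) {
        boolean fullyImputed = values.length > 0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                fullyImputed = false;
                break;
            }
        }
        if (fullyImputed) {
            // Every value is missing: count this timestamp as imputed.
            numberOfImputed = Math.min(windowSize, numberOfImputed + 1);
        } else if (numberOfImputed > 0) {
            // Actual values observed at a new timestamp: let confidence recover.
            numberOfImputed--;
        }
    }

    /** Assumed formula: confidence falls linearly with the share of imputed timestamps. */
    double confidence() {
        return 1.0 - (double) numberOfImputed / windowSize;
    }

    int getNumberOfImputed() {
        return numberOfImputed;
    }
}
```

With windowSize set to 4, one fully missing input lowers confidence from 1.0 to 0.75, and the next observed input raises it back to 1.0.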

This PR also adds numberOfImputed to PreprocessorState. Without it, a deserialized Preprocessor would not behave the same as the instance it was serialized from.
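Why the counter has to be part of the saved state can be shown with a small round-trip sketch. The classes SketchState and SketchPreprocessor below are hypothetical stand-ins, not the library's PreprocessorState or Preprocessor, and the confidence formula is assumed. The point is only that a restored instance reports the same confidence as the original when numberOfImputed is carried across.

```java
// Hypothetical stand-in for a serializable state object.
final class SketchState {
    int windowSize;
    int numberOfImputed; // analogous to the field this PR adds to PreprocessorState
}

// Hypothetical stand-in for the preprocessor; only the imputation counter is modeled.
final class SketchPreprocessor {
    private final int windowSize;
    private int numberOfImputed;

    SketchPreprocessor(int windowSize, int numberOfImputed) {
        this.windowSize = windowSize;
        this.numberOfImputed = numberOfImputed;
    }

    /** Counts one fully imputed timestamp, capped at the window size. */
    void recordImputed() {
        numberOfImputed = Math.min(windowSize, numberOfImputed + 1);
    }

    /** Assumed formula: confidence falls linearly with the imputed share. */
    double confidence() {
        return 1.0 - (double) numberOfImputed / windowSize;
    }

    /** Copies every field that influences future behavior into the state. */
    SketchState toState() {
        SketchState state = new SketchState();
        state.windowSize = windowSize;
        state.numberOfImputed = numberOfImputed;
        return state;
    }

    /** Rebuilds an instance; dropping numberOfImputed here would reset confidence to 1.0. */
    static SketchPreprocessor fromState(SketchState state) {
        return new SketchPreprocessor(state.windowSize, state.numberOfImputed);
    }
}
```

If toState omitted numberOfImputed, the restored object would start from zero imputed timestamps and report full confidence, which is the inconsistency described above.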

Testing:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.