aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0
206 stars 33 forks

Single node forests have mass off by 1 #318

Closed: sudiptoguha closed this issue 2 years ago

sudiptoguha commented 2 years ago

This shows up in PR 317. It is caused by the combination of the implicit mass calculation (used to save space) and serialization. In version 2.0, the mass at leaves was maintained explicitly, even though that information was already available in the corresponding sampler. That redundancy was remedied, but the change was not carried through to the serialization of single node forests.
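To make the failure mode concrete, here is a minimal toy sketch of the invariant involved: after a serialization round trip, the mass of a tree should still equal the number of points held by its sampler. Everything in it (`ToyTree`, `to_state`, `from_state`, `mass_matches_sampler`, and the `single_node_bug` flag) is hypothetical and written only for illustration. It is not the library's code, and the faulty branch is an assumed stand-in for the real cause, which this issue does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyTree:
    """Toy stand-in for one tree plus its sampler (hypothetical, not the library's classes)."""
    sampled_points: List[int]                 # point ids currently held by the sampler
    explicit_root_mass: Optional[int] = None  # set only when a mass was stored explicitly

    def mass(self) -> int:
        # Implicit mass: when nothing is stored, derive it from the sampler.
        if self.explicit_root_mass is not None:
            return self.explicit_root_mass
        return len(self.sampled_points)


def to_state(tree: ToyTree, single_node_bug: bool = False) -> dict:
    """Serialize the toy tree; the flag reproduces an assumed off-by-one."""
    state = {"points": list(tree.sampled_points)}
    is_single_node = len(set(tree.sampled_points)) == 1
    if single_node_bug and is_single_node:
        # Assumed faulty path: the lone leaf's mass is written one too low.
        state["root_mass"] = len(tree.sampled_points) - 1
    return state


def from_state(state: dict) -> ToyTree:
    """Rebuild the toy tree from its serialized state."""
    return ToyTree(state["points"], state.get("root_mass"))


def mass_matches_sampler(tree: ToyTree) -> bool:
    """Invariant: the tree's mass equals the number of sampled points."""
    return tree.mass() == len(tree.sampled_points)


if __name__ == "__main__":
    single = ToyTree([7, 7, 7])  # three copies of one point: a single-node tree
    print(mass_matches_sampler(from_state(to_state(single))))
    print(mass_matches_sampler(from_state(to_state(single, single_node_bug=True))))
```

A round-trip check of this shape, run on a forest whose trees each hold only one distinct point, is the kind of regression test that would catch the mismatch described here.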