aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0
211 stars 34 forks

cleanup and initiating 3.7.0-SNAPSHOT #382

Closed sudiptoguha closed 1 year ago

sudiptoguha commented 1 year ago

Description of changes: The goal of the changes in this PR (and subsequent ones) leading up to 4.0 is to enable RCF to handle generic numeric as well as non-numeric objects. It is already known that RCFs can implement most functions of random forests over a stream; see https://opensearch.org/blog/random-cut-forests/

Much of the scaffolding for this was built during the in-situ transition from RCF 1.0 to RCF 3.0, which enabled running forests of different precision simultaneously (RCF 1.0 was double precision), as well as pointer-based versus compact representations, all using the same code. At the same time, other functionality such as ParkServices used double precision, and it seemed prudent not to change too many facets at once. Double precision has been deprecated and will not be available in RCF 4.0. This PR begins the cleanup while doubling down on generics. Subsequent PRs will handle ParkServices.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.