aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0
210 stars, 33 forks

Add Python Wrapper for RCF and Fix Error Message #404

Closed: kaituo closed this 1 month ago

kaituo commented 2 months ago

Issue #, if available:

Testing:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

kaituo commented 1 month ago

Looks good to me, matches my code for the Python bindings.

Only point I'd raise is high-level: why does TRCF only have a process() method while RCF has score(), update(), etc.? Can we add a little more detail on what TRCF.process() does? Is it basically "score and update"?

In a future commit I think we'd also want to add more on the documentation side, especially regarding the RCF / TRCF parameters and their meaning.

process() does more than score and update: it also includes all of the pre- and post-processing around those two steps. I added more description of the process() method parameters.
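
To make the distinction concrete, here is a minimal Python sketch of the two call styles. It is not the wrapper added in this PR and does not use the real RCF or TRCF classes: `ToyForest`, `ToyThresholdedForest`, the moving-average smoothing, and the fixed threshold are all hypothetical stand-ins, and the score is a plain z-score rather than a random cut forest score. The only point is the shape of the API: separate score()/update() calls on one side, and a single process() call that wraps pre-processing, score, update, and post-processing on the other.

```python
import math
from collections import deque


class ToyForest:
    """Hypothetical stand-in for an RCF-like model with separate score() and update().

    The score is a z-score against a running mean and variance,
    NOT a random cut forest anomaly score.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's method)

    def score(self, x):
        # Read-only: scoring does not change the model state.
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(x - self.mean) / std if std > 0 else 0.0

    def update(self, x):
        # Mutating: folds the new point into the running statistics.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


class ToyThresholdedForest:
    """Hypothetical stand-in for a TRCF-like wrapper with a single process() call."""

    def __init__(self, window=3, threshold=3.0):
        self.forest = ToyForest()
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def process(self, value):
        # Pre-processing (assumed for illustration): smooth over a short window.
        self.window.append(value)
        smoothed = sum(self.window) / len(self.window)
        # Core step: score against the current model, then update it.
        score = self.forest.score(smoothed)
        self.forest.update(smoothed)
        # Post-processing (assumed for illustration): threshold the raw score.
        return {"score": score, "is_anomaly": score > self.threshold}


if __name__ == "__main__":
    model = ToyThresholdedForest()
    for v in [10, 11, 10, 11, 10, 11, 10, 11, 50]:
        print(v, model.process(v))
```

In this sketch the caller of `ToyForest` decides when to score and when to update, while the caller of `ToyThresholdedForest` hands over one raw value and gets back a finished decision, which is the sense in which process() is more than "score and update".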