aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0
211 stars 34 forks

Are there any plans to support more languages, such as Python? #394

Open rollmark opened 1 year ago

sudiptoguha commented 1 year ago

Thanks for the question. For an Apache 2.0 project, expansion to other languages is always a positive step. That said, there are resource constraints, and multiple implementations are difficult to keep aligned. For example, even in this library, the Rust version has not gotten as much love as it should at the moment, so I am hesitant to make strong statements about new languages. Contributions should always be welcome in an Apache 2.0 project, and that continues to apply here.

For the immediate next steps (RCF 3.8, say the next two PRs or so), the plan is to add automatic level shift detection, which can suppress multiple anomalies if desired and report the reasons for suppression (explanations). The reverse, paying attention to continuous anomalies (say, for 3 or more readings), is also a legitimate application, and this capability should be useful for either position. These techniques are standard in stochastic control theory and predictor-correctors: RCF predicts a raw score, and ThresholdedRCF (in the ParkServices sub-package) corrects that raw score into a more usable "designer's cut" that makes a call on "anomaly/not anomaly" instead of just vending scores. This would hopefully address issue 390 and allow it to be closed.
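
To make the predictor-corrector idea concrete, here is a minimal Python sketch of a "corrector" that turns raw scores into anomaly/not-anomaly calls and suppresses repeats with a recorded reason. It is only an illustration, not the library's API or the planned RCF 3.8 behavior: `Decision`, `correct_scores`, `threshold`, and `suppress_window` are hypothetical names, and a fixed threshold with a simple suppression window stands in for real level shift detection.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Decision:
    is_anomaly: bool
    reason: Optional[str] = None  # set only when a high score is suppressed


def correct_scores(raw_scores: List[float],
                   threshold: float = 1.0,
                   suppress_window: int = 3) -> List[Decision]:
    """Hypothetical corrector: threshold raw scores and suppress repeats.

    A score above `threshold` is reported as an anomaly unless an anomaly
    was already reported within the previous `suppress_window` readings;
    in that case it is suppressed and the reason is recorded.
    """
    decisions: List[Decision] = []
    last_reported: Optional[int] = None  # index of the last reported anomaly
    for i, score in enumerate(raw_scores):
        if score <= threshold:
            decisions.append(Decision(False))
        elif last_reported is not None and i - last_reported <= suppress_window:
            decisions.append(Decision(
                False,
                f"suppressed: anomaly already reported at index {last_reported}",
            ))
        else:
            last_reported = i
            decisions.append(Decision(True))
    return decisions


if __name__ == "__main__":
    # Made-up raw scores: a sustained jump starting at index 2, a later spike at 9.
    scores = [0.2, 0.3, 2.0, 2.1, 1.9, 0.4, 0.3, 0.2, 0.1, 2.5]
    for i, d in enumerate(correct_scores(scores)):
        print(i, d.is_anomaly, d.reason)
```

With the made-up scores above, the jump at index 2 is reported once, the high scores at indices 3 and 4 are suppressed with a reason, and the later spike at index 9 is reported again. The reverse use case (only caring about runs of 3 or more high readings) would need a different rule and is not covered by this sketch.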

rollmark commented 1 year ago

Cool, got it.