aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0
206 stars 33 forks source link

rust summarize_list error #362

Closed acpeakhour closed 1 year ago

acpeakhour commented 1 year ago

When using the Rust lib, BasicTRCF gives an indexing error. The below patch seems to fix this.

@@ -56,14 +56,14 @@ impl FieldSummarizer { &self, pointstore: &dyn PointStore, point_list_with_distance: &[(f64, usize, f64)], missing: &[usize] ) -> SampleSummary { let mut distance_list: Vec = point_list_with_distance.iter().map(|a| a.2).collect(); distance_list.sort_by(|a, b| a.partial_cmp(&b).unwrap()); let mut threshold = 0.0; if self.centrality > 0.0 { let mut always_include = 0;

acpeakhour commented 1 year ago

duplicate submission, sorry for spam

sudiptoguha commented 1 year ago

No worries, thanks for pointing this out -- the index is reaching the bounds of the array. Do feel free to issue a PR and the PRs can then be merged more easily. There already seems to be some divergence from the Java version -- but it may not be worth syncing that now since the multi-centroid and/or generic version in Rust would require (possibly) a lot more changes.

sudiptoguha commented 1 year ago

Closing based on PR 363, reopen if issue is not resolved.