aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0
210 stars 33 forks

Implement Serializable to support serialization via ObjectOutputStream #297

Closed ylwu-amzn closed 2 years ago

ylwu-amzn commented 2 years ago

Currently MLCommons uses ObjectOutputStream to serialize objects (code link). But ThresholdedRandomCutForestState doesn't implement Serializable, so it cannot be serialized via ObjectOutputStream.

AD uses protostuff for serialization. protostuff has one bug that was reported in AD (https://github.com/opensearch-project/anomaly-detection/issues/263); we fixed it by adding a local jar dependency, as the protostuff team has no clear plan to release a new version.

If we take the same approach in MLCommons, we need to add one more dependency and maintain two serialization mechanisms. So we would prefer implementing Serializable, which makes serialization easier.
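
To illustrate the problem, here is a minimal sketch using only java.io. PlainState and SerializableState are hypothetical stand-ins, not the real ThresholdedRandomCutForestState; the sketch only assumes the general ObjectOutputStream behavior that writing an object whose class does not implement Serializable throws NotSerializableException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableSketch {

    // Hypothetical state class that does NOT implement Serializable.
    static class PlainState {
        double threshold = 0.5;
    }

    // Hypothetical version of the same class after implementing Serializable.
    static class SerializableState implements Serializable {
        private static final long serialVersionUID = 1L;
        double threshold = 0.5;
    }

    // Serialize an object to bytes with ObjectOutputStream.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Deserialize an object from bytes with ObjectInputStream.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Returns false when ObjectOutputStream rejects the object's class.
    static boolean canSerialize(Object obj) throws IOException {
        try {
            toBytes(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("PlainState serializable: " + canSerialize(new PlainState()));

        SerializableState state = new SerializableState();
        state.threshold = 0.75;
        SerializableState copy = (SerializableState) fromBytes(toBytes(state));
        System.out.println("Round-tripped threshold: " + copy.threshold);
    }
}
```

In a real state class, every non-transient field type would also have to be serializable for the round trip to succeed.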