aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0

How can I serialize a RandomCutForest object to a byte array? #377

Closed zuoxiang95 closed 1 year ago

zuoxiang95 commented 1 year ago

Hey guys, is there any way to serialize a RandomCutForest? Here is my serialize function, but it doesn't work:

  def serialize[T](o: T): Array[Byte] = {
    val bos = new ByteArrayOutputStream() 
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(o)
    oos.close()
    bos.toByteArray
  }

sudiptoguha commented 1 year ago

First, take a look at https://github.com/aws/random-cut-forest-by-aws/blob/main/Java/examples/src/main/java/com/amazon/randomcutforest/examples/serialization/ObjectStreamExample.java
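As a rough illustration of the pattern that example points toward, the Scala sketch below converts a model into a plain serializable state object first and only then hands that state to Java serialization. Everything named `Toy...` here is a hypothetical stand-in so that the snippet runs on its own; none of it is the library's API. In the library, the mapper and state classes shown in the linked example play this role, so check that example for the exact calls. The sketch also assumes that the original function failed because the forest object itself is not `java.io.Serializable`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for the forest: deliberately NOT Serializable.
final class ToyForest(val dimensions: Int, val points: Array[Double])

// Hypothetical stand-in for a state class: plain data, Serializable (case class).
final case class ToyForestState(dimensions: Int, points: Array[Double])

// Hypothetical mapper that converts between the model and its state.
object ToyForestMapper {
  def toState(f: ToyForest): ToyForestState = ToyForestState(f.dimensions, f.points.clone())
  def toModel(s: ToyForestState): ToyForest = new ToyForest(s.dimensions, s.points.clone())
}

object StateSerialization {
  // The type bound rejects non-Serializable arguments at compile time.
  def serialize[T <: java.io.Serializable](o: T): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    try oos.writeObject(o) finally oos.close()
    bos.toByteArray
  }

  // Reads a single object back from the bytes; the cast is unchecked.
  def deserialize[T](bytes: Array[Byte]): T = {
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try ois.readObject().asInstanceOf[T] finally ois.close()
  }

  // Model -> state -> bytes -> state -> model.
  def roundTrip(forest: ToyForest): ToyForest = {
    val bytes = serialize(ToyForestMapper.toState(forest))
    ToyForestMapper.toModel(deserialize[ToyForestState](bytes))
  }
}
```

With the bound `T <: java.io.Serializable`, a call such as `serialize(new ToyForest(...))` no longer compiles, which surfaces the problem earlier than a runtime exception would.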

You can choose whether to save the tree state, etc. There are also options for Jackson. However, byte[] serialization seems to be slow, and you'd perhaps be (comparatively) better served by Protostuff:

https://github.com/aws/random-cut-forest-by-aws/blob/main/Java/examples/src/main/java/com/amazon/randomcutforest/examples/serialization/ProtostuffExampleWithShingles.java

However, FYI: as we push toward the next release, 4.0, we will likely be changing the state classes. In the next few PRs (in the next 3.X), we will update the layout of the information without breaking the state classes. As a philosophy, the library will focus on the functional aspects of decision forests and will not be able to add new, specific serialization formats.

That said, there is a sub-package, https://github.com/aws/random-cut-forest-by-aws/tree/main/Java/serialization/src/main/java/com/amazon/randomcutforest/serialize, and contributions and discussions are welcome. So far, that package has focused on conversion for the major upgrades.

zuoxiang95 commented 1 year ago

@sudiptoguha Thanks for your kind reply. Your solution solved my problem perfectly!