altoo-ag / akka-kryo-serialization

Kryo-based serialization for Akka
Apache License 2.0
241 stars 73 forks source link

Extract non-Akka code into a library? #288

Closed ghostdogpr closed 1 year ago

ghostdogpr commented 1 year ago

Chill being unmaintained and stuck with Kryo 4, I am looking into alternatives and ran into this library. However I am not using Akka and am only interested in the "Scala" wrapper around Kryo. What do you think about extracting the common part which is not depending on Akka into a separate library? Another benefit of this would be to reduce the duplication between the Akka and the Pekko repositories as they could both depend on the common library.

danischroeter commented 1 year ago

Hi @ghostdogpr We already thought about that - especially when forking off pekko. So far we did not have enough incentive to extract the common code into a scala-kryo-serialization. But we do think it would be a good thing... Because this change would probably be breaking we'd want to do that before we release a pekko-kryo-serialization 1.0.0. Would you be willing to help with testing?

ghostdogpr commented 1 year ago

Sure, I could use a snapshot or RC version and run it on our test suite at work.

I'm also maintaining Shardcake that depends on Chill and would be happy to switch over to this library if possible.

nvollmar commented 1 year ago

I'm currently working on extracting the Scala serialization part into separately usable library.

The Akka serializer has two modes of operation. Either serialization puts the class the serialized data for later deserialization without a given class, or serialization omits the class and later deserialization requires the class to be given. This is due to Akka already having that information already on the message.

This makes the API somewhat hard to understand (option input that is tied to a config flag) but hidden in Akka.

@ghostdogpr How would you expect the API? Would you prefer a simpler API that always puts the class into the serialized data?

ghostdogpr commented 1 year ago

@nvollmar we need something like this:

def encode(message: Any): Array[Byte]
def decode[A](bytes: Array[Byte]): A

Actual code with Chill (Task is from ZIO, you can ignore it):

def encode(message: Any): Task[Array[Byte]] = ZIO.attempt(kryoPool.toBytesWithClass(message))
def decode[A](bytes: Array[Byte]): Task[A]  = ZIO.attempt(kryoPool.fromBytes(bytes).asInstanceOf[A])
nvollmar commented 1 year ago

Thank for the feedback. I meanwhile started drafting the api as:

  def serialize(obj: AnyRef): Try[Array[Byte]]
  def deserialize[T: ClassTag](bytes: Array[Byte]): Try[T]

It leans a bit towards how the Akka API is defined. Would that be suitable for you?

ghostdogpr commented 1 year ago

Is the ClassTag necessary? We didn't have it in our API.

Looking into your code, I see you use readObject which requires a class, a version with writeClassAndObject/readClassAndObject would be more what I need (but I can imagine both of them to be useful).

nvollmar commented 1 year ago

The Akka serialization uses both readObject and readClassAndObject depending on the config. For the Scala serialization I removed that and always use readClassAndObject.

I added the classTag because the underlying API that will serve both cases expects a manifest. I should be able to eliminate that and reduce the API to:

def serialize(obj: AnyRef): Try[Array[Byte]]
def deserialize[T](bytes: Array[Byte]): Try[T]
ghostdogpr commented 1 year ago

Looks good!

nvollmar commented 1 year ago

@ghostdogpr A non-Akka library is in works https://github.com/altoo-ag/scala-kryo-serialization, we will prepare a first release soon and use it as base for pekko-kryo-serialization later

ghostdogpr commented 1 year ago

I just tried the first RC and it works great in the general case! 🎉

I just have a couple remarks/questions:

EDIT: I was able to use my custom ReferenceResolver, seems all good 👍

nvollmar commented 1 year ago
ghostdogpr commented 1 year ago
  • I've chosen AnyRef as it is the Java Object in Scala. But we can have a look at that.

I checked Chill's code and basically it accepts Any and passes it to the Java function without casting. If it's AnyVal I think Scala transforms it on its own.

  • DefaultKryoInitializer being configured by config comes due to Akka/Pekko creating the serializer by config. For this project we could also remove the config option and pass it in the constructor instead.

It would be nice for people not using Akka/Pekko but is definitely not a blocker.

danischroeter commented 1 year ago

DefaultKryoInitializer being configured by config comes due to Akka/Pekko creating the serializer by config. For this project we could also remove the config option and pass it in the constructor instead.

It would be nice for people not using Akka/Pekko but is definitely not a blocker.

@ghostdogpr Why do you overwrite the ReferenceResolver and is this a common usecase?

This is the current behavior:

If you only overwrite maximumCapacity of MapReferenceResolver then we are better off providing this value in the config.

ghostdogpr commented 1 year ago

@ghostdogpr Why do you overwrite the ReferenceResolver and is this a common usecase?

We found out that performance was bad for large objects because of the reference tracking, so we tried to disable it but ran into stack overflow for some recursive types (Exception for example, but also nested case classes in Scala that have a reference to their parent). So we ended up making our own that overrides useReferences to exclude a bunch of "known" types. That way we get the speed and no surprises.

I was able to do it with your library like this:

class CustomKryoInitializer extends DefaultKryoInitializer {
  override def preInit(kryo: ScalaKryo): Unit = {
    kryo.setReferenceResolver(myCustomResolver)
    super.preInit(kryo)
  }

And then referencing this in the config with the kryo-initializer key.

nvollmar commented 1 year ago

Thanks for testing and the feedback. I've made PR to switch the Scala API to any.

I'll close this issue, if you have something else please open a new issue in the scala-kryo-serialization project :)

ghostdogpr commented 1 year ago

Thanks a lot! 🙏

danischroeter commented 1 year ago

@ghostdogpr with https://github.com/altoo-ag/scala-kryo-serialization/pull/3 you can use

class CustomKryoInitializer extends DefaultKryoInitializer {
  override def def createReferenceResolver(settings: KryoSerializationSettings): ReferenceResolver = myCustomResolver
}