Horusiath / Akkling

Experimental F# typed API for Akka.NET
Apache License 2.0
225 stars 45 forks source link

PersistenceLifecycleEvent not sent to typed actor #141

Open Yakimych opened 3 years ago

Yakimych commented 3 years ago

Hi! Is there a way to hook in to PersistenceLifecycleEvent in typed actors? I've looked at https://github.com/Horusiath/Akkling/issues/72 and the test, but this seems only to work for untyped actors. Here is a fork with the same test for a typed actor (Eventsourced<string>), and it currently fails: https://github.com/Yakimych/Akkling/commit/c9dc995e6479f952d2cda973b4c6b2808318b42b

My guess is that typed actors don't receive messages of anything but the specified type, so I am wondering if this is possible at all.

Horusiath commented 3 years ago

In your test, your actor uses messages of type string. In actor body, you take that string value, cast it to obj and then try to match it against PersistentLifecycleEvent - I don't see how possibly this could ever work.

Unfortunately, F# doesn't offer (untagged) type unions, so the only way for that is to define actor body via Eventsourced<obj> and then retype actor reference from IActorRef<obj>IActorRef<string> after creating actor using retype function to narrow the scope of allowed messages to the ones you care about.

Yakimych commented 3 years ago

retype does the trick, thanks! I've noticed, however, that the actor behaves differently now that it's untyped internally. I am trying to debug a scenario when the actor crashes during Restore (most likely due to deserialization issue with a Discriminated Union). While it was typed, I could see in the logs that the actor got stopped (crashed) during restore, and that's why I tried to hook into the PersistentLifecycleEvent and find more information in ReplayFailed. Now after I managed to do this via your suggestion, however, the actor no longer crashes during restore. I can only see ReplaySucceed and RecoveryCompleted, but the messages don't get replayed (so the actor ends up in the initial state).

While the deserialization problem can be solved by e.g. an EventAdapter, I am wondering if the behaviour in Akkling i correct as it stands right now. That is, shouldn't the untyped actor crash in the same way as a typed actor when deserialization of an event fails during replay? Currently it seems that it "pretends" that everything is fine, but fails silentry and the events never get replayed. I am still trying to get to the bottom of why it crashes with the default serializer, and was hoping to get some information about the cause/exception inside ReplayFailed.

Yakimych commented 3 years ago

@Horusiath Here is a (somewhat verbose) test to demonstrate the discrepancy: https://github.com/Yakimych/Akkling/commit/ad12da960599773336eb826cc3067e4d155f74a9

When I run the test for the untyped actor in Debug mode, I can see the following events logged:

In the test for the typed actor, however, the actor crashes and the test hangs when running Ask with GetState: <Received dead letter from [akka://test-system/temp/c]: ActorCommand GetState>

Is there any reason the typed and untyped actors behave differently?

NOTE: The Sqlite persistence plugin is used for those tests.

Horusiath commented 3 years ago

From what I see, you're using Json.NET for persistence - this may be the reason, when trying to deserialize json payload into obj it doesn't know what specific type of object do you want - which in this case it will default to JDocument or JObject.

This is continuous problem of using JSON.NET, and one of the reasons why we started using Hyperion - but problem with Hyperion is that it's binary format is not stabilized and it doesn't provide any guarantees, that future version will keep it backwards compatible.

Yakimych commented 3 years ago

You are right! After debugging a bit deeper, I can see that an exception is thrown in the __.Next method after matching on JObject, at jobj.ToObject<'Message>:

Newtonsoft.Json.JsonReaderException: Error reading integer. Unexpected token: StartObject. Path '[0]'.
   at Newtonsoft.Json.JsonReader.ReadAsInt32()
   at Newtonsoft.Json.JsonReader.ReadForType(JsonContract contract, Boolean hasConverter)
   ...

I can also see that OnRecoveryFailure is called with this exception, but I don't see anything in the logs - which made my initial debugging attempts quite a challenge. Is this exception swallowed somehow? Is there a way to make sure that it shows up in the logs even for typed actors?

I was also confused as to why the untyped actor doesn't end up along the same path, but I am guessing that JObject gets matched with | :? 'Message as msg -> first, since for untyped actors 'Message is obj here?

As to the serialization problem, I've seen those issues from 2015, but it seemed as they were solved with this PR? Have the problems reappeared later on?

Horusiath commented 3 years ago

I think the reason here is that your persistence provider uses some configuration of JSON.NET, that doesn't support type name handling. There are two problems:

  1. Without type name handling, JSON.NET serializer won't know what object should it deserialize when serializer.DeserializeObject<obj> is called.
  2. With type name handling turned on, a fully qualified type name is used. So if you ever change the type name or namespace in the future, it won't be able to deserialize it back.

Usually when it comes to persistent serialization, I suggest people to write their own custom serializers (using eg. protobuf, FsPickler or whatever you want), to have full control of binary format and how it changes over time.

Yakimych commented 3 years ago

Ok, that makes sense. I guess the question for Akkling specifically is whether we can do something to improve "debuggability" and help the next person who runs into this issue, since currently it pretty much fails silently with default settings when using Persistence.Sqlite. Is there any way to write something to the logs on OnRecoveryFailure even for a typed actor. When I debug Akkling source code I can see that it is called, but nothing is logged.

Horusiath commented 3 years ago

Yes, I think we could do something about it.