Closed Aaronontheweb closed 1 year ago
I should point out: these issues and the bugs I'm fixing in #3744 are one of the reasons why Akka.Cluster.Sharding has been stuck in beta for so long. Coordinator fail-over and recovery is impossible without reliable serialization and deserialization of IActorRefs, which is made much more complicated by having every Akka.Persistence plugin do its own thing.
Example of stuff that should be deleted:
Allowing the plugin to OVERRIDE the akka.actor.serialization-bindings with some kind of external type hint is crazy - there's no way for the serialization system to know what to do based on its own configuration in this case. That's gone in 1.5.0, and it may not even make it into 1.4.0, since getting Akka.Cluster.Sharding out of beta requires that IActorRef serialization work properly.
On top of that, having to expose both the manifest data AND the serializer ID directly in the schema of the Akka.Persistence plugins has been demonstrably brittle over the years - none of that would have been necessary had we stuck with the original Akka.Persistence design in the first place, since that information is all neatly encapsulated internally.
Regarding the code snippet you've shown - I think that nowadays we have a method inside the Serialization class itself that does exactly that.
IMHO:
Maybe it's possible to simply expose the API used to serialize/deserialize Akka-specific types (like IActorRef)?

It's not just that we have a new method inside the Serialization class - it's that we actively bypass the built-in persistence serializers, so we never use the MessageSerializer or SnapshotSerializer formats, which (edit: Akka on the JVM) still uses.
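For reference, the method for round-tripping actor refs is presumably Serialization.SerializedActorPath. A hedged sketch of how an IActorRef can be serialized and resolved (assumes `system` is an ActorSystem and `actorRef` an IActorRef already in scope):

```csharp
using Akka.Actor;
using Akka.Serialization;

// Serialize: produces the full actor path, including the system's default
// address, so the ref survives a transport or persistence round-trip.
string path = Serialization.SerializedActorPath(actorRef);

// Deserialize: resolve the path back into an IActorRef via the provider.
IActorRef resolved = ((ExtendedActorSystem)system).Provider.ResolveActorRef(path);
```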
This wasn't an accident - this was a conscious decision we made around Akka.NET 1.3. As a result, many serializers don't properly support manifest serialization, none that I know of support IActorRef serialization, and we have to actively evolve and migrate our schema for plugins like SQL. None of this would be an issue if we'd just used the envelopes - Protobuf does a much better job of tolerating changes in the versioning of Akka.Persistence's feature set and standardizing / eliminating errors from the serialization process than our hand-rolled process ever could.
On May 27, 2019, at 3:05 AM, Bartosz Sypytkowski notifications@github.com wrote:
Regarding code snippet you've shown - I think that nowadays we have a method inside Serialization class itself, that does exactly that.
IMHO:
We shouldn't be too eager to ditch existing mechanisms. Maybe it's possible to simply expose the API used to serialize/deserialize Akka-specific types (like IActorRef)? I'd take a look at how JVM Akka resolves serialization in akka-typed (which AFAIK is a little different). Also, starting from v2.6 it's changing the default serialization to Jackson (see the issue). Maybe it's something worthwhile to learn from their approach.
Another good example here: https://github.com/AkkaNetContrib/Akka.Persistence.MongoDB/pull/42 and here https://github.com/AkkaNetContrib/Akka.Persistence.MongoDB/issues/26 - Akka.Persistence.MongoDb couldn't support the built-in Cluster.Sharding serializers by default, since it was doing its own thing with serialization rather than using our Protobuf envelopes.
To be clear though, @Horusiath is absolutely right about this:
We shouldn't be too eager to ditch existing mechanisms.
The existing mechanisms are, by any objective standard, bad. But thousands of users have potentially years worth of data stored in them. The only way we could make this situation worse is to strand those users so they can't upgrade onto newer versions of Akka.NET without having to come up with their own hand-rolled ETL tools for Akka.Persistence data. I'd consider that an abdication of responsibility on the part of the Akka.NET project if we allowed that to happen.
Therefore, to the extent that it is both feasible and possible, I'm going to lean towards the following upgrade path:
This may be a crazy idea, but: what about making IActorRef inherit the ISerializable interface? That would give us IActorRef serialization/deserialization in situations we know about, and other Akka-specific types could implement ISerializable the same way. Of course the problem would be to resolve the current ActorSystem, but I think it's doable the same way we deal with the current actor cell, using ThreadStatic.
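The ThreadStatic idea above could be sketched roughly like this (names are hypothetical, not the real Akka.NET internals; in practice the field would hold an ActorSystem):

```csharp
using System;

// Hypothetical ambient-context holder, analogous in spirit to how Akka.NET
// exposes the current transport information to serializers on the call stack.
public static class SerializationContext
{
    [ThreadStatic]
    private static object _currentSystem;

    // Set the "current system" for the duration of a serialization call.
    public static IDisposable With(object system)
    {
        _currentSystem = system;
        return new Scope();
    }

    // Read back the ambient system from deep inside serialization code.
    public static object Current => _currentSystem;

    private sealed class Scope : IDisposable
    {
        public void Dispose() => _currentSystem = null;
    }
}
```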
I am currently researching a method to persist the serialization data in a message.
The idea is to have a special payload type that is normally serialized, but needs to be deserialized on request (user space) or specially handled, as in ActorBase.Receive/Unhandled.
Maybe this payload type would solve this Akka.Persistence issue too. Instead of having the persistence plugins do the serialization themselves, they could be extended to use the pre-serialized content.
It would resolve some other issues:
Related: https://github.com/Horusiath/Akka.Persistence.Reminders/issues/6
No double serialization as a feature-bug
It's not a bug - you have to have a standard payload definition to determine what the user-defined message types are. The problem with Akka.Persistence is that we threw the standard Protobuf payload definition in the trash and forced all of that data to get stuck directly into the Akka.Persistence implementations themselves, which makes them very difficult to upgrade and has caused a lot of problems historically.
This is a return to Keep it Simple(tm) - we're going to do what the JVM has been doing and let the Protobuf envelope, which can be changed easily without introducing these types of problems, do its job.
Can this Protobuf envelope then be so generic that it can be used other places?
Example: NodeA sends a wrap message with a "Protobuf envelope payload" field to NodeB. NodeB receives the wrap message and sends the "Protobuf envelope payload" to NodeC. NodeC receives the "Protobuf envelope payload" and the actor tries to handle it serialized. If the receiving actor does not handle the "Protobuf envelope payload" directly, the AroundReceive method tries to deserialize it and, on success, tries to handle the deserialized message.
We already have it:
It's just not being enforced by the underlying Akka.Persistence implementation.
In Akka.Remote we do something similar:
The envelope formats are only supposed to contain:
In Akka.Remote we also have to include the sender / receiver information; in Akka.Persistence we have to include the sequence number and entity id.
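For reference, the standardized payload envelope looks roughly like this - a sketch modeled on the PersistentPayload message in Akka.Persistence's protobuf schema (field names and numbers shown here are illustrative, not copied from the generated code):

```protobuf
// Sketch of the Akka.Persistence payload envelope: everything the
// serialization system needs to round-trip a user message.
message PersistentPayload {
  bytes payload         = 1; // user message bytes, produced by its configured serializer
  int32 serializerId    = 2; // which serializer produced the bytes
  bytes payloadManifest = 3; // optional manifest / type hint for deserialization
}
```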
Thanks - I didn't see the protobuf types.
I made a sample implementation snippet in https://github.com/Horusiath/Akka.Persistence.Reminders/issues/6#issuecomment-531583713 - maybe it's easier to see there what I mean.
My point is that there is currently no pass-through transparent serialization feature, and every "auto" deserialization of messages happens out of the scope of the user - including the error processing in the deserialization itself (Node Disassociated).
The same goes for Akka.Persistence: there is currently no way to persist an already-serialized message as-is. Every payload gets serialized and wrapped into a PersistentPayload again. https://github.com/akkadotnet/akka.net/blob/8f9d19fa481599056e36ad1d817c2d4149a38198/src/core/Akka.Persistence/Serialization/PersistenceMessageSerializer.cs#L58
Maybe there is just a pass-thru serializer missing that takes the manifest, data bytes, and serializerId values directly from a serialized-payload message.
```hocon
akka.persistence.journal.eventstore {
  event-adapters {
    passthru = "PassThruSerializer, Akka.Maybe"
  }
  event-adapter-bindings {
    "Akka.Persistence.Serialization.PersistentPayload, Akka.Persistence" = passthru
    "ISerializedPayload, Akka.Maybe" = passthru
  }
}
```
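A pass-through adapter like the passthru binding above might look like this - a hedged sketch, where PassThruSerializer is the hypothetical type from the config and the shape follows Akka.Persistence's IEventAdapter API:

```csharp
using Akka.Persistence.Journal;

// Hypothetical adapter that hands pre-serialized payloads to the journal
// unchanged, instead of running them through serialization a second time.
public sealed class PassThruSerializer : IEventAdapter
{
    public string Manifest(object evt) => string.Empty;

    // To the journal: pass the already-serialized payload through as-is.
    public object ToJournal(object evt) => evt;

    // From the journal: emit the serialized payload without deserializing it.
    public IEventSequence FromJournal(object evt, string manifest)
        => EventSequence.Single(evt);
}
```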
@Aaronontheweb Just to make sure I got it right: the payload (user-provided message) inside the PersistentPayload WON'T be serialized anymore using a user-configured serializer, if provided? Or are we just talking about enforcing the default serializer (Google Protobuf) only on the envelope (PersistentMessage), but still being able to serialize the payload with whatever the user chooses (e.g., MessagePack)?
So yet another thing I've discovered while working on trying to get Akka.Persistence.Azure to do the right thing: https://github.com/akkadotnet/akka.net/issues/3965
There isn't a single plugin, as far as I'm aware, that actually passes our own serialization TCK that has been in-place since Akka.NET v1.3.0. This includes the SQL plugins and others. This is because our entire serialization approach isn't standardized across plugins and it puts a high burden onto individual plugin owners to implement them correctly, which our own team can't even do correctly, so it seems.
Edit: and it doesn't help that our TCK is apparently implemented incorrectly and no one has bothered to notice.
Just to make sure I got it right: the payload (user-provided message) inside the PersistentPayload WON'T be serialized anymore using a user-configured serializer, if provided? Or are we just talking about enforcing the default serializer (Google Protobuf) only on the envelope (PersistentMessage), but still being able to serialize the payload with whatever the user chooses (e.g., MessagePack)?
All user-defined messages will still be serialized using their configured serializers. It's just that the content being saved into Akka.Persistence itself will all be encoded inside a protobuf envelope that is standardized by Akka.Persistence, very similar to how we do it for Akka.Remote.
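In other words, the write path would look roughly like this - a sketch, where PersistentPayloadEnvelope is a hypothetical stand-in for the generated protobuf envelope type, and the manifest handling is simplified:

```csharp
using Akka.Actor;
using Akka.Serialization;

// Hypothetical stand-in for the generated protobuf envelope type.
public sealed class PersistentPayloadEnvelope
{
    public byte[] Payload;
    public int SerializerId;
    public string PayloadManifest;
}

public static class EnvelopeWriter
{
    public static PersistentPayloadEnvelope Wrap(ExtendedActorSystem system, object userMessage)
    {
        // 1. The user's configured serializer still handles the user message.
        Serializer serializer = system.Serialization.FindSerializerFor(userMessage);

        // 2. Capture the manifest if the serializer provides one.
        string manifest = serializer is SerializerWithStringManifest withManifest
            ? withManifest.Manifest(userMessage)
            : string.Empty; // simplified: IncludeManifest serializers would use the type name

        // 3. Akka.Persistence wraps the result in its standardized envelope,
        //    so the plugin only ever stores opaque envelope data.
        return new PersistentPayloadEnvelope
        {
            Payload = serializer.ToBinary(userMessage),
            SerializerId = serializer.Identifier,
            PayloadManifest = manifest
        };
    }
}
```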
For a proof of concept of how this will work, see https://github.com/akkadotnet/Akka.Persistence.MongoDB/pull/71 - implementing this in Akka.Persistence.MongoDb, since the legacy "object" serialization is totally incompatible with AtLeastOnceDelivery actors and Akka.Cluster.Sharding as a result of not using any of our binary serialization infrastructure.
How do I use Text/Json serialization in debug mode? I would need to see what gets serialized in the store for PersistentActors.
The main issue is that the journal itself currently needs to serialize/deserialize.
I still think a serialized-message type for serialization would make more sense. This type would tell the journal that the message is pre-serialized (binary or text):
- On write, an upgraded journal would take the serialized data as-is from the message.
- On write, an old journal would serialize the message as in the workaround.
- On read, an upgraded journal would emit only the serialized-message type.
- On read, an old journal would still emit the deserialized messages.
A new layer/component would be required that:
1) Pre-write: serializes every message to the serialized-message type for the journal - normal serialization for PersistentActors, special serialization for Sharding or AtLeastOnceDelivery, pass-through for the serialized-message type.
2) Post-read: handles every message from the journal - on a serialized-message it deserializes the message, on an unknown message type it passes the message through, with a no-deserialize option to emit only the serialized-message.
This would help with other issues, like:
1) A persistence proxy for pre-serialized messages: a proxy node should not need to bind to every assembly and be up to date.
2) Relaying of serialized messages: currently there is no way to send a remote generic service actor a payload as a post-back. There is no way to relay a payload from NodeA via NodeB to NodeC without NodeB needing to bind every assembly of the cluster/roles.
```csharp
// A simple RelayService is already impossible
nodeBService.Tell(new NodeBServiceCommand.RelayTo(Self, new NodeAMessage.Hello("hello world")));
```
3) There will be some performance improvement from skipping re-serialization in Remote->Remote, Remote->Persistence, Persistence->Persistence, Persistence->Remote, and broadcasts.
how to use Text/Json serialization in debug mode?
Easiest way to do this would be via an EventAdapter, in the event that you needed to debug things at that level, which most users shouldn't need to do.
The raison d'être for doing this is twofold:
Building an entire second system for doing this with text, solely so people can read the messages that are being serialized in plain text through a third party database tool, has been a total disaster from a maintainability, version-tolerance, and implementation success standpoint.
A small number of users involved in doing the maintenance work, myself among them, have had to bear significant costs fixing these incompatibilities when we could be doing more useful things like performance fixes. I've personally spent on the order of 200-300 hours working on these problems in 2019. We're done doing that - forever.
If having Akka.Persistence data stored in human-readable format is a must-have for you, fork a plugin and bear the costs of maintaining it yourselves. The OSS organization isn't going to do that for free any more. It's an expensive drag on resources and time.
A new layer/component would be required that:
1) Pre-write: serializes every message to the serialized-message type for the journal (normal serialization for PersistentActors, special serialization for Sharding or AtLeastOnceDelivery, pass-through for the serialized-message type).
2) Post-read: handles every message from the journal (on a serialized-message it deserializes the message, on an unknown message type it passes the message through, with a no-deserialize option to emit only the serialized-message).
This is what EventAdapters are supposed to do, but the idea is that your serialization format should be stable and reproducible going all the way back to your oldest stored data. This type of event adaptation / change should occur solely with objects, not with bytes. Introducing yet another abstraction to assist with this will make Akka.Persistence even more expensive to implement and maintain.
I'm not trying to discourage innovation or creativity here - simply stating an economic truth: our defaults are bad, usually aren't implemented correctly, and this results in both bad experiences for end-users and a ton of maintenance work on both the part of the OSS community and Petabridge.
The lesser of two evils is something that actually works reliably and correctly, most of the time, even if you can't easily read the output data in something like SSMS or whatever your chosen database platform is. The answer isn't to do the same thing we did before, but with even more abstractions, and hope it works - it's to follow a pattern that works via standardization of the serialization process.
If you want to solve the "data readability" problem, do that on the read side with an independent Akka.Persistence reader tool, or even better - use an EventAdapter or a logger to solve that issue by writing the output into human-readable form elsewhere, instead of trying to do it with the application data itself.
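The read-side logging approach could be a small read event adapter that logs each replayed event as JSON without altering the stream - a sketch following Akka.Persistence's IReadEventAdapter shape; the use of Newtonsoft.Json and console logging here is an assumption, not a prescribed setup:

```csharp
using Akka.Persistence.Journal;
using Newtonsoft.Json;

// Hypothetical read-side adapter: writes a human-readable view of every
// replayed event, leaving the persisted binary representation untouched.
public sealed class DebugLoggingReadAdapter : IReadEventAdapter
{
    public IEventSequence FromJournal(object evt, string manifest)
    {
        System.Console.WriteLine(
            $"[replay] {evt.GetType().Name}: {JsonConvert.SerializeObject(evt)}");
        return EventSequence.Single(evt);
    }
}
```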
Thank you for answering my question.
Yes, the text representation is optional - I am using EventStore and looking at the streams through the web interface for debugging. It is nice, but of course not required.
For the MongoDb workaround, adding a NotSupportedException to the Text case would make some sense. The Binary and Text cases should have the same behavior; otherwise only the binary case should work in write mode.
What about the proposal to extract the serialization logic out of the persistent journal/snapshot-store completely? The current journal serialization-logic implementation in the plugins could then stay as-is and be updated later.
It is currently not possible to pre-serialize with a plain EventAdapter - there is no serialized-message type that a journal knows about. The simplest form would be to transform into a byte[] array, but then the metadata would be lost. The next form would be to transform into a protobuf PersistentPayload message from Akka.Remoting, but then it would get double-serialized and the metadata would be lost too.
I already tried it and spent only 10-20h on it; it is more or less working for me in Akka.Remoting and Akka.Persistence as a payload callback. The fields that a serialized message would need are:
```csharp
sealed class SerializedPayload
{
    int SerializerId;
    string Manifest;
    byte[] Data;
    byte[] HashKey;
}
```
The HashKey is required for the ConsistentHashRouter, and it would be better to change it to an integer type, but for that a pre-hashed return option inside IConsistentHashable would be required. I think I made a PR for it some weeks ago.
Having this message type in akka.dll, as an opt-in feature for Akka.Persistence journals and Remoting, would bring some value.
An updated journal would then only need to take the SerializedPayload as-is and write it to the database, and instead of deserializing the message internally it would return the SerializedPayload, which the DefaultEventAdapter would then deserialize.
Maybe I am mistaken, but I don't see that this solution would require much more effort than updating every persistence plugin up front.
Do you have an example or some code we could look at? That might help me understand what you're trying to suggest better.
So, gonna leave my comments here, since I've been neck deep in different implementations on the JVM Side while looking at Tag tables in SQL.
we're going to do what the JVM has been doing and let the Protobuf envelope, which can be changed easily without introducing these types of problems, do its job.
In persistence-jdbc, they're flattening everything down. Now, they absolutely MUST respect the object's defined serializer, but they do not serialize PersistentRepr. (They also make sure to respect 'manifest' vs. 'serializer manifest' now - something I don't think many plugins do well now, and I'm not sure they did when I originally ported.) Similar in akka-persistence-cassandra.
(BTW we probably should do this with tag table changes... yay for more compatibility flags!)
Meaning PersistentRepresentation is no longer part of the payload?
Meaning PersistentRepresentation is no longer part of the payload?
That would seem to be the case. Here's the actual serialize in persistence-cassandra. They grab the payload from the persistent repr, and as with persistence-jdbc they are unrolling tagged payloads.
Truth be told, I think this issue is resolved - over the years we've made a lot of progress on making sure that all Akka.Persistence plugins properly support SerializerWithStringManifest, and in most cases we've done that without the use of Persistent envelope types. I like the idea of avoiding "double serialization" for plugins where it's not currently needed - deserializing the PersistentRepresentation envelope and then deserializing the message within.
Version: 1.3.13 and later.
Without putting too fine a point on it, it's clear to me after working on https://github.com/akkadotnet/akka.net/pull/3744 that the current implementation of Akka.Persistence serialization in Akka.NET is unsustainable, and that a serious error in judgement was made when we decided to allow individual Akka.Persistence plugins to choose their own "native-friendly" serialization instead of relying on the Protobuf envelopes that Akka.Persistence provides by default. This is especially true of our Akka.Persistence.Sql plugins.
As the new Akka.Persistence.TCK specifications will make perfectly clear in Akka.NET v1.4.0+, it is very difficult for Akka.Persistence plugin implementations to correctly support things like IActorRef serialization, since that requires access to some internal fields and properties, such as CurrentTransportInformation, that aren't meant to be exposed outside of Akka.NET for safety reasons. Most journal and snapshot store plugins will fail the new serialization checks, because the current design of many Akka.Persistence plugins requires them to come up with their own techniques for serialization, virtually none of which are consistent with Akka.NET best practices.
Going forward, we should streamline everything on top of the built-in Akka.Persistence MessageSerializer and SnapshotSerializer implementations so we can correctly support Akka.Persistence features like multi-tenant journals, snapshot stores, and more. This is too big and drastic of a change for v1.4.0, but I'd suggest we put it on the table for 1.5.0.