Closed ryanjshaw closed 6 years ago
When ApplySnapshot is called you will get two events, RecovedSnapshot or SnapshotPersisted. Can't you make use of a switch statement here and do the right decision? Example: https://github.com/AsynkronIT/protoactor-dotnet/blob/dev/examples/Persistence/Persistence/Program.cs#L75
Thanks, I wasn't aware there were subclasses, that certainly works. (I've been using a Version field in the state to do the same thing.) But I'm still unclear as to why you would Apply a PersistedSnapshot? Wouldn't a separate callback be more appropritae?
It's very easy to shoot yourself in the foot and miss this as I did - I was processing 10s of thousands of events, so the issue was hidden in a lot of logging and idempotent processing meant duplicates were ignored. It was only when I dug deeper into the performance I realised what was happening. I don't think I'll be the last person to be bitten by this design decision. Proto.Actor is one of my favourite libraries, almost everything else is "obvious, simple, and smart by default" but this feels wrong. Just an opinion, though, happy to close if there is no consensus.
EDIT: My performance went from a few minutes for 1.5k Kafka messages to 3 seconds when I realised what was happening :) With optimisation I'm sure this can be much better.
In ApplyEvent() when KafkaMessage: add the kafkaMessage to my Processing list so that I can track the outstanding delivery Tell() the kafkaMessage to my queue subscriber handle ACKs and NACKs from my system components and update the Processing, Processed and Error lists accordingly
I'm a bit confused by this - does all this happen each time you reload events from the underlying storage? The event sourcing systems I've seen would normally separate out state-change from actions - https://softwareengineering.stackexchange.com/questions/354363/how-do-i-deal-with-side-effects-in-event-sourcing i.e. don't perform any side-effects in the replay stage, only set state.
https://github.com/gregoryyoung/m-r/blob/master/SimpleCQRS/Domain.cs is another example
Tell()s and ACKs/NACKs etc look like side effects. The same principal applies to loading a snapshot - it's just current state.
However... i agree it is confusing (to me at least) why we applySnapshot after saving it, as like you say, we have just passed it into the method
After looking at it I also thinks this is wrong. I'll make a PR to remove the ApplySnapshot call on these lines:
https://github.com/AsynkronIT/protoactor-dotnet/blob/dev/src/Proto.Persistence/Persistence.cs#L136
https://github.com/AsynkronIT/protoactor-dotnet/blob/dev/src/Proto.Persistence/Persistence.cs#L146
ApplySnapshot should only get called upon start/restart on the actor.
Why do Persistence.PersistEventAsync() and Persistence.PersistSnapshotAsync() both _applySnapshot() after calling snapshotStore.PersistSnapshotAsync()?
Aside from ApplySnapshot() being apparently redundant (I just created the snapshot from the current state, what’s there to apply?), if ApplySnapshot is expensive then performance is affected?
Is this by design? I guess it’s following the same pattern with event persistence, but snapshot and event persistence are two different things in my mind.
I’ll give a concrete (albeit heavily simplified) example. I am reading messages from a Kafka queue. I want to track which messages are still “Processing” (i.e. no ACK or NACK back from system components), and which are “Processed” or “Errors”. I want to be able to restart the service arbitrarily and pick up roughly from where I left off. I use both Event Sourcing and Snapshotting:
Now because ApplySnapshot() is always called when PersistSnapshotAsync() is called, I end up retelling my subscriber all Kafka messages in the Processing list every single time I create a Snapshot. This slows execution significantly, even if the idempotent processing on the subscriber side ensures no side-effects.
Am I doing this wrong?