AndrewDryga opened this issue 6 years ago
The simplest solution could be to use the :disk_log
module - it powers mnesia transactions.
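For reference, a minimal sketch of what checkpointing through :disk_log could look like (the log name, file path, and checkpoint term below are made up for illustration):

```elixir
# Illustration only: :disk_log as a write-ahead checkpoint store.
{:ok, log} =
  :disk_log.open(
    name: :sage_checkpoints,
    file: String.to_charlist("/var/lib/my_app/sage_checkpoints.log")
  )

# Append a checkpoint term before/after each step.
:ok = :disk_log.log(log, {:checkpoint, :create_user, %{user_id: 123}})

# On recovery, read the terms back one chunk at a time.
{_continuation, recovered} = :disk_log.chunk(log, :start)
IO.inspect(recovered, label: "recovered checkpoints")

:ok = :disk_log.close(log)
```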
@michalmuskala I've read through :disk_log
and it looks like it's not the best fit for this problem:
There is no way to clean up old logs; we don't want them to grow indefinitely, and it's safe to remove effects after Sage execution or compensation is completed. Otherwise, we would accumulate lots of log entries that are never used.
Running it in distributed mode is not trivial; even for this module it's recommended to have a wrapper process on each node that syncs the local disk logs.
We need something with fast writes and deletes at the cost of slower reads, like Log-Structured Merge Trees, but I'm not aware of a good implementation that would not bring too much maintenance burden.
Any ideas?
Implementation could work something like this:
Sage.Executor should provide a function to run all compensations for a given Sage from a saved checkpoint.
We should probably remove the compensation error handler, since it looks obsolete once we can persist and retry compensations after a critical failure.
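As a rough illustration of the shape such an API could take (the function `Sage.Executor.compensate_from_checkpoint/2` and the persistence module below are hypothetical, not part of Sage today):

```elixir
defmodule MyApp.SageRecovery do
  # Hypothetical sketch: on boot, compensate every saga that was left
  # unfinished before the node went down.
  def recover_all(attrs) do
    for checkpoint <- MyApp.SagePersistence.list_unfinished() do
      # The checkpoint is assumed to carry the saga definition (as MFAs),
      # the effects produced so far, and the step where execution stopped.
      Sage.Executor.compensate_from_checkpoint(checkpoint, attrs)
    end
  end
end
```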
Questions:
I think we should keep it all local - at least in the first iteration. I don't think the distribution of the log (or transaction persistence) would actually be that useful.
The simple adapter could just use a dets table - it might not be the best tool for the job, but it's a no-dependencies solution that should allow validating the idea.
Lastly, saving funs might be a problem - any code change would basically make it irrecoverable (since serialisation for funs includes the md5 hash of the module). Maybe the persistence feature should only work with mfa tuples?
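To illustrate the fun-serialisation concern (a hedged example; MyApp.Billing and the helper module are made up):

```elixir
defmodule MyApp.StepRef do
  # Hypothetical helper showing how a persisted MFA step could be invoked.
  def call({mod, fun, extra_args}, effects_so_far, attrs) do
    apply(mod, fun, [effects_so_far, attrs | extra_args])
  end
end

# Anonymous functions serialise together with the md5 of the module that
# defined them, so a persisted fun stops being callable once that module is
# recompiled and redeployed:
fragile = :erlang.term_to_binary(fn effects, attrs -> MyApp.Billing.charge(effects, attrs) end)

# An MFA tuple only stores names, so it survives code changes as long as the
# function itself still exists:
durable = :erlang.term_to_binary({MyApp.Billing, :charge, []})
```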
@michalmuskala a Sage which relies on MFA tuples instead of callbacks in the form &do_something/3 is so much harder to read :(. I'll think more about it.
If we further extend persistence with an external transaction ID, we would be able to trigger recovery externally by executing the same Sage with the same ID and attributes, which would provide the idempotency feature described in #2.
We could use the LSM DB from the NHS: https://github.com/martinsumner/leveled
> I think we should keep it all local - at least in the first iteration. I don't think the distribution of the log (or transaction persistence) would actually be that useful.
I think this is fine for an initial pass, but I'm not sure this is the correct default to have long term. It would be problematic for our uses. But as long as persistence is behind an adapter as described, everything should be fine, since we can swap in our own persistence.
Thanks for this library and the thought you're putting into it :+1:.
@AndrewDryga, is there any update on this issue? Thanks!
Hey @deepankar-j. I'm not spending much time adding features to Sage, as it fits our use cases as it is right now. So if anyone wants to take this on - it's open for grabs.
Thanks @AndrewDryga. Out of curiosity, how do you handle deployments for your system while a saga is in the middle of executing?
We have a three-node cluster on which we do a rolling update. This is why I'm interested in the execution log and saga recovery. However, I imagine that updating in the middle of a saga execution might not be as big of an issue for blue-green deployments.
I think the goal is to handle process crash - not just node failure...
In my opinion this shouldn't be special to Sage. For example, the whole context can be rehydrated before calling Sage execution (or in the first step of the Sage), and all the steps should be able to handle that.
In my case we keep everything in the database, and each Sage step is idempotent given the information available in the database - e.g. when we know it's already done (e.g. an attribute for the external resource is already set), the execution step is a no-op.
@deepankar-j we currently don't handle it, as it's pretty rare for our system. It's the same as not handling a node crash before Sage was introduced, but with Sage's semantics the code is much easier to understand.
Also, if your functions are idempotent (e.g. you persist some state between steps with an upsert operation), the saga is effectively recoverable without any special handling from the library.
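As a concrete (and hedged) sketch of such an idempotent step - `MyApp.Repo`, the `Subscription` schema, and `external_id` are assumptions about the application, not Sage APIs; the callback shape follows Sage's `(effects_so_far, attrs)` transaction signature:

```elixir
defmodule MyApp.Steps do
  # If the saga is re-run after a crash, the record may already exist,
  # so the step detects that and becomes a no-op.
  def create_subscription(%{user: user} = _effects_so_far, attrs) do
    case MyApp.Repo.get_by(MyApp.Subscription, external_id: attrs.external_id) do
      %MyApp.Subscription{} = existing ->
        {:ok, existing}

      nil ->
        %MyApp.Subscription{user_id: user.id, external_id: attrs.external_id}
        |> MyApp.Repo.insert()
    end
  end
end
```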
Ha, I'm implementing my own distributed transaction system and started to think "I must be reinventing the wheel here", so I found this cool project!
My system runs in Kubernetes, so pods are very frequently killed, restarted, deleted, etc., so I must persist my transaction state. Currently my system uses Redis and Jetstream: write the transaction to Redis, enqueue a Jetstream message, and have consumers process the message and update the transaction state in Redis. Notification of transaction/step values is all done with pub/sub (which comes built-in with Jetstream).
Jetstream handles work-queue type stuff... retries, acks, etc.
Each step in a transaction must be idempotent, because the system has at-least-once delivery semantics.
Anyway, long winded way to say... maybe there should be a behaviour that specifies how to store/load transaction state, that way people can use whatever storage system they want?
@cjbottaro there is no interface right now, but the idea was to introduce such a behaviour and then move the state handling into a separate module that would be the default implementation. If somebody has a practical need for this, a PR would be welcome.
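For anyone picking this up, a rough sketch of what such a behaviour could look like (the module name and callbacks are hypothetical, not an agreed interface):

```elixir
defmodule Sage.Persistence do
  @moduledoc """
  Hypothetical adapter behaviour for persisting execution state.
  Nothing like this exists in Sage today; it only illustrates the idea.
  """

  @type sage_id :: term()
  @type checkpoint :: term()

  # Called around each step so execution can be recovered later.
  @callback save_checkpoint(sage_id(), checkpoint()) :: :ok | {:error, term()}
  @callback load_checkpoint(sage_id()) :: {:ok, checkpoint()} | {:error, :not_found}

  # Called once the saga (or its compensations) completed, so the log
  # does not grow indefinitely.
  @callback delete(sage_id()) :: :ok
end
```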
Keep in mind that persisting state is not as easy as it sounds; for example, you need some sort of two-phase commit for every execution step to recover from the failure where a step has finished executing but the node is restarted before Sage persisted its state.
> Keep in mind that persisting state is not as easy as it sounds; for example, you need some sort of two-phase commit
Yes, but even then, I think two-phase commit relies on something being 100% reliable (i.e. the file system), which doesn't jibe with networked/distributed systems.
This problem is not fun at all. https://en.wikipedia.org/wiki/Two_Generals%27_Problem
I'd love to brainstorm on this. My current thoughts are around "eventual consistency" systems... but that depends on detecting inconsistencies.
cc @AndrewDryga
@cjbottaro the thing I would focus on is making sure that when your node receives SIGTERM it can stop gracefully: cut the traffic to it and give it time to finish processing sagas.
This way your k8s deployment would be a bit slower, but you won't have sagas stopped mid-way (unless the node crashes or OOMs, which is rare). I feel like this approach is more practical than solving distributed-systems issues.
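One way to get that draining behaviour (a sketch, assuming sagas run inside a supervised worker; everything below is application code, not Sage):

```elixir
defmodule MyApp.SagaWorker do
  use GenServer

  # Give the worker a generous shutdown budget so a rolling deploy waits for
  # in-flight sagas instead of killing the process immediately.
  def child_spec(arg) do
    %{id: __MODULE__, start: {__MODULE__, :start_link, [arg]}, shutdown: 60_000}
  end

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg, name: __MODULE__)

  @impl true
  def init(state) do
    # Trap exits so terminate/2 runs when the supervisor shuts us down
    # as part of SIGTERM handling.
    Process.flag(:trap_exit, true)
    {:ok, state}
  end

  @impl true
  def terminate(_reason, state) do
    # Hypothetical helper: block until currently executing sagas finish.
    drain(state)
  end

  defp drain(_state), do: :ok
end
```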
We can add a persistence adapter for Sage. The main goal of implementing it is to let users recover after node failures, making it possible to get better compensation guarantees. As I see it we need:
This mechanism should be resilient to network errors; e.g. we should persist the state before executing a transaction, so that even if the node is restarted, compensation (even without knowing anything about the effect) would still start, potentially on another node.
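In rough pseudocode, the ordering for each step would be something like this (the persistence module, step struct, and checkpoint terms are all illustrative):

```elixir
defmodule MyApp.WriteAheadStep do
  # Illustration only: write-ahead ordering for a single step.
  def execute(persistence, sage_id, step, effects_so_far, attrs) do
    # 1. Record the intent *before* running the transaction, so that if the
    #    node dies mid-step we still know a compensation must be attempted,
    #    even though we may not know what effect (if any) was produced.
    :ok = persistence.save_checkpoint(sage_id, {:started, step.name})

    case step.transaction.(effects_so_far, attrs) do
      {:ok, effect} ->
        # 2. Record the produced effect. A crash between 1 and 2 is exactly
        #    the window that needs two-phase-commit-like handling on recovery.
        :ok = persistence.save_checkpoint(sage_id, {:finished, step.name, effect})
        {:ok, effect}

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```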