FrodeRanders / muprocessmanager

A Saga execution coordinator implementing a micro-process manager
Apache License 2.0

Problems Clustering #1

Open julianoksoares opened 6 years ago

julianoksoares commented 6 years ago

We are evaluating the muprocessmanager framework for a project that requires the Saga pattern. With a single instance of a microservice, the behavior is as expected. But when further instances of the same microservice are started, all sharing the same database (Postgres), an undo is eventually executed more than once in the scenario where a problem occurred while executing the backward method. In this scenario we have several instances of the same microservice, each one starting its own MuProcessManager. Is it possible to use the framework in this use case? Is it possible to use Quartz in cluster mode as a scheduler to avoid this problem?

FrodeRanders commented 6 years ago

Hi Juliano.

Are we talking about the synchronous “undo”, i.e. undoing the individual transactions of a process after encountering an error while processing a request, or the asynchronous “undo”, i.e. when the background machinery picks up a previously failed attempt at undoing a transaction?

I can see there being trouble at the moment when running multiple asynchronous background tasks for handling failure modes (there should really only be one of those), while it should be possible to have multiple instances of MuProcessManager, as long as they do not initiate the background task.

I will fix this and introduce a way to inhibit the asynchronous background task when creating a MuProcessManager. My idea is to make this configurable, so that starting multiple instances of MuProcessManager will not start multiple instances of the asynchronous background task. As a consequence, one instance must be explicitly told to initiate the asynchronous background task.

Would this solve your problem? I am quite sure it would, provided that your problem is with multiple instances of the asynchronous background task. If this is not the case and you are in fact experiencing problems with synchronous undos, then this is a bug and I would need some more input :)

On a different note, we are looking into running MuProcessManager in OpenShift (in the sibling restitch project) and would then want to run several pods. At the moment, this is feasible if all MuProcessManager instances share the same database, but we would really want to have separate databases as well. My current thought is to use ScyllaDB for this :)

Regards, Frode

FrodeRanders commented 6 years ago

Easier still would be not to call MuProcessManager::start on all instances of your process manager.

You do not have to call start on all instances of MuProcessManager, since its only effect is to start the asynchronous background task. You can still use a process manager instance that was not started for handling synchronous calls, i.e. running processes and undoing transactions upon failure. It is only if you encounter failures to undo transactions (which then need to be retried later), or if the server process dies, that you need the fallback provided by the asynchronous background task.

Reserve one MuProcessManager instance for running the asynchronous background task, which could be run in a dedicated environment and without handling calls if you like.
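To illustrate the idea of reserving one instance for the background task, here is a minimal sketch. It is not part of muprocessmanager: `BackgroundCapable` is a hypothetical stand-in for the one call discussed above (`start`), and the environment variable `MUPROCESS_RECOVERY_NODE` is an invented name for whatever deployment setting marks the designated instance.

```java
import java.util.Map;

/** Hypothetical stand-in for the single MuProcessManager call discussed in this thread. */
interface BackgroundCapable {
    void start(); // per the thread: starts the asynchronous background task
}

final class RecoveryRole {
    // Assumed name of a deployment setting; muprocessmanager does not define it.
    static final String ROLE_VARIABLE = "MUPROCESS_RECOVERY_NODE";

    /** True only for the deployment designated to run asynchronous recovery. */
    static boolean isRecoveryNode(Map<String, String> env) {
        return Boolean.parseBoolean(env.getOrDefault(ROLE_VARIABLE, "false"));
    }

    /** Calls start() only on the designated node and reports whether it did. */
    static boolean startIfDesignated(BackgroundCapable manager, Map<String, String> env) {
        if (!isRecoveryNode(env)) {
            return false; // instance remains usable for synchronous processes and undos
        }
        manager.start();
        return true;
    }
}
```

With a real manager object at hand, the call would look something like `RecoveryRole.startIfDesignated(manager::start, System.getenv())`. This only works if exactly one deployment has the setting enabled, which is the operator's responsibility in this sketch.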

FrodeRanders commented 6 years ago

That said, I am still interested in exactly what problems you saw when running multiple instances of MuProcessManager -- all with the background activities running -- since I was particularly looking out for this :)

Under heavy load, which may happen immediately after startup if there is a lot of stuff lying around from earlier processes, the thread pool running tasks may be momentarily overloaded (which is fine per se), in which case we want to postpone subsequent asynchronous tasks until the thread pool has caught up again. I think this should be compatible with Quartz on a per-node basis, but I'll have to look into it. Thanks for the suggestion!
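The postponing behaviour described above can be sketched with the standard `java.util.concurrent` classes. This is an illustration of the general technique under assumed names (`PostponingSubmitter`, `trySubmit`), not the way muprocessmanager actually implements it: a bounded queue with the abort policy makes a saturated pool reject work, and the caller treats a rejection as "try again on a later sweep".

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: submits recovery tasks, signalling overload instead of queueing without bound. */
final class PostponingSubmitter {
    private final ThreadPoolExecutor pool;

    PostponingSubmitter(int threads, int queueCapacity) {
        // Bounded queue plus AbortPolicy: once threads and queue are full, execute() throws.
        this.pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Returns true if the task was accepted, false if it should be postponed. */
    boolean trySubmit(Runnable recoveryTask) {
        try {
            pool.execute(recoveryTask);
            return true;
        } catch (RejectedExecutionException overloaded) {
            return false; // pool is saturated; leave the task for a later sweep
        }
    }

    /** Stops accepting new tasks; already accepted tasks still run. */
    void shutdown() {
        pool.shutdown();
    }
}
```

A periodic sweep would simply stop handing out work for the current round as soon as `trySubmit` returns false, and pick the remaining tasks up on its next run.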

julianoksoares commented 6 years ago

Thanks for the feedback! The problem indeed only occurs when asynchronous undo tasks need to be performed. In that case, every MuProcessManager executes the same undo, so the undo task is called multiple times for the same correlationID. I will try the suggestion of calling start on only one instance; for this we will need some centralized control that prevents the other instances from being started with this call. In any case, it would be nice if the component itself took care of this and prevented the same undo from being run in parallel. For now, thank you very much for your attention. Regards,

Juliano
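One way the duplicate undo calls for the same correlationID could be prevented is an atomic claim per correlation ID, so that only the first instance to claim it runs the undo. The sketch below is an assumption for illustration, not something muprocessmanager provides: `CompensationClaims` is a hypothetical class, and the in-memory map stands in for what would have to be a shared store (for example a row in the common Postgres database with a uniqueness constraint) when several instances are involved.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical claim registry; an in-memory stand-in for a shared, atomic store. */
final class CompensationClaims {
    // correlationId -> id of the instance that claimed the undo
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    /** Atomically claims the undo; only the first caller per correlationId gets true. */
    boolean tryClaim(String correlationId, String instanceId) {
        return owners.putIfAbsent(correlationId, instanceId) == null;
    }

    /** Releases a claim held by this instance so that a later sweep can retry. */
    void release(String correlationId, String instanceId) {
        owners.remove(correlationId, instanceId); // no effect if another instance owns it
    }
}
```

An instance would call `tryClaim` before running an asynchronous undo and skip the work when it returns false, releasing the claim again only if its own attempt fails.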