elsa-workflows / elsa-core

A .NET workflows library
https://v3.elsaworkflows.io/
MIT License

[BUG] Duplicate executions when using the default scheduling provider in single-node environment #5600

Open MariusVuscanNx opened 4 weeks ago

MariusVuscanNx commented 4 weeks ago

Description

Configuring the scheduling activities with the default provider (neither Quartz nor Hangfire) in a single-node environment causes a few issues: duplicate executions when an existing workflow is updated, and errors thrown when a workflow is unpublished.

Steps to Reproduce

Scenario 1: Unpublishing a workflow and publishing it again causes duplicate executions, because the triggers from the first publish are not removed.

Scenario 2: Changing the value of the cron expression has the same effect as scenario 1.

Scenario 3: Unpublishing a workflow stops the timer, but only by accident. The application still tries to execute the workflow, and because the workflow is unpublished, it throws this error:

fail: Elsa.Mediator.HostedServices.BackgroundCommandSenderHostedService[0]
      An unhandled exception occured while processing the queue
      System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
       ---> System.AggregateException: One or more errors occurred. (Specified workflow definition and version does not exist)
       ---> System.Exception: Specified workflow definition and version does not exist
         at Elsa.Workflows.Runtime.Services.DefaultWorkflowRuntime.CreateWorkflowHostAsync(String definitionId, StartWorkflowRuntimeParams options, CancellationToken cancellationToken)
         at Elsa.Workflows.Runtime.Services.DefaultWorkflowRuntime.StartWorkflowAsync(String definitionId, StartWorkflowRuntimeParams options)
         at Elsa.Workflows.Runtime.Services.DefaultWorkflowRuntime.TryStartWorkflowAsync(String definitionId, StartWorkflowRuntimeParams options)
         at Elsa.Workflows.Runtime.Handlers.DispatchWorkflowCommandHandler.HandleAsync(DispatchWorkflowDefinitionCommand command, CancellationToken cancellationToken)
         at Elsa.Mediator.CommandStrategies.DefaultStrategy.ExecuteAsync[TResult](CommandStrategyContext context)
         --- End of inner exception stack trace ---
         at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
         at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
         at InvokeStub_Task`1.get_Result(Object, Object, IntPtr*)
         at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)
         --- End of inner exception stack trace ---
         at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)
         at System.Reflection.PropertyInfo.GetValue(Object obj)
         at Elsa.Mediator.Middleware.Command.Components.CommandHandlerInvokerMiddleware.InvokeAsync(CommandContext context)
         at Elsa.Mediator.Middleware.Command.CommandPipeline.InvokeAsync(CommandContext context)
         at Elsa.Mediator.Services.DefaultMediator.SendAsync(ICommand command, ICommandStrategy strategy, CancellationToken cancellationToken)
         at Elsa.Mediator.HostedServices.BackgroundCommandSenderHostedService.ReadOutputAsync(Channel`1 output, CancellationToken cancellationToken)

The issues can be reproduced every time.

The startup configuration: .UseScheduling()

Cron expressions used:

Expected Behavior

The expected behaviors are:

Workaround

The Quartz provider works as expected, without the issues described above.
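
For reference, a minimal sketch of what the workaround configuration might look like. It assumes the Elsa.Quartz package is referenced and that `builder` is the usual WebApplicationBuilder in Program.cs; neither detail is taken from the setup described in this issue, and the exact extension methods may differ between preview builds.

```csharp
// Sketch only: assumes the Elsa.Quartz package is referenced and that
// `builder` is the WebApplicationBuilder from Program.cs (both assumptions).
builder.Services.AddElsa(elsa =>
{
    // Register the Quartz.NET integration.
    elsa.UseQuartz();

    // Use the Quartz-backed scheduler instead of the default provider.
    elsa.UseScheduling(scheduling => scheduling.UseQuartzScheduler());
});
```

With the default provider, the equivalent call is the plain `.UseScheduling()` shown in the startup configuration above.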

Environment

glime-ai[bot] commented 4 weeks ago

Based on the provided context, it seems that you are encountering issues with the default scheduling provider in Elsa 3.2.0-preview.1616. Here are the steps you can take to address the issues you've described:

  1. Duplicate Executions on Republishing: When you republish a workflow, the existing triggers should be removed to prevent duplicate executions. This might be a bug in the current version you are using. You should check if there are any open issues on the Elsa GitHub repository or consider reporting this as a new issue.

  2. Updating Cron Value: Similar to the first issue, when you update the cron value, the old triggers should be removed or overridden by the new ones. If this is not happening, it could also be a bug. Again, checking the GitHub issues or reporting a new one would be the recommended course of action.

  3. Errors on Unpublishing a Workflow: If unpublishing a workflow is causing errors due to attempts to execute the workflow, this is likely a bug. Ensure that your workflow cancellation and cleanup logic is correctly implemented. If the issue persists, report it on GitHub.

  4. Workaround: You mentioned that using the Quartz provider works as expected. If the default provider is not meeting your needs due to these issues, consider switching to Quartz.NET or Hangfire as a more stable alternative until the issues with the default provider are resolved.

  5. Environment: Since you are using a preview version of Elsa, it is possible that you may encounter bugs that have not been identified or fixed yet. Always ensure you are using the latest stable version, or if you are using a preview version, keep an eye on the release notes for bug fixes related to your issues.

  6. MassTransit Disabled: If you have disabled MassTransit, make sure that any related configuration or dependencies are also correctly adjusted to avoid conflicts or issues with the workflow execution.

  7. Default Runtime: If you are using the default runtime and encountering these issues, consider exploring the distributed runtime options that Elsa offers, such as using Proto.Actor for a distributed, lock-free workflow execution across multiple nodes.

If you need further assistance, you can reach out to the Elsa community or maintainers through their GitHub repository or official communication channels.

This comment was generated by Glime.