elsa-workflows / elsa-core

A .NET workflows library
https://v3.elsaworkflows.io/
MIT License
6.12k stars 1.11k forks source link

[BUG] Cron Expression causes workflow to fire twice in a row #5499

Closed MaxBrooks114 closed 1 week ago

MaxBrooks114 commented 1 month ago

Description

I have a workflow that is triggered by a cron expression to run once a day. For some reason it causes the workflow to be triggered twice immediately. The workflow then never completes and stays in the suspended state even though all the activities have been completed successfully.

Steps to Reproduce

  1. Detailed Steps: var cronTrigger = new Cron { CronExpression = new("0 30 16 ?"), CanStartWorkflow = true };

  2. Reproduction Rate: Happens every time.

Expected Behavior

Workflow triggers once and completes.

Actual Behavior

workflow triggers twice and enters suspended state.

Screenshots

image

Environment

Elsa 3.1.3, .net 8 on windows

Additional Context

it is a flowchart workflow.

glime-ai[bot] commented 1 month ago

Based on the information provided, it seems like there might be an issue with the cron job setup or the workflow configuration that is causing the workflow to trigger twice and not complete as expected. Here are a few steps you can take to troubleshoot and resolve the issue:

  1. Check the Cron Expression: Ensure that the cron expression "0 30 16 ?" is correct and corresponds to the intended schedule (once a day at 16:30). You can use online cron expression generators to validate the expression.

  2. Review Workflow Design: Examine the workflow design to ensure there are no loops or conditions that could cause the workflow to re-trigger itself.

  3. Check for Multiple Triggers: Verify that there is only one trigger configured to start the workflow and that there are no additional triggers that could cause the workflow to execute twice.

  4. Inspect Logs: Review the application logs for any errors or warnings that could provide more insight into why the workflow is entering a suspended state.

  5. Update Elsa Version: If you are not using the latest version of Elsa, consider updating to the latest stable release, as the issue might have been addressed in a newer version.

  6. Seek Support: If the issue persists, consider reaching out to the Elsa community or support channels for assistance. You can create an issue on the Elsa GitHub repository or ask a question on Stack Overflow with the elsa-workflows tag.

  7. Isolate the Issue: Try to isolate the issue by creating a simple workflow with the same cron trigger to see if the problem is specific to the workflow or the cron setup.

  8. Check for Concurrent Executions: Ensure that the workflow is not configured to allow concurrent executions, which could lead to multiple instances running at the same time.

  9. Workflow Suspension: Investigate why the workflow is being suspended. Workflows can be suspended due to unhandled exceptions or waiting for an event to occur. Make sure all activities are configured correctly and that the workflow is designed to complete.

Unfortunately, without access to the actual workflow definition and environment, it's challenging to provide a more specific solution. If you can provide more details or share the workflow definition, it would be helpful in diagnosing the issue further.

This comment was generated by Glime.

MaxBrooks114 commented 1 month ago

The suspended state is unrelated. However the duplicate triggering still happens, unknown why.

yutao815 commented 1 month ago

Same problem.I test it by Elsa.Samples.AspNet.Heartbeats.

sfmskywalker commented 1 month ago

@MaxBrooks114 @yutao815 Can you try with the latest 3.2.0-preview packages to see if the issue still exists?

sfmskywalker commented 1 month ago

I just tried to reproduce this issue, but it works without issue. I will close this as unreproducible, but if you can reproduce the issue with the latest preview packages in a sample project, then please feel free to attach a zip file and I'll check it out. Thank you.

MaxBrooks114 commented 1 month ago

@sfmskywalker I can confirm this bug still happens for me on preview version 3.2.0-sonar-cloud.1502. I will try to reproduce with a sample project when i have time

sfmskywalker commented 1 month ago

3.2.0-sonar-cloud.1502 Is quite an old build. You might want to try 3.2.0-preview.xxx.

MaxBrooks114 commented 1 month ago

@sfmskywalker same issue with 3.2.0-preview.1717 too.

sfmskywalker commented 1 month ago

Ok, thank you for checking. I look forward to the repro.

MaxBrooks114 commented 1 month ago

ElsaCronBugReplication.zip

Here is the bug replicated - it is using the latest 3.3 preview as well. I am using the quartz integration heartbeat workflow sample as a base, and then added a custom activity. It should be noted that your sample as well as mine do not actually use the quartz scheduler or elsa.quartz paackage, only elsa.scheduling.

I have replicated the bug using a specific type of cron expression similar to the ones where i encountered the bug originally, namely a once a day cron expression. Oddly, not only can I reproduce the bug where the job is being fired twice simultaneously, but also repeats every second like the original incarnation of the workflow, like some sort of merging of the two corn expressions. image

as you can see in the screenshot of the code the cron expression should only be firing once, but in the console below we have heartbeat style logs, with two every second

Edit: the cron expression should be firing once a second still. Not twice a second.

sfmskywalker commented 1 month ago

Thank you for the sample project, I'll take a look soon.

sfmskywalker commented 1 month ago

I can confirm the issue; specifically, it happens the first time when triggers don't yet exist in the database.

4lexKislitsyn commented 1 month ago

I believe CreateSchedulesHostedService cause this behaviour.

DefaultWorkflowDefinitionStorePopulator call GetWorkflowsAsync of CLR Provider. Later IndexTriggersAsync method is used to store triggers and send notifications. ScheduleWorkflows class handles notification and calls ScheduleAsync method of ITriggerScheduler. After all this steps CreateSchedulesHostedService is trying to get all stored triggers and schedule it again.

Because IScheduler is scoped, it can't detect duplicates. If IScheduler, CronosCronParser and Func<IServiceProvider, ICronParser> are registered as singleton scheduler will detect duplicates and replace triggers.

Possible solution will contain changes to SchedullingFeature.cs:

            .AddSingleton<IScheduler, LocalScheduler>()
            ...
            .AddSingleton<CronosCronParser>()
            .AddSingleton(CronParser)
MaxBrooks114 commented 1 month ago

@4lexKislitsyn I can confirm this fixed the issue, but you need to make ISystemClock feature a singleton as well. Should I make a pr or will you?

@sfmskywalker please confirm this can be a possible solution.

4lexKislitsyn commented 1 month ago

@MaxBrooks114 I can create a PR after @sfmskywalker confirm that this solution will not breake something else. Are there other features that should be revised beside ISystemClock?

MaxBrooks114 commented 4 weeks ago

@4lexKislitsyn not as far as I'm aware, it worked fine with your suggested fixes and the systemclock.

4lexKislitsyn commented 4 weeks ago

I looked at systemclock feature and it's already registered as singleton in SystemClockFeature.cs.

Also IServiceProvider in LocalScheduler is used for creating of ScheduledCronTask, ScheduledRecurringTask, ScheduledSpecificInstantTask and this classes have IServiceScopeFactory to initialize objects with scoped or transient lifetime. So I think there will not be any issue with it.

I will open PR in few minutes.

dwoldo commented 4 weeks ago

Is this a dupe? https://github.com/elsa-workflows/elsa-core/issues/5600

sfmskywalker commented 4 weeks ago

@dwoldo Yes, but it also mentions another issue where unpublishing a workflow does not seem to unschedule the timer trigger.

sfmskywalker commented 1 week ago

Fixed via e5ba9a8a9d31541ad24fa0ea0853cce5c4cc9601