conductor-oss / conductor

Conductor is an event-driven orchestration platform
https://conductor-oss.org
Apache License 2.0

Human task is scheduled twice even though I set a Redis lock #53

Open · JenniferZh90 opened this issue 8 months ago

JenniferZh90 commented 8 months ago

Describe the bug

I set up a workflow with several simple tasks and one human task. Several workers poll and update the simple tasks, and one worker updates the human task. However, I found that the human task is scheduled twice after the preceding simple task finishes.

Details

Conductor version: 3.16.0
Persistence implementation: Postgres
Queue implementation: Redis
Lock: Redis
Workflow definition: see attachment workflow_definition.json
Task definitions: see attachment tasks_definition.json

Below is the result returned by GET api/workflow/8f4d7300-5dd8-42dd-a58b-aadbc68db157?includeTasks=true:

workflow_result.json

Below are the properties we use: conductor-config.properties.log

Below is the environment setup:

1) Redis (1 primary + 1 replica): AWS ElastiCache, cache.t4g.micro
2) PostgreSQL (1 writer + 1 read-only): AWS Aurora RDS, db.t4g.medium
3) Conductor service (2 pods/replicas): AWS EC2, m7a.xlarge

To Reproduce

Steps to reproduce the behavior:

  1. Create the workflow definition.

  2. Create the task definitions.

  3. Start 50 workers for each of the simple tasks, with a poll interval of 200 ms.

  4. Start one process that completes the human task once it is IN_PROGRESS in the current instance. When a worker updates task "saveDbWithWorkflowDummy_0" with a finished status, the workflow instance is inserted into a local in-memory table. A separate process checks the table every 200 ms, takes each instance ID out, and checks whether a human task has been scheduled. If so, it updates the human task with COMPLETED status and deletes the instance from the table; otherwise it waits for the next poll.

  5. Start 2000 workflow instances. If the issue doesn't appear, start another 2000 after this batch finishes; usually 4 rounds are enough to reproduce it.

  6. See the duplicate human tasks in the screenshots.
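The checker process in step 4 can be sketched roughly as follows. This is a minimal, single-threaded Python sketch, not the reporter's actual code: the in-memory table is modeled as a set of workflow IDs, and `fetch_workflow` and `complete_task` are hypothetical callables standing in for GET api/workflow/{id}?includeTasks=true and POST api/tasks. Following the description above, a human task counts as ready when its status is IN_PROGRESS.

```python
import time
from typing import Callable, Optional, Set


def find_in_progress_human_task(workflow: dict) -> Optional[dict]:
    """Return the first HUMAN task whose status is IN_PROGRESS, or None."""
    for task in workflow.get("tasks", []):
        if task.get("taskType") == "HUMAN" and task.get("status") == "IN_PROGRESS":
            return task
    return None


def check_pending_once(
    pending: Set[str],
    fetch_workflow: Callable[[str], dict],      # hypothetical: GET workflow with tasks
    complete_task: Callable[[str, str], None],  # hypothetical: POST api/tasks
) -> None:
    """One pass over the table: complete a ready human task, then drop the instance."""
    # Single-threaded sketch: guard `pending` with a lock if workers add to it concurrently.
    for workflow_id in list(pending):  # iterate over a copy so entries can be removed
        task = find_in_progress_human_task(fetch_workflow(workflow_id))
        if task is None:
            continue  # no human task yet; look again on the next pass
        complete_task(workflow_id, task["taskId"])
        pending.discard(workflow_id)


def run_checker(
    pending: Set[str],
    fetch_workflow: Callable[[str], dict],
    complete_task: Callable[[str, str], None],
    interval_s: float = 0.2,  # 200 ms, as in step 4
) -> None:
    """Poll the table forever at a fixed interval."""
    while True:
        check_pending_once(pending, fetch_workflow, complete_task)
        time.sleep(interval_s)
```

Because an instance leaves the table as soon as one human task is completed, a second HUMAN task scheduled for the same instance would never be completed by this loop, which fits the later observation that the workflow stays unfinished while one task instance is still in progress.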

Expected behavior

The human task is scheduled only once.
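One way to check a workflow_result.json-style response for the duplicate is to group the HUMAN tasks by reference name. The helper below is hypothetical and not part of Conductor; it assumes the task fields taskType, referenceTaskName, retryCount and iteration as returned with includeTasks=true, and treats a retry (higher retryCount) or a DO_WHILE iteration as a legitimate extra task rather than a duplicate.

```python
from collections import Counter
from typing import Dict, Tuple

Key = Tuple[str, int, int]  # (referenceTaskName, retryCount, iteration)


def find_duplicate_tasks(workflow: dict, task_type: str = "HUMAN") -> Dict[Key, int]:
    """Return {key: count} for every task key of `task_type` that appears more than once.

    A retry bumps retryCount and a loop bumps iteration, so under the stated
    assumption a count above 1 means the same step was scheduled twice.
    """
    counts = Counter(
        (t.get("referenceTaskName"), t.get("retryCount", 0), t.get("iteration", 0))
        for t in workflow.get("tasks", [])
        if t.get("taskType") == task_type
    )
    return {key: n for key, n in counts.items() if n > 1}
```

For example, passing it `json.load(open("workflow_result.json"))` is expected to return a non-empty dict if the two human tasks share a reference name, retry count and iteration, and an empty dict when the human task was scheduled only once.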

Screenshots

[Screenshots: duplicate_scheduled_human_task, duplicate_scheduled_human_task2]


Dyson-Ido commented 8 months ago

@JenniferZh90, at least the human task in your workflow seems to be working? In my workflow, the human task is stuck and cannot be updated to finished. Could you share how you update the human task to finish it? Thank you!

JenniferZh90 commented 8 months ago

@Dyson-Ido, I first get the human task ID and then update it with COMPLETED status: POST "http://localhost:8080/api/tasks" with the body: { "taskId": , "workflowInstanceId": , "status": "COMPLETED", "outputData" => }
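A minimal Python sketch of that call, using only the standard library. The base URL and the helper names are hypothetical; the body carries the same four fields as the comment above, with outputData defaulting to an empty object.

```python
import json
import urllib.request
from typing import Optional


def build_complete_task_request(
    base_url: str, workflow_id: str, task_id: str, output_data: Optional[dict] = None
) -> urllib.request.Request:
    """Build a POST {base_url}/api/tasks request that marks one task COMPLETED."""
    body = {
        "taskId": task_id,
        "workflowInstanceId": workflow_id,
        "status": "COMPLETED",
        "outputData": output_data or {},
    }
    return urllib.request.Request(
        url=f"{base_url}/api/tasks",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete_task(
    base_url: str, workflow_id: str, task_id: str, output_data: Optional[dict] = None
) -> str:
    """Send the request and return the raw response body.

    Assumption: in Conductor 3.x the body is expected to be the updated task ID.
    """
    req = build_complete_task_request(base_url, workflow_id, task_id, output_data)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

For example, `complete_task("http://localhost:8080", workflow_id, human_task_id)` would send the same request as in the comment, once the human task ID has been read from the workflow's task list.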

Dyson-Ido commented 8 months ago

@JenniferZh90, is it actually updated to COMPLETED status? I mean, is the human task completed?

JenniferZh90 commented 8 months ago

@Dyson-Ido Yes. All tasks are eventually marked as "COMPLETED" once they finish.

ab48917 commented 4 months ago

@v1r3n I was able to reproduce this issue, and it seems to be a bug in the system. With Conductor client v3.9 it occurred a few times under a load of 2-4k, while with Conductor client v3.19 (batch poll and execute using CompletableFuture) it is very evident. This hangs the entire system if your next workflow trigger is waiting for this workflow to complete.

Pausing and resuming the workflow also doesn't mark it as completed, because one of the task instances is still IN_PROGRESS.

ab48917 commented 4 months ago

I have a workaround to handle this.