Netflix / conductor

Conductor is a microservices orchestration engine.
Apache License 2.0
12.82k stars, 2.34k forks

WAIT task is not working properly (after v3.13.0) #3402

Open anaken opened 1 year ago

anaken commented 1 year ago

Bug description

If you specify a duration of more than 30 seconds, the WAIT task never completes (it stays in the IN_PROGRESS state). Important: this bug first appeared in v3.13.0.

Workflow To Reproduce

{
  "name": "testWaitWf",
  "description": "testWaitWf",
  "version": 1,
  "tasks": [
    {
      "name": "justWaitTask",
      "taskReferenceName": "justWaitTask",
      "type": "WAIT",
      "inputParameters": {
        "duration": "45 seconds"
      }
    }
  ],
  "inputParameters": [],
  "outputParameters": {
    "result": {
      "parentInput": "${workflow.input}"
    }
  },
  "schemaVersion": 2,
  "restartable": true,
  "workflowStatusListenerEnabled": false,
  "ownerEmail": "example@email.com",
  "timeoutPolicy": "ALERT_ONLY"
}

Expected behavior

The task should complete once the duration has elapsed (move to the COMPLETED state).

v1r3n commented 1 year ago

Hi @anaken, we will take a look and send a fix.

DukeDai commented 1 year ago

Could commit 7c1200334 have caused this, given that the issue appears in v3.13.0 but not in v3.12.0? I hit a problem with a WAIT task inside a DO_WHILE task: it is not executed according to 'duration'.

if (taskModel.getTaskType() == TaskType.TASK_TYPE_WAIT || taskModel.getTaskType() == TaskType.TASK_TYPE_HUMAN) {
  // getWaitTimeout() = System.currentTimeMillis() + (timeDuration.getSeconds() * 1000),
  // so it is an absolute time in milliseconds, not an offset in seconds.
  postponeDurationSeconds = (taskModel.getWaitTimeout() != 0) ? taskModel.getWaitTimeout() + 1 : properties.getWorkflowOffsetTimeout().getSeconds();
} else {
  postponeDurationSeconds = (taskModel.getResponseTimeoutSeconds() != 0) ? taskModel.getResponseTimeoutSeconds() + 1 : properties.getWorkflowOffsetTimeout().getSeconds();
}

If WAIT is made an async task and execute() follows the 'duration' or 'until' logic, WAIT tasks can run inside DO_WHILE, but I'm not sure whether they are triggered more often than expected.
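
The unit mismatch described in this comment can be illustrated with a small standalone sketch. This is not Conductor code: the class name, both method names, and the 30-second default offset are assumptions made for illustration, and the "remaining seconds" variant is only one possible way to turn the absolute timestamp into an offset, not necessarily what the actual fix does.

```java
import java.time.Duration;

public class WaitPostponeSketch {

    // Assumed default offset in seconds (hypothetical stand-in for
    // properties.getWorkflowOffsetTimeout().getSeconds()).
    static final long DEFAULT_OFFSET_SECONDS = 30;

    // Mirrors the quoted logic: the absolute epoch-millisecond timestamp
    // is used directly as if it were an offset in seconds.
    public static long postponeAsQuoted(long waitTimeoutEpochMillis) {
        return waitTimeoutEpochMillis != 0
                ? waitTimeoutEpochMillis + 1
                : DEFAULT_OFFSET_SECONDS;
    }

    // One possible correction (an assumption, not the project's fix):
    // convert the absolute timestamp into the seconds remaining from "now".
    public static long postponeAsRemaining(long waitTimeoutEpochMillis, long nowEpochMillis) {
        if (waitTimeoutEpochMillis == 0) {
            return DEFAULT_OFFSET_SECONDS;
        }
        long remainingMillis = Math.max(0, waitTimeoutEpochMillis - nowEpochMillis);
        // Integer division truncates to whole seconds; the extra second
        // mirrors the "+ 1" in the quoted snippet.
        return remainingMillis / 1000 + 1;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long waitTimeout = now + Duration.ofSeconds(45).toMillis();

        // The quoted logic yields a postponement on the order of the current
        // epoch time in milliseconds, interpreted as seconds.
        System.out.println("quoted logic:    " + postponeAsQuoted(waitTimeout) + " s");
        // The converted value stays close to the requested 45 seconds.
        System.out.println("remaining-based: " + postponeAsRemaining(waitTimeout, now) + " s");
    }
}
```

If the quoted logic really is the cause, a postponement that large would explain why the task is effectively never re-evaluated.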

yohanyflores commented 1 year ago

Hi @anaken! I have the same problem. I made a PR fixing the error.

narayanapadmanabhuni commented 1 year ago

A WAIT task with a duration of more than 30 seconds does not work in 3.13.5 either; I have verified this. Is it working for anyone? We are using Postgres for persistence.

narayanapadmanabhuni commented 1 year ago

If you are using conductor-postgres-persistence for persistence: I am facing the same issue in 3.13.5. The sweeper does not pick up a WAIT task that has already been popped from the queue_message table, so the task stays stuck indefinitely until it is manually approved. If you set the WAIT task duration to less than 30 seconds, the task completes and the workflow moves to the next stage. I have made a temporary fix on my end so that we are not blocked.

lijia-rengage commented 1 year ago

Hi, I have the same problem on 3.13.5. Could you share your fix with me? Thanks a lot!