Adding wait-before to a task causes it to re-initialize previously published variables
I'm unable to reproduce this.
You cancel the parent one, it doesn't affect children
To be clear, the parent workflow in this example calls another workflow action in StackStorm. The parent workflow is not a Mistral workbook calling other sub-workflows defined within the workbook.
When a Mistral workflow is cancelled in StackStorm and a task is still running, the workflow will be marked cancelled but the task will not be affected. StackStorm and Mistral let the task(s) reach completion gracefully. If the task is another workflow action, you can request cancellation on that task, which will cancel the subworkflow and bubble the cancellation up to the parent.
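For illustration, a child subworkflow execution can be cancelled explicitly from the CLI (a minimal sketch; the execution ID below is a placeholder you would take from `st2 execution list` or the parent execution's task list):

```bash
# List executions to find the still-running child subworkflow.
st2 execution list

# Request cancellation on the child execution directly.
st2 execution cancel <child-execution-id>
```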
In my example the parent task times out, but the child one keeps running, and running, and running, and running, over and over.
The behavior of a timeout is similar to a cancel. Currently, if a task times out, the parent workflow will fail but the task is allowed to complete gracefully. This is by design, as Mistral does not know what is being run in the task, and forcefully cancelling or terminating it may cause unintended consequences. This is left to the user to manage. As for why the child task keeps running over and over: this probably has to do with your child task being in a loop, which is probably why there is a continuous chatops notification.
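To illustrate the timeout behavior described above, here is a minimal sketch (not from the original post; the subworkflow action name is hypothetical) of a task with a `timeout` policy:

```yaml
version: '2.0'

sandbox.timeout_example:
  type: direct
  tasks:
    call_child:
      # Hypothetical subworkflow action, used only for illustration.
      action: sandbox.child_workflow
      # If this task runs longer than 60 seconds, the parent workflow fails,
      # but the child execution is not force-terminated and finishes on its own.
      timeout: 60
```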
So in summary, this does not look like a bug to me. If you feel otherwise, please reopen the issue and present more details.
Could you show what is being printed by the core.local echo command if you keep the wait-before parameter in the task? I can post my results of running those workflows.
If this is not a bug, I can't even think what would count as a bug, or how this can be handled by the end user.
As to handling cancellation of parent tasks, could you give a short example of how this can be handled by users? Let's say I want to gracefully stop my Mistral workflow if the parent workflow is cancelled, given that I don't know how many levels deep the workflow is nested. I mean, the workflow can be started by another workflow, which in turn was started by another Mistral workflow.
How is this supposed to be handled? I'd prefer a cascading cancellation request sent down to all running tasks started by the cancelled workflow, including all children, down to the very bottom. At least have that option available, even if it's not on by default.
I am very surprised (to say the least) that none of these flaws were considered worth looking at and/or fixing :-(
@emptywee regarding the publish and wait-before issue you're reporting...
```yaml
version: '2.0'

sandbox.pub_wait:
  type: direct
  output:
    vars:
      - <% $.var1 %>
      - <% $.var2 %>
  tasks:
    task1:
      action: core.noop
      publish:
        var1: 'abc'
      on-success:
        - task2
    task2:
      wait-before: 3
      action: core.local
      input:
        cmd: 'echo "<% $.var1 %>"'
      publish:
        var2: <% task(task2).result.stdout %>
```
var1 published after task1 is echoed correctly in task2 after the wait-before. Please look at the list of "vars" in the result below for confirmation.
"action": {
"ref": "sandbox.pub_wait"
},
"context": {
"user": "stanley"
},
"end_timestamp": "2017-03-27T20:49:49.868060Z",
"id": "58d97ae41e2e242f04192b05",
"liveaction": {
"action": "sandbox.pub_wait",
"action_is_workflow": true,
"callback": {},
"id": "58d97ae41e2e242f04192b04",
"parameters": {},
"runner_info": {
"hostname": "arkham",
"pid": 11901
}
},
"result": {
"extra": {
"state": "SUCCESS",
"state_info": null
},
"tasks": [
{
"created_at": "2017-03-27 20:49:40",
"id": "16784a1e-4614-4917-aa9f-c659df132568",
"input": null,
"name": "task1",
"published": {
"var1": "abc"
},
"result": {
"failed": false,
"return_code": 0,
"succeeded": true
},
"state": "SUCCESS",
"state_info": null,
"updated_at": "2017-03-27 20:49:41",
"workflow_execution_id": "6a251d4e-5f7f-40eb-82a2-f11b28228a34",
"workflow_name": "sandbox.pub_wait"
},
{
"created_at": "2017-03-27 20:49:41",
"id": "f1160cd0-497f-42bb-b1bd-ba7caf83c98f",
"input": null,
"name": "task2",
"published": {
"var2": "abc"
},
"result": {
"failed": false,
"return_code": 0,
"stderr": "",
"stdout": "abc",
"succeeded": true
},
"state": "SUCCESS",
"state_info": null,
"updated_at": "2017-03-27 20:49:44",
"workflow_execution_id": "6a251d4e-5f7f-40eb-82a2-f11b28228a34",
"workflow_name": "sandbox.pub_wait"
}
],
"vars": [
"abc",
"abc"
]
},
"start_timestamp": "2017-03-27T20:49:40.189381Z",
"status": "succeeded"
}```
@emptywee I think the problem you're having is this YAQL expression: <% ($.get('attempt') or 0) + 1 %>. It's always returning 0.
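For reference, a minimal sketch (task and variable names are assumed here, not taken from the original workflow) of how such a retry counter is normally published; if the previously published value is wiped, every evaluation restarts from the same base and an on-error retry loop never exits:

```yaml
task1:
  action: core.noop
  publish:
    # Intended progression: unset -> 1 -> 2 -> 3 on each pass.
    # If the published value is re-initialized (the wait-before issue
    # reported here), the counter never grows and the loop never terminates.
    attempt: <% ($.get('attempt') or 0) + 1 %>
  on-success:
    - task2
```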
@emptywee So for the workflow below, I did find that the workflow goes into an infinite loop. The published var is not incremented in this case, as you've found. Not sure why, but we'll track this as an issue. Thanks for your persistence.
```yaml
version: '2.0'

sandbox.pub_wait:
  type: direct
  vars:
    var1: 0
    retries: 3
  output:
    vars:
      - <% $.var1 %>
      - <% $.var2 %>
  tasks:
    init:
      action: core.noop
      on-success:
        - task1
    task1:
      action: core.noop
      publish:
        var1: <% $.var1 + 1 %>
      on-success:
        - task2
    task2:
      wait-before: 3
      action: core.local
      input:
        cmd: 'echo "<% $.var1 %>"; exit 1'
      publish:
        var2: <% task(task2).result.stdout %>
      on-error:
        - task1: <% $.var1 <= $.retries %>
```
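For what it's worth, a quick way to reproduce once the workflow above is registered (assuming it lives in a pack named sandbox; the execution ID is a placeholder):

```bash
# Schedule the workflow asynchronously so the CLI doesn't block on the loop.
st2 run -a sandbox.pub_wait

# Inspect the published vars of the resulting execution.
st2 execution get <execution-id>
```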
As for the cascading cancellation of child workflow executions, we'll revisit this and look for options. Next time, it would be more helpful if these were tracked as separate issues.
I am very surprised (to say the least) that none of these flaws were considered worth looking at and/or fixing :-(
@emptywee If they were not worth looking at, we would not be having this conversation. I'm very disappointed in this comment since we've been providing you with responsive and excellent support.
Now we are both upset :( Sorry about that, didn't mean it.
Still, I don't understand why you ask for sample workflows, but try to reproduce with a "slightly" different case.
Again, I apologize if it sounded wrong, offensive or in anyway inappropriate. I did not really mean anything, just got a bit upset.
Bug report registered in OpenStack @ https://bugs.launchpad.net/mistral/+bug/1681562
The source of the wait-before bug has been identified at https://bugs.launchpad.net/mistral/+bug/1681562. Please follow the link to review the comments. We will need to wait for the rest of the Mistral core team to provide feedback on the use of the cache and how to work around this issue.
Awesome. Let's hope for quick resolution. Thanks!
Improvement on 2nd issue w/ canceling subworkflows @ https://github.com/StackStorm/st2/pull/3375
Again, please split issues into separate posts next time.
Hi,
Discovered two bugs while I was trying to reproduce only one :-)
Running latest st2 v2.2.0.
Something is off with cancellation and timed-out tasks when nested tasks are Mistral workflows. You cancel the parent one, it doesn't affect children. Also, sometimes cancelling them causes all sorts of unexpected results, like indefinite chatops notifications being triggered. In my example the parent task times out, but the child one keeps running, and running, and running, and running, over and over. If you remove the wait-before parameter, then it finishes after all retries are exhausted. Still, that doesn't mean it's a valid workaround. Adding wait-before to a task causes it to re-initialize previously published variables (at least it feels that way).
Providing simple workflows and an alias to reproduce it (sorry for the names I gave them):
wf_cancelation_issue.meta.yaml:
workflows/wf_cancelation_issue.yaml:
wf_cancelation_issue_inner.meta.yaml:
workflows/wf_cancelation_issue_inner.yaml:
aliases/wf_cancelation_issue.yaml