StackStorm / st2

StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident responses, troubleshooting, deployments, and more for DevOps and SREs. Includes rules engine, workflow, 160 integration packs with 6000+ actions (see https://exchange.stackstorm.org) and ChatOps. Installer at https://docs.stackstorm.com/install/index.html
https://stackstorm.com/
Apache License 2.0

re-run a failed action didn't inherit parent execution id #3568

Open leoxhj opened 7 years ago

leoxhj commented 7 years ago

Dear support,

If we re-run a failed action in a workflow, the new, separate action execution does not inherit the parent execution ID. There is no "parent" element in the new action execution, which causes us a lot of trouble because we want the new execution to show up in the same workflow chain. If we call the API /api/v1/executions/{id}/children, the new execution is not returned because it has no parent.

So, my questions are:

  1. Is it possible to re-run the current action instead of creating a separate one? For example, after an external system fixes the problem, we need to re-run the failed action so the workflow can get through.
  2. Is it possible to add a new API to get children by trace_context ID? (It seems the new actions hold the same trace ID, so we could get all actions related to that trace ID.)

    "context": {
        "tracecontext": {
            "id": "5966f6ff570e014c3431c471"
        },
        "re-run": {
            "ref": "5967015f570e014771c04345"
        },
        "user": "st2admin"
    }
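To illustrate the second question, here is a minimal client-side sketch of grouping executions by trace ID. It assumes the execution records have already been fetched as dictionaries shaped like the context snippet above. The helper names `trace_id` and `group_by_trace`, the sample execution IDs, and the default key names (copied from the snippet as pasted, which may differ from what the real API returns) are all assumptions for illustration, not an existing st2 API.

```python
from collections import defaultdict


def trace_id(execution, trace_key="tracecontext", id_key="id"):
    """Return the trace ID stored in an execution's context, or None.

    The default key names mirror the pasted snippet and are assumptions;
    pass different names if the real payload uses other keys.
    """
    trace = execution.get("context", {}).get(trace_key) or {}
    return trace.get(id_key)


def group_by_trace(executions, **keys):
    """Map each trace ID to the IDs of the executions that carry it."""
    groups = defaultdict(list)
    for execution in executions:
        tid = trace_id(execution, **keys)
        if tid is not None:
            groups[tid].append(execution["id"])
    return dict(groups)


# Hypothetical sample data: an original workflow execution, a re-run that
# shares its trace ID, and an unrelated execution with no trace context.
executions = [
    {"id": "wf-1",
     "context": {"tracecontext": {"id": "5966f6ff570e014c3431c471"},
                 "user": "st2admin"}},
    {"id": "rerun-1",
     "context": {"tracecontext": {"id": "5966f6ff570e014c3431c471"},
                 "re-run": {"ref": "5967015f570e014771c04345"},
                 "user": "st2admin"}},
    {"id": "other", "context": {"user": "st2admin"}},
]

related = group_by_trace(executions)
```

With this grouping, the re-run lands in the same bucket as the original workflow execution even though it has no "parent" element.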

lakshmi-kannan commented 7 years ago

@leoxhj Was this with mistral or action chain? Are you using the --tasks option documented here? https://docs.stackstorm.com/mistral.html#rerunning-workflow-execution

I am going to try to reproduce this myself, but it would be good if you could try --tasks if you haven't already.

leoxhj commented 7 years ago

@lakshmi-kannan We use Mistral. Thanks, I will look into the --tasks option.

leoxhj commented 7 years ago

From the st2 web UI, I tried to re-run a failed Mistral workflow with failed tasks. According to the docs, only tasks in the error state can be re-run, but as I understand it, all failed tasks and workflows end up in the failed state. So how can we re-run them?

    [root@UAT9696 configs]# st2 execution re-run 59897e1d570e013194c8a76f --tasks t_server_decommmission
    ..
    id: 598a7db4570e010eda9d13fb
    action.ref: liberty.pool-decommission
    parameters:
      assigned_group: "上海-OPS-应用配置"
      assignee: hj_xu
      pool_id: '1024'
      submitter: hj_xu
    status: failed
    error: Only tasks in error state can be rerun. Unable to identify rerunable tasks: t_server_decommmission. Please make sure that the task name is correct and the task is in rerunable state.
    traceback:
      File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2actions/container/base.py", line 100, in _do_run
        (status, result, context) = runner.run(action_params)
      File "/opt/stackstorm/st2/lib/python2.7/site-packages/retrying.py", line 49, in wrapped_f
        return Retrying(*dargs, **dkw).call(f, *args, **kw)
      File "/opt/stackstorm/st2/lib/python2.7/site-packages/retrying.py", line 206, in call
        return attempt.get(self._wrap_exception)
      File "/opt/stackstorm/st2/lib/python2.7/site-packages/retrying.py", line 247, in get
        six.reraise(self.value[0], self.value[1], self.value[2])
      File "/opt/stackstorm/st2/lib/python2.7/site-packages/retrying.py", line 200, in call
        attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
      File "/opt/stackstorm/runners/mistral_v2/mistral_v2.py", line 219, in run
        result = self.resume(ex_ref=self.rerun_ex_ref, task_specs=task_specs)
      File "/opt/stackstorm/runners/mistral_v2/mistral_v2.py", line 333, in resume
        'and the task is in rerunable state.' % ', '.join(missing_tasks))
    start_timestamp: 2017-08-09T03:12:52.827334Z
    end_timestamp: 2017-08-09T03:12:56.950958Z
    result: See error and traceback.