StackStorm / st2

StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident responses, troubleshooting, deployments, and more for DevOps and SREs. Includes rules engine, workflow, 160 integration packs with 6000+ actions (see https://exchange.stackstorm.org) and ChatOps. Installer at https://docs.stackstorm.com/install/index.html
https://stackstorm.com/
Apache License 2.0
6.08k stars 747 forks source link

Undefined variables cause mistral workflows to hang (st2 2.2.1) #3723

Closed mickmcgrath13 closed 4 years ago

mickmcgrath13 commented 7 years ago

This works:

---
version: '2.0'

test.undefined-variable-hangs:
  tasks:
    delegate:
      # [235, 26]
      action: core.noop
      publish:
        world: "world"
      on-success:
        - test

    test:
      # [235, 230]
      action: core.local
      input:
        cmd: "echo hello {{ _.world }}"

..but this hangs

---
version: '2.0'

test.undefined-variable-hangs:
  tasks:
    delegate:
      # [235, 26]
      action: core.noop
      on-success:
        - test

    test:
      # [235, 230]
      action: core.local
      input:
        cmd: "echo hello {{ _.world }}"

It seems, too, like the multiple tasks is necessary because I just got an error (undefined variable) with just one task, and the workflow execution failed.

The output of st2 execution get --detail 59b16e66271fb2452d82cda7 wasn't really helpful because it is just in the RUNNING state

{
    "action": {
        "ref": "test.undefined-variable-hangs"
    }, 
    "context": {
        "user": "mick"
    }, 
    "id": "59b16e66271fb2452d82cda7", 
    "liveaction": {
        "action": "test.undefined-variable-hangs", 
        "action_is_workflow": true, 
        "callback": {}, 
        "id": "59b16e66271fb2452d82cda6", 
        "parameters": {}, 
        "runner_info": {
            "hostname": "supersecrethost", 
            "pid": 8746
        }
    }, 
    "result": {
        "extra": {
            "state": "RUNNING", 
            "state_info": null
        }, 
        "tasks": [
            {
                "created_at": "2017-09-07 16:05:58", 
                "id": "b0e093d1-247f-4ef7-93e5-be08a1a1615c", 
                "input": null, 
                "name": "delegate", 
                "published": {}, 
                "result": [], 
                "state": "RUNNING", 
                "state_info": null, 
                "updated_at": null, 
                "workflow_execution_id": "dffb8df0-c80f-4b3d-8bd8-c86bde703ce4", 
                "workflow_name": "test.undefined-variable-hangs"
            }
        ]
    }, 
    "start_timestamp": "2017-09-07T16:05:58.052779Z", 
    "status": "running"
}

unless I cancel it - in which case it's in the CANCELED state

{
    "action": {
        "ref": "test.undefined-variable-hangs"
    }, 
    "context": {
        "user": "mick"
    }, 
    "end_timestamp": "2017-09-07T16:08:12.829600Z", 
    "id": "59b16e66271fb2452d82cda7", 
    "liveaction": {
        "action": "test.undefined-variable-hangs", 
        "action_is_workflow": true, 
        "callback": {}, 
        "id": "59b16e66271fb2452d82cda6", 
        "parameters": {}, 
        "runner_info": {
            "hostname": "supersecrethost", 
            "pid": 8746
        }
    }, 
    "result": {
        "extra": {
            "state": "CANCELLED", 
            "state_info": null
        }, 
        "result": null, 
        "tasks": [
            {
                "created_at": "2017-09-07 16:05:58", 
                "id": "b0e093d1-247f-4ef7-93e5-be08a1a1615c", 
                "input": null, 
                "name": "delegate", 
                "published": {}, 
                "result": [], 
                "state": "RUNNING", 
                "state_info": null, 
                "updated_at": null, 
                "workflow_execution_id": "dffb8df0-c80f-4b3d-8bd8-c86bde703ce4", 
                "workflow_name": "test.undefined-variable-hangs"
            }
        ]
    }, 
    "start_timestamp": "2017-09-07T16:05:58.052779Z", 
    "status": "canceled"
}

st2api

nothing useful

st2resultstracker

no results for test.undefined-variable-hangs

mistral-server.log

/var/log/mistral $ sudo tail mistral-server.log

2017-09-07 09:27:14.792 9514 ERROR oslo_messaging.rpc.server     return evaluator.evaluate(expression, context)
2017-09-07 09:27:14.792 9514 ERROR oslo_messaging.rpc.server   File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/expressions/jinja_expression.py", line 143, in evaluate
2017-09-07 09:27:14.792 9514 ERROR oslo_messaging.rpc.server     result = cls._env.from_string(expression).render(**ctx)
2017-09-07 09:27:14.792 9514 ERROR oslo_messaging.rpc.server   File "/opt/stackstorm/mistral/lib/python2.7/site-packages/jinja2/environment.py", line 1008, in render
2017-09-07 09:27:14.792 9514 ERROR oslo_messaging.rpc.server     return self.environment.handle_exception(exc_info, True)
2017-09-07 09:27:14.792 9514 ERROR oslo_messaging.rpc.server   File "/opt/stackstorm/mistral/lib/python2.7/site-packages/jinja2/environment.py", line 780, in handle_exception
2017-09-07 09:27:14.792 9514 ERROR oslo_messaging.rpc.server     reraise(exc_type, exc_value, tb)
2017-09-07 09:27:14.792 9514 ERROR oslo_messaging.rpc.server   File "<template>", line 1, in top-level template code
2017-09-07 09:27:14.792 9514 ERROR oslo_messaging.rpc.server UndefinedError: 'mistral.workflow.data_flow.ContextView object' has no attribute 'world'
2017-09-07 09:27:14.792 9514 ERROR oslo_messaging.rpc.server 

st2actionrunner

/var/log/st2 $ sudo tail st2actionrunner.8533.log

    client.action_executions.update(action_execution_id, **data)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/mistralclient/api/v2/action_executions.py", line 64, in update
    return self._update('/action_executions/%s' % id, data)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/mistralclient/api/base.py", line 106, in _update
    self._raise_api_exception(resp)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/mistralclient/api/base.py", line 143, in _raise_api_exception
    error_message=error_data)
APIException: RemoteError: Remote error: UndefinedError 'mistral.workflow.data_flow.ContextView object' has no attribute 'world'
[u'Traceback (most recent call last):\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming\n    res = self.dispatcher.dispatch(message)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 222, in dispatch\n    return self._do_dispatch(endpoint, method, ctxt, args)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 192, in _do_dispatch\n    result = func(ctxt, **new_args)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/engine/engine_server.py", line 154, in on_action_complete\n    return self.engine.on_action_complete(action_ex_id, result, wf_action)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/db/utils.py", line 72, in decorate\n    return _with_auth_context(auth_ctx, func, *args, **kw)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 417, in func\n    return evt.wait()\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait\n    return hubs.get_hub().switch()\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch\n    return self.greenlet.switch()\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 137, in _run_loop\n    result = func(*self.args, **self.kw)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 388, in _func\n    result = f(*args, **kwargs)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/db/utils.py", line 43, in _with_auth_context\n    return func(*args, **kw)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/engine/action_queue.py", line 70, in decorate\n    res = func(*args, **kw)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper\n    return f(*args, **kwargs)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/engine/default_engine.py", line 114, in on_action_complete\n    action_handler.on_action_complete(action_ex, result)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper\n    return f(*args, **kwargs)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/engine/action_handler.py", line 54, in on_action_complete\n    task_handler.schedule_on_action_complete(action_ex)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/engine/task_handler.py", line 360, in schedule_on_action_complete\n    _on_action_complete(action_ex)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper\n    return f(*args, **kwargs)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/engine/task_handler.py", line 105, in _on_action_complete\n    task.on_action_complete(action_ex)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper\n    return f(*args, **kwargs)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/engine/tasks.py", line 264, in on_action_complete\n    self.complete(state, state_info)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper\n    return f(*args, **kwargs)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/engine/tasks.py", line 191, in complete\n    dispatcher.dispatch_workflow_commands(self.wf_ex, cmds)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper\n    return f(*args, **kwargs)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/engine/dispatcher.py", line 94, in dispatch_workflow_commands\n    task_handler.run_task(cmd)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper\n    return f(*args, **kwargs)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/engine/task_handler.py", line 58, in run_task\n    task.run()\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper\n    return f(*args, **kwargs)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/engine/tasks.py", line 269, in run\n    self._run_new()\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper\n    return f(*args, **kwargs)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/engine/tasks.py", line 293, in _run_new\n    self._schedule_actions()\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/engine/tasks.py", line 341, in _schedule_actions\n    input_dict = self._get_action_input()\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper\n    return f(*args, **kwargs)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/engine/tasks.py", line 380, in _get_action_input\n    ctx_view\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 96, in evaluate_recursively\n    data[key] = _evaluate_item(data[key], context)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 85, in _evaluate_item\n    return evaluate_recursively(item, context)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 96, in evaluate_recursively\n    data[key] = _evaluate_item(data[key], context)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 79, in _evaluate_item\n    return evaluate(item, context)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 71, in evaluate\n    return evaluator.evaluate(expression, context)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/mistral/expressions/jinja_expression.py", line 143, in evaluate\n    result = cls._env.from_string(expression).render(**ctx)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/jinja2/environment.py", line 1008, in render\n    return self.environment.handle_exception(exc_info, True)\n', u'  File "/opt/stackstorm/mistral/lib/python2.7/site-packages/jinja2/environment.py", line 780, in handle_exception\n    reraise(exc_type, exc_value, tb)\n', u'  File "<template>", line 1, in top-level template code\n', u"UndefinedError: 'mistral.workflow.data_flow.ContextView object' has no attribute 'world'\n"].
2017-09-07 09:27:14,824 139914432543312 AUDIT base [-] Liveaction completed (liveaction_db={'status': 'succeeded', 'runner_info': {u'hostname': u'supersecrethost', u'pid': 8533}, 'parameters': {}, 'action_is_workflow': False, 'start_timestamp': '2017-09-07 16:27:14.460151+00:00', 'callback': {u'url': u'http://0.0.0.0:8989/v2/action_executions/b67c7004-7cc2-4a37-a67f-235ee2898c43', u'source': u'mistral'}, 'notify': None, 'result': {'failed': False, 'return_code': 0, 'succeeded': True}, 'context': {u'user': u'mick', u'parent': {u'user': u'mick', u'execution_id': u'59b17362271fb2452d82cddc'}, u'mistral': {u'action_execution_id': u'b67c7004-7cc2-4a37-a67f-235ee2898c43', u'task_id': u'b9012e5b-3e4f-40fc-861e-922b608c535f', u'workflow_name': u'test.undefined-variable-hangs', u'workflow_execution_id': u'fac30be7-e6d2-45dd-bf44-3c8ac9a478fb', u'task_tags': None, u'task_name': u'delegate', u'callback_url': u'/v2/action_executions/b67c7004-7cc2-4a37-a67f-235ee2898c43'}}, 'action': u'core.noop', 'id': '59b17362271fb2452d82cdde', 'end_timestamp': '2017-09-07 16:27:14.698116+00:00'})

Expected results

The workflow should fail

nmaludy commented 7 years ago

Confirmed this is a problem in st2 v2.4.

Also, trying to cancel that workflow using st2 execution cancel xxx causes the workflow to be stuck in the canceling state even though the underlying mistral workflow enters into the CANCELED state.

| 366d3bed-d171-43fc-bc47-8166629f44eb | b9c68137-79c2-465c-8d00-a47f666a1ce5 | examples.nick_workflow                      |                        | <none>                               | CANCELLED | None                         | 2017-09-07 19:34:29 | 2017-09-07 19:38:40 |
| + 59b19f45a814c05044731ad6 | examples.nick_w | st2admin     | canceling               | Thu, 07 Sep     |               |
|                            | orkflow         |              |                         | 2017 19:34:29   |               |
|                            |                 |              |                         | UTC             |               |