StackStorm / orquesta

Orquesta is a graph-based workflow engine for StackStorm. Questions? https://github.com/StackStorm/st2/discussions
https://docs.stackstorm.com/orquesta/
Apache License 2.0

Workflow join is not properly working if one step fails #225

Closed. wingiti closed this issue 3 years ago.

wingiti commented 3 years ago

When I run the following workflow, it sometimes succeeds and sometimes fails, and I don't know why. It might depend on the execution times and on which steps finish first.

What I want to achieve:

For this test case I configured "MAC_vendor_lookup" to fail, since 0 is not a valid MAC address.

It also does not work if I use join: 3.

I am currently using StackStorm 3.2.0 (default StackStorm Vagrant image, Ubuntu 16.04).

version: 1.0

description: Test for GIT BUG

input:
  - ip: "127.0.0.1"
  - mac_address: "0"
  - dns_hostname: "test.testad.net"

vars:
  - mac_vendor: null
  - a_some_info: "unknown"
  - reverse_lookup_result: null
  - forward_lookup_result: null

tasks:
  Reverse_lookup:
    action: core.local
    input:
      cmd: "nslookup <% ctx().ip %>"
    next:
      - when: <% completed() %>
        publish:
          - reverse_lookup_result: <% result().stdout %>
        do:
          - Add_host_info_note

  Forward_lookup:
    action: core.local
    input:
      cmd: "dig <% ctx().dns_hostname %>"
    next:
      - when: <% completed() %>
        publish:
          - forward_lookup_result: <% result().stdout %>
        do:
          - Add_host_info_note

  MAC_vendor_lookup:
    action: core.http
    input:
      url: "https://api.macvendors.com/<% ctx().mac_address %>"
      method: "GET"
    next:
      - when: <% succeeded() %>
        publish:
          - mac_vendor: <% result().body %>
        do:
          - Add_host_info_note
      - when: <% failed() %>
        publish:
          - mac_vendor: "-"
        do:
          - Add_host_info_note

  Do_something_else:
    action: core.echo message="Test"
    next:
      - when: <% completed() %>
        publish:
          - a_some_info: <% result().stdout %>
        do:
          - Add_more_info

  Add_host_info_note:
    join: all
    action: core.echo message="<% ctx().reverse_lookup_result %> <% ctx().forward_lookup_result %> <% ctx().mac_vendor %>"

  Add_more_info:
    action: core.echo message=<% ctx().a_some_info %>

Error in case of failure:

{
  "message": "UnreachableJoinError: The join task|route \"Add_host_info_note|0\" is partially satisfied but unreachable.",
  "type": "error",
  "route": 0,
  "task_id": "Add_host_info_note"
}

Sometimes it works and sometimes it does not, without changing anything in between.

Might be related to #190

amanda11 commented 3 years ago

I have reproduced the same problem on a single-node StackStorm 3.3.0 installation running on CentOS 8, configured with Redis.

If I use a workflow such as:


version: 1.0
tasks:
  # [536, 116]
  task1:
    action: core.echo
    next:
      - do:
          - task2
      - do:
          - task3
    input:
      message: start
  # [391, 251]
  task2:
    action: core.local
    next:
      - do:
          - task4
    input:
      cmd: ls
  # [765, 275]
  task3:
    action: core.local
    next:
      - do:
          - task4
        when: <% failed() %>
    input:
      cmd: ls /rubbish
  # [550, 436]
  task4:
    action: core.echo
    join: all
    input:
      message: joined

then the join can fail with the same "partially satisfied but unreachable" error. Sometimes it passes, sometimes it fails.

However, if I alter the workflow to add a dummy task before the join (the same workaround used in #190), the join works. For example, here task5 is added after the failing task, on that same branch:

version: 1.0
tasks:
  # [536, 116]
  task1:
    action: core.echo
    next:
      - do:
          - task2
      - do:
          - task3
    input:
      message: start
  # [391, 251]
  task2:
    action: core.local
    next:
      - do:
          - task4
    input:
      cmd: ls
  # [765, 275]
  task3:
    action: core.local
    next:
      - do:
          - task5
        when: <% failed() %>
    input:
      cmd: ls /rubbish
  # [550, 436]
  task4:
    action: core.echo
    join: all
    input:
      message: joined
  # [773, 371]
  task5:
    action: core.echo
    next:
      - do:
          - task4
    input:
      message: hello
wingiti commented 3 years ago

I was not able to get the workaround to work. I changed my workflow as follows, adding an intermediate task called "Workaround":


version: 1.0

description: Test for GIT BUG

input:
  - ip: "127.0.0.1"
  - mac_address: "0"
  - dns_hostname: "test.testad.net"

vars:
  - mac_vendor: "DEFAULT-VALUE"
  - a_some_info: "unknown"
  - reverse_lookup_result: null
  - forward_lookup_result: null

tasks:
  Reverse_lookup:
    action: core.local
    input:
      cmd: "nslookup <% ctx().ip %>"
    next:
      - when: <% completed() %>
        publish:
          - reverse_lookup_result: <% result().stdout %>
        do:
          - Add_host_info_note

  Forward_lookup:
    action: core.local
    input:
      cmd: "dig <% ctx().dns_hostname %>"
    next:
      - when: <% completed() %>
        publish:
          - forward_lookup_result: <% result().stdout %>
        do:
          - Add_host_info_note

  MAC_vendor_lookup:
    action: core.http
    input:
      url: "https://api.macvendors.com/<% ctx().mac_address %>"
      method: "GET"
    next:
      - when: <% succeeded() %>
        publish:
          - mac_vendor: <% result().body %>
        do:
          - Workaround
      - when: <% failed() %>
        publish:
          - mac_vendor: "-"
        do:
          - Workaround

  Do_something_else:
    action: core.echo message="Test"
    next:
      - when: <% completed() %>
        publish:
          - a_some_info: <% result().stdout %>
        do:
          - Add_more_info

  Workaround:
    action: core.noop
    next:
      - when: <% succeeded() %>
        do:
          - Add_host_info_note

  Add_host_info_note:
    join: all
    action: core.echo message="<% ctx().reverse_lookup_result %> <% ctx().forward_lookup_result %> <% ctx().mac_vendor %>"

  Add_more_info:
    action: core.echo message=<% ctx().a_some_info %>

If I execute it, I get the following error:

{
  "spec_path": "tasks.Add_host_info_note",
  "message": "The join task \"Add_host_info_note\" is unreachable. A join task is determined to be unreachable if there are nested forks from multi-referenced tasks that join on the said task. This is ambiguous to the workflow engine because it does not know at which level should the join occurs.",
  "type": "semantic",
  "schema_path": "properties.tasks.patternProperties.^\\w+$"
}

It looks like the difference from your workflow is that your "do" section is not part of the "when" block. If I change my workflow like this:

MAC_vendor_lookup:
    action: core.http
    input:
      url: "https://api.macvendors.com/<% ctx().mac_address %>"
      method: "GET"
    next:
      - do:
          - Workaround
      - when: <% succeeded() %>
        publish:
          - mac_vendor: <% result().body %>
      - when: <% failed() %>
        publish:
          - mac_vendor: "-"

the workflow now runs successfully from a flow perspective, but the mac_vendor value published in this task is not visible in the final step "Add_host_info_note". It always has the default value from the "vars" section.

So far I cannot confirm any working workaround.

amanda11 commented 3 years ago

Using my simplified workflow plus your publish, I got the conditional publishing working with some Jinja (you might be able to do the same in YAQL). I reduced it to the equivalent of your MAC_vendor_lookup having a single transition to the workaround task, with the condition moved into the published value.

For example, I had:

    next:
      - do:
          - task5
        when: <% completed() %>
        publish:
          - mac_vendor: "{% if succeeded() %}{{ result().stdout }}{% else %}-{% endif %}"

So you might be able to use something like:


MAC_vendor_lookup:
    action: core.http
    input:
      url: "https://api.macvendors.com/<% ctx().mac_address %>"
      method: "GET"
    next:
      - do:
          - Workaround
        when: <% completed() %>
        publish:
          - mac_vendor: "{% if succeeded() %}{{ result().body }}{% else %}-{% endif %}"
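For anyone who prefers YAQL over Jinja, here is an untested sketch of the same single-transition pattern. It assumes that yaql's standard switch() function is available inside Orquesta expressions and that it only evaluates the value for the first true condition, so result().body is not touched when the HTTP call fails. Treat it as a starting point, not a verified fix.

```yaml
MAC_vendor_lookup:
    action: core.http
    input:
      url: "https://api.macvendors.com/<% ctx().mac_address %>"
      method: "GET"
    next:
      # A single transition, so Workaround is referenced only once from this task;
      # the success/failure decision moves into the published value instead.
      - when: <% completed() %>
        publish:
          # Assumption: yaql switch() returns the value paired with the first true key.
          - mac_vendor: <% switch(succeeded() => result().body, true => '-') %>
        do:
          - Workaround
```

Keeping one transition is likely what matters here: the version with two separate transitions into Workaround is the one that produced the "nested forks from multi-referenced tasks" semantic error.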
m4dcoder commented 3 years ago

I am able to reproduce the issue. Thanks for reporting this. StackStorm v3.4 is about to be released, so expect a fix in v3.5.