cncf / wg-serverless

CNCF Serverless WG
https://cncf.io
Apache License 2.0

Changing State to Task #243

Closed tsurdilo closed 3 years ago

tsurdilo commented 4 years ago

This PR changes the naming of our control flow logic blocks from "state" to "task".

wanghq commented 4 years ago

I think it would be nice if all interested parties could list the options and share their opinions. The two discussions were closed but haven't reached an agreement (there are some good points in them). This is a big change, so I hope more people can weigh in on the final decision.

tsurdilo commented 4 years ago

@wanghq Feel free to comment and give your opinions on this PR.

From the language perspective this is actually a pretty small change:

  1. "states" array -> "tasks" array
  2. "stateDataFilter" object-> "taskDataFilter" object
  3. "nextState" object -> "nextTask" object

manuelstein commented 4 years ago

I think we should contrast the two models first ("workflow of tasks" vs. "state chart") to understand the implications. Here is probably not a good place to write all of it down, but to be brief:

In a task model, a common way to describe ordering is through dependencies, e.g. B depends on (the output of) A and likewise a task C could depend on (the output of) A, which allows B and C to start in parallel. It's common for a DAG to have a few indirections. Contrary to that, a state chart (unless it's a dynamic one like Petri nets) implies the program is in one state at a time, which is why we have the parallel state to offer concurrent executions. The state chart is conveniently more deterministic, which is especially useful when maintaining a workflow state (workflow data): otherwise, when starting B and C, the task graph could allow B to modify the workflow data before C even starts (if B were to run short and C delayed), or vice versa, or both B and C start with the same state but make conflicting changes (last write wins?).

So, parallel executions are more complicated in state charts (we need a Parallel state for that), but when we want common workflow data (also to allow stateful workflows), states give us certainty over when the workflow state changes and who is working with it.

In a build pipeline made of tasks, I could have two steps that each write an artifact and, depending on which takes longer, last write typically wins. Usually, the pipeline designer has to actively name the output/artifact of the task that should be copied over or used further down the pipeline. In our states language, dataResultsPath is optional and the function/operation output is otherwise merged with the workflow data. It's an advantage not to have to name each output, but more can go wrong here if we adopt the task model and the associated expectation that one can draw arbitrary task graphs (directed acyclic graphs, yet still with a lot of concurrency). I think this discussion also shows where the pipeline of tasks stands: it's in between. It's more ordered than an arbitrary DAG because it moves through stages. In a pipeline stage, tasks can run concurrently, but they all need to synchronize (conclude/come to a stop) before entering the next stage, pretty much like a transition in a Petri net.
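
To make the contrast concrete, here is a hypothetical sketch of how the state-chart model expresses the B/C concurrency explicitly via a Parallel state (field names are approximate, not taken verbatim from the spec text):

```json
{
  "name": "RunBandC",
  "type": "parallel",
  "branches": [
    { "name": "branchB", "startsAt": "B" },
    { "name": "branchC", "startsAt": "C" }
  ],
  "transition": { "nextState": "AfterBandC" }
}
```

In a task/DAG model, the same concurrency would instead be implicit in B and C both declaring a dependency on A.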

We may also want to discuss this in the Primer :)

tsurdilo commented 4 years ago

@manuelstein I think it's pretty clear now that we need to change the name of our unit of work. We have diverged from the initial thought of targeting both workflows and state machines; our language fits the workflow model. We should also not look strictly at pipelines, as we define an orchestration solution that lies on top of the base pipeline model. I agree this needs to go into the primer, but at the same time I feel we need to vote on this asap because, as it currently stands, this small change is setting us back.

To me, calling a unit of work a "task" fits a model we align with very well, and it is also a widely adopted term.

wanghq commented 4 years ago

> @wanghq one more question, was wondering are you guys using/adopting the spec at your workplace?

So far we haven't adopted the spec because it's a bit different from what we have and this spec is still evolving. But I do see some features our customers have asked for, e.g. the parallel waitOnCompletion ( https://github.com/cncf/wg-serverless/pull/229), and we may add this support by adopting the same option. Also, I can confirm that it shouldn't be very hard to support this spec in our service; only a small portion would need to be implemented.

You can find our Flow Definition Language (FDL) schema here (this VSCode plugin is for managing workflows and FaaS functions), and here is an example of copying objects between buckets.


tsurdilo commented 4 years ago

@wanghq Thanks for the info. I looked at it briefly and wanted to ask a couple of questions (I know this is not related to this PR specifically, but whatever ;) )

1) What real process scenarios do you see for the Pass/Fail states? I saw the same in Step Functions and had a hard time thinking of scenarios where this is needed outside of, maybe, testing. We define the inject state, which is similar to Pass but specific to injecting static data. The Fail state can be handled with an end state definition that sends off a "failure" cloud event, for example.

2) From your example, it seems that you might be converting this YAML to BPMN? Especially when I look at the inputMappings and outputMappings. Which approach do you see as better: this explicit mapping between steps/tasks, or the merge option which we propose?

Yeah, from looking at this I think you could definitely call it an "early" adoption of this spec. It would be a great thing to promote.

Thanks!
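
The "end state that sends off a failure cloud event" idea from point 1 could look roughly like this (purely illustrative; the produceEvent shape is an assumption based on spec drafts of that era, not a quoted definition):

```json
{
  "name": "ReportFailure",
  "type": "operation",
  "end": {
    "kind": "event",
    "produceEvent": {
      "nameRef": "FailureEvent",
      "data": "$.error"
    }
  }
}
```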

wanghq commented 4 years ago

> 1. What real process scenarios do you see for the Pass/Fail states?

The Fail is like "throwing an exception" in code. It's a way to stop and return to the caller with some info (error/cause). The Succeed (I guess you meant "Succeed" instead of "Pass") just stops and returns to the caller.

Below is an example (pseudo code). How is this defined with the workflow spec?

```
func flow() {
  a = task1()

  // parallel
  parallel {
    branch {
      // choice
      if a > 1 {
        b = task2()
      } else {
        // Fail here and do not run task4. Let parallel decide
        // whether to swallow or propagate the error.
        fail()
      }
      c = task4()
    }
    branch {
      d = task5()
    }
  }
  e = task6()
}
```

tsurdilo commented 4 years ago

@wanghq (using "state" notation as decision on this pr is not made yet :))

hope this helps.
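
(The definition originally attached to this comment did not survive extraction. A hypothetical reconstruction of the branch's start, using only constructs named in this thread - a switch state for the choice and ending the branch instead of an explicit Fail - might look like this; all field names are approximate:)

```json
{
  "name": "CheckA",
  "type": "switch",
  "choices": [
    {
      "path": "$.a",
      "operator": "GreaterThan",
      "value": 1,
      "transition": { "nextState": "Task2" }
    }
  ],
  "default": { "nextState": "BranchEnd" }
}
```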

wanghq commented 4 years ago

@tsurdilo Would you mind writing a sample definition? Because the state/step could be deeply nested, I don't understand how a nested failure can be propagated to the parent without an explicit Fail state/step.

wanghq commented 4 years ago

> 2. from your example it seems that you guys might be converting this yaml bpmn? especially when i look at the inputMappings and outputMappings. Which approach do you see better, this explicit mapping between steps/tasks or the merge option which we propose.

Does the workflow spec allow transforming the input in a granular way, e.g. building new JSON or renaming keys, not just selecting a portion of the input? Step Functions later introduced Parameters to address this. And we see the need to do so because the input of a task (functions, APIs) sometimes isn't flexible.

tsurdilo commented 4 years ago

> @tsurdilo Would you mind writing a sample definition? Because the state/step could be deeply nested, I don't understand how a nested failure can be propagated to the parent without an explicit Fail state/step.

@wanghq Yes, I will do it tonight as a separate PR (and add it as another example in the spec examples doc).

tsurdilo commented 4 years ago

@wanghq Regarding the question about parameters: our functionRef definition has a parameters property of type object. You can use it to define the parameters passed when invoking functions.
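
A minimal sketch of that (illustrative only; the refName/parameters fields follow the thread's description, the surrounding state fields and expression syntax are assumptions):

```json
{
  "name": "TransformInput",
  "type": "operation",
  "actions": [
    {
      "functionRef": {
        "refName": "transformFn",
        "parameters": {
          "customerId": "$.customer.id",
          "orderTotal": "$.order.total"
        }
      }
    }
  ]
}
```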

tsurdilo commented 4 years ago

@wanghq Added PR https://github.com/cncf/wg-serverless/pull/244. It describes how exceptions that occur during parallel state branch executions can be handled either by the states inside the branches, or propagated to the Parallel state to be handled there.

Your comment in the example, "Let parallel decide whether to swallow or propagate the error.", is handled differently by this specification: the states have the option to handle the error first, and if they don't explicitly handle it, it can be handled by the parallel state.

Note that exceptions that are not handled halt workflow execution. Hope this helps.
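
Roughly, such in-branch error handling could look like this (an illustrative sketch; the onError shape and expression syntax are assumptions based on the discussion, not quoted from PR 244):

```json
{
  "name": "B",
  "type": "operation",
  "actions": [ { "functionRef": { "refName": "task2" } } ],
  "onError": [
    {
      "expr": "$.error.code == 'SomeError'",
      "transition": { "nextState": "HandleInBranch" }
    }
  ]
}
```

If no state inside the branch matches the error, it would then propagate to the enclosing Parallel state; if nothing handles it there either, workflow execution halts.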

wanghq commented 4 years ago

@tsurdilo The PR looks good. But my point regarding your question "why is Fail needed" is that users somehow need a way to throw an exception/error from within the workflow definition itself. Your example seems to show exceptions thrown by functions. In my example above, the error is thrown if a <= 1, which is a use case for the Fail step.

tsurdilo commented 4 years ago

@wanghq Yes, I understand, but to me that is an anti-pattern. Why would you want to throw an artificial exception at this point when ending the branch execution achieves the same thing?

a <= 1 is a data-based branching decision, not an exception, imo.

You could even end in an inject state and put in static "fail" data, which could then be evaluated afterwards if you wanted. I think I understand what you are trying to do; it is more like throwing and catching error events in BPMN2, but I assume an actual exception needs to happen there as well. For the case where you just want to fail/throw an exception, I think ending the branch with an end state is the simpler option.
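
The inject-state alternative described here could be sketched roughly as follows (illustrative; the inject type is mentioned earlier in this thread, the data/end field shapes are approximate):

```json
{
  "name": "MarkFailed",
  "type": "inject",
  "data": { "outcome": "fail", "reason": "a <= 1" },
  "end": true
}
```

A later state (or the parallel state's completion logic) could then branch on $.outcome instead of relying on a thrown error.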