StackStorm / orquesta

Orquesta is a graph based workflow engine for StackStorm. Questions? https://github.com/StackStorm/st2/discussions
https://docs.stackstorm.com/orquesta/
Apache License 2.0
98 stars 39 forks source link

'Inspect the workflow spec' and 'Instantiate the workflow conductor' take too long #224

Closed liuzheng closed 3 years ago

liuzheng commented 3 years ago

Background:

In my company, we have some workflow with many subtask and many when-do judgements.

Now I'm work on upgrade stackstorm 2.0.0 to 3.2.0, and try to use orquesta to replace mistral-v2

Problem

In the st2common/services/workflows.py

https://github.com/StackStorm/st2/blob/master/st2common/st2common/services/workflows.py#L221

# Inspect the workflow spec.
inspect(wf_spec, st2_ctx, raise_exception=True)

https://github.com/StackStorm/st2/blob/master/st2common/st2common/services/workflows.py#L246

 # Serialize the conductor which initializes some internal values.
 data = conductor.serialize()

I don't know why one execution must inspect eatch running time?

And also the conductor.serialize() spent too much time.

m4dcoder commented 3 years ago

Workflow with many tasks and branches can lead to the delay during inspect. Workflow with many tasks and cycles can lead to delay in the conductor serialize. It is generally wise to logically break down a workflow into multiple subworkflows (or workflow in different StackStorm action that is called by the main workflow). You'll find performance benefits as well as testability.

liuzheng commented 3 years ago

@m4dcoder why doing this in every action? And why not generate a workable script or python script from the workflow yaml file, and actually run the script. BTW, I'm trying another way to do those workflow like jobs.

m4dcoder commented 3 years ago

@liuzheng The inspect is performed when the workflow is launched. We run this every launch because the workflow definition is defined in a file that can be modified in between runs. We can certainly look at feature to checksum and skip inspect if the definition hasn't changed since last run. As for the conductor.serialize, the current design saves and restores the state of the workflow execution on async checkpoint. This lets the action execution resume on any nodes in a scaled out deployment. We currently have a workflow engine that can already handle many different use cases. It is unnecessary to generate any python scripts for these workflows. You may be confusing us with how other workflow service/engine works. However your concerns with performance is valid and we appreciate your input.

liuzheng commented 3 years ago

Another case, why not save those yaml file into a database, to make sure it was verified, that will save the case in distributed scenario

m4dcoder commented 3 years ago

@liuzheng That's a possible solution. However this change will also cascade down to how we handle other StackStorm actions today since the current mode of operations is to have shell/python scripts for these actions on the filesystem and not registered in the database.

blag commented 3 years ago

This is an excellent topic for the next TSC meeting, happening on the first Tuesday of March.

@liuzheng Do you mind writing a proposal/issue for this in StackStorm/discussions and attending that meeting to discuss it?

liuzheng commented 3 years ago

I really appreciate the invitation, but now a days I have lots works to do. And I don't know how to start write this proposal. Please tell me your rule, and later I will try to write some.