guzzijones opened 1 year ago
The use of 'latest' appears to be causing my tox unit tests to fail.
orquesta.exceptions.PluginFactoryError: Unable to load plugin orquesta.composers.native. No 'orquesta.composers' driver found, looking for 'native'
I also had to disable python 3.6 for now.
@m4dcoder can you please take a look at this Python 3.8 test failure? It looks like stevedore is not able to find the native and mock plugins to run the tests on Python 3.8.
I did some testing with st2 and this definitely breaks with-items: a failed item no longer reports failure to the parent workflow.
(virtualenv) [vagrant@stackstorm]~/st2/tools$ st2 execution get 64adc15887cff69baae67294
id: 64adc15887cff69baae67294
action.ref: test_pack.test_fail_workflow
parameters:
  number: 5
status: running (635s elapsed)
start_timestamp: Tue, 11 Jul 2023 20:53:44 UTC
end_timestamp:
log:
- status: requested
  timestamp: '2023-07-11T20:53:44.168000Z'
- status: scheduled
  timestamp: '2023-07-11T20:53:44.245000Z'
- status: running
  timestamp: '2023-07-11T20:53:44.317000Z'
result:
  output: null
+--------------------------+------------------------+-----------+---------------------+-------------------------------+
| id                       | status                 | task      | action              | start_timestamp               |
+--------------------------+------------------------+-----------+---------------------+-------------------------------+
| 64adc1581982b0a045f91e70 | succeeded (0s elapsed) | do_action | test_pack.test_fail | Tue, 11 Jul 2023 20:53:44 UTC |
| 64adc1591982b0a045f91e85 | succeeded (0s elapsed) | with_test | test_pack.test_fail | Tue, 11 Jul 2023 20:53:45 UTC |
| 64adc1591982b0a045f91e88 | succeeded (1s elapsed) | with_test | test_pack.test_fail | Tue, 11 Jul 2023 20:53:45 UTC |
| 64adc1591982b0a045f91e8b | failed (1s elapsed) | with_test | test_pack.test_fail | Tue, 11 Jul 2023 20:53:45 UTC |
| 64adc1591982b0a045f91e8f | succeeded (1s elapsed) | with_test | test_pack.test_fail | Tue, 11 Jul 2023 20:53:45 UTC |
| 64adc1591982b0a045f91e92 | succeeded (1s elapsed) | with_test | test_pack.test_fail | Tue, 11 Jul 2023 20:53:45 UTC |
+--------------------------+------------------------+-----------+---------------------+-------------------------------+
Staged tasks need to be copied, or with-items tasks will remain stuck if one sub-action fails.
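To illustrate the failure mode in isolation, here is a plain Python sketch of the dict-aliasing problem a copy guards against (this is not Orquesta's actual staging code; the structure and names are made up):

import copy

# Hypothetical staged with-items entry with per-item state.
staged = [{"task": "with_test", "items": [{"status": "running"}, {"status": "running"}]}]

# Shared reference: mutating the "working" copy also rewrites the stored state.
working = staged[0]
working["items"][0]["status"] = "failed"
print(staged[0]["items"][0]["status"])   # prints "failed": stored state changed underneath us

# Deep copy: the stored state stays intact no matter what happens to the working copy.
staged = [{"task": "with_test", "items": [{"status": "running"}, {"status": "running"}]}]
working = copy.deepcopy(staged[0])
working["items"][0]["status"] = "failed"
print(staged[0]["items"][0]["status"])   # prints "running"

Without the copy, whatever mutates the working reference also rewrites the stored state, so downstream completion logic never sees the values it expects.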
I was actually able to get json_8mb.json through this workflow with this change, along with the zstandard compression changes over in the st2 PR queue.
st2 run test_pack.test_workflow @json=/home/vagrant/json_8mb.json
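For anyone trying to reproduce this, a payload of roughly that size can be generated with a short script like the one below (the record structure and count are arbitrary assumptions; only the path comes from the command above):

import json

# Write a dict that serializes to roughly 8-9 MB of JSON for the workflow input.
payload = {"records": [{"index": i, "data": "x" * 64} for i in range(100000)]}

with open("/home/vagrant/json_8mb.json", "w") as f:
    json.dump(payload, f)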
from st2common.runners.base_action import Action


class Process(Action):
    def run(self, *args, **kwargs):
        return kwargs["json"]
version: 1.0

tasks:
  do_action:
    action: test_pack.test_action
    input:
      json: <% ctx().json %>
    next:
      - when: <% succeeded() %>
        publish:
          - bar: [1,2,3]
        do:
          - with_test

  with_test:
    with:
      items: <% ctx().bar %>
      concurrency: 1
    action: test_pack.test_action
    input:
      json: <% ctx().json %>
    next:
      - when: <% succeeded() %>
        publish:
          - with_output: <% result().result %>

input:
  - json

output:
  - with_output: <% ctx().with_output %>
+--------------------------+-------------------------+-----------+-----------------------+-------------------------------+
| id                       | status                  | task      | action                | start_timestamp               |
+--------------------------+-------------------------+-----------+-----------------------+-------------------------------+
| 64af43382c572b699fca6ea5 | succeeded (68s elapsed) | do_action | test_pack.test_action | Thu, 13 Jul 2023 00:20:06 UTC |
| 64af43d42c572b699fca6eb6 | succeeded (74s elapsed) | with_test | test_pack.test_action | Thu, 13 Jul 2023 00:22:43 UTC |
| 64af44722c572b699fca6ec7 | succeeded (79s elapsed) | with_test | test_pack.test_action | Thu, 13 Jul 2023 00:25:20 UTC |
| 64af45122c572b699fca6ed8 | succeeded (68s elapsed) | with_test | test_pack.test_action | Thu, 13 Jul 2023 00:28:01 UTC |
+--------------------------+-------------------------+-----------+-----------------------+-------------------------------+
I was able to get the st2 unit tests to also pass. I had to leave in the deep copy for the errors. I am testing this locally on our dev box.
Do we have any numbers on how much things are sped up by this change? Ideally, some micro + end-to-end benchmarks would be great (similar to what I've done for the DB layer changes).
When I was originally working on the database layer performance improvements and implemented fast_deepcopy(), I also had a quick look at whether we could utilize it in Orquesta, but I found some blockers, and Orquesta wasn't my priority back then, so I didn't spend more time digging in.
IIRC, it was something related to the complex types used by the state handling, since fast_deepcopy() only supports simple types, i.e. no object instances.
If we can get those sorted out without any issues and the numbers indeed indicate performance improvements, then it would be a great change :)
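For context, the usual trick behind a fast_deepcopy()-style helper is a serialize/deserialize round trip, which is also why it only handles simple JSON-compatible types. A minimal sketch of the idea (not the actual st2 implementation) looks roughly like this:

import copy
import json

def fast_deepcopy(value):
    # JSON round trip: fast for plain dicts/lists/strings/numbers/bools/None,
    # but it cannot represent arbitrary object instances.
    return json.loads(json.dumps(value))

fast_deepcopy({"a": [1, 2, 3]})  # fine: simple types only

class TaskState:  # stand-in for a complex state object
    pass

try:
    fast_deepcopy({"state": TaskState()})
except TypeError:
    # This is the kind of blocker mentioned above; copy.deepcopy() handles
    # object instances, just more slowly.
    copy.deepcopy({"state": TaskState()})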
On a related note to my comment above^ - I think it would be good to add a feature flag / config option for this functionality.
This will serve multiple purposes:
1) We will be able to use the feature flag to benchmark both scenarios - regular deepcopy vs fast_deepcopy.
2) In case a user encounters issues with fast_deepcopy() which we didn't catch here, they can easily opt out of this functionality and continue using StackStorm without needing to downgrade or similar (see the sketch below).
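As a rough sketch of what such an opt-out could look like on the StackStorm side (assuming oslo.config, which st2 already uses; the option and group names here are hypothetical, not an existing setting):

import copy
import json

from oslo_config import cfg

# Hypothetical flag: let operators fall back to copy.deepcopy if the faster
# copy path misbehaves for their workflows.
cfg.CONF.register_opts(
    [
        cfg.BoolOpt(
            "use_fast_deepcopy",
            default=False,
            help="Use a JSON-based fast deep copy for workflow context.",
        )
    ],
    group="workflow_engine",
)

def copy_context(ctx):
    # Pick the copy strategy based on the feature flag.
    if cfg.CONF.workflow_engine.use_fast_deepcopy:
        return json.loads(json.dumps(ctx))
    return copy.deepcopy(ctx)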
We won't be able to add a feature flag without a change in ST2. Orquesta does not have the Stackstorm config object afaik. I will put some benchmarks in the PR and also make sure we double check that errors is also copied.
> I will put some benchmarks in the PR and also make sure we double check that errors is also copied.
Thanks.
> We won't be able to add a feature flag without a change in ST2. Orquesta does not have the Stackstorm config object afaik.
Yeah, I assume it would probably also require a change on the StackStorm side. If you have cycles, I still think that would be a good idea.
Still much easier to flip the feature flag / config option in case something goes wrong versus needing to downgrade to a previous release.
Especially if it's combined with a breaking backward incompatible change in StackStorm core (live action FK change - there is only a forward migration in that PR, but not a backward one).
I've been looking into some slowness for workflows with a large number of items inside a with-items step. I saw this PR, and I'm curious: what exactly are we looking to gate behind a feature flag in this PR?
The task render speed gain from removing the deep copy of large parameters.
I added a micro-benchmark to show how much faster it is to NOT do a deep copy. I could add one for serialization and deserialization as well, but the results would be the same: copying large objects is costly.
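For anyone who wants to reproduce the comparison locally, a standalone timing sketch along these lines (the payload shape and sizes are arbitrary, and this is not the benchmark added to the PR) shows the gap:

import copy
import json
import timeit

# A reasonably large, JSON-compatible context, similar in spirit to big workflow parameters.
ctx = {"items": [{"index": i, "data": "x" * 256} for i in range(10000)]}

deep = timeit.timeit(lambda: copy.deepcopy(ctx), number=10)
fast = timeit.timeit(lambda: json.loads(json.dumps(ctx)), number=10)

print(f"copy.deepcopy: {deep:.3f}s, JSON round trip: {fast:.3f}s")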
I am running this additional speed up on our internal fork of St2.
I will report back if I see any issues.
It looks like CI/CD failed because black's formatting changed with its latest release.
@Kami @guzzijones are we requiring the feature flag to switch between deepcopy and fast_deepcopy for this to move forward?
Note, this will help with with-items speed issues.
@nzlosh @cognifloyd please take a peek at this. It adds significant speed improvements by eliminating some deep copies, which helps with larger context objects.