Closed: jdmeyer3 closed this issue 3 years ago
Thanks for reporting this.
There were indeed multiple mongoengine issues related to writing large documents in the past.
Over the releases, performance improved in mongoengine itself and we've also made some optimizations on our side, but there may be more issues hiding (we would need to dig in).
In the past, we also discussed skipping the mongoengine serialization / conversion layer altogether and using pymongo directly in critical code paths, but that would require substantial code changes.
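The idea of bypassing the conversion layer can be sketched as follows. This is a minimal illustration, not actual st2 code: `save_result_raw` and its arguments are hypothetical, and `collection` stands for a pymongo collection handle obtained elsewhere. The point is that a pre-serialized dict goes straight to `insert_one`, with no per-field mongoengine `Document` conversion on the write path.

```python
def save_result_raw(collection, result_doc):
    """Write a plain dict directly via a pymongo-style collection handle.

    ``collection`` is assumed to expose pymongo's ``insert_one``;
    ``result_doc`` is an already-serialized dict, so the expensive
    mongoengine Document construction / to_mongo() step is skipped.
    """
    insert_result = collection.insert_one(result_doc)
    return insert_result.inserted_id
```

The trade-off is that this path skips mongoengine's field validation and defaults, which is why it is only attractive for hot code paths.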
I know this issue was opened quite a long time ago, but the problem hasn't gone away, and I finally had a chance to work on #4846 and made good progress on it.
Initial micro benchmarks and end-to-end load tests are very promising - we see database speed improvements of up to 10x for writes and up to 5x for reads when persisting / retrieving execution results. All of that, of course, also translates to much faster action execution and workflow completion times for executions / workflows which operate on larger datasets.
Some more data and numbers are available here: https://github.com/StackStorm/st2/pull/4846#issuecomment-781629337, https://github.com/StackStorm/st2/pull/4846#issuecomment-782131035, https://github.com/StackStorm/st2/pull/4846#issuecomment-782837693.
I hope there will be no last-minute surprises and we will be able to include those changes in the v3.5.0 release.
Oh, and when I tested the fix end to end with a Python runner action which returns 4 MB of data (very similar to the example scenario you used), the whole action execution with the new code now takes ~1 second on my computer vs ~12 seconds with current master.
This covers the whole action execution flow end to end - from the execution being scheduled, through the action being run by the Python runner, to all the state being persisted to the database (action execution, live action objects, etc.). The actual database writes for those two models are now in the ~200 ms range.
As per comments above, this should be much improved in 3.5.0.
I really think we can close this now with the https://github.com/StackStorm/st2/pull/4846 performance improvements by @Kami.
@jdmeyer3 Feel free to re-open this if you're still seeing unacceptable performance with st2 v3.5.0.
SUMMARY
When an action returns ~4 MB of data, the actionrunner's write to MongoDB takes ~20 seconds per document. During the writes, CPU utilization spikes to 100%.
STACKSTORM VERSION
St2 v3.1.0
OS: CentOS 7.7.1908, kernel 3.10.0-957.el7.x86_64
Kubernetes: v1.14.1
Docker: 19.03.2
Base Docker image: CentOS 7.6.1810 (custom image)
Steps to reproduce the problem
In a Python 3 pack, create a Python action with the following code:
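The original snippet is not included in this excerpt; a minimal sketch of an action that returns roughly 4 MB of data might look like the following. The class name is illustrative, and the stub base class is only there so the sketch runs outside an st2 environment - inside st2 the real `st2common.runners.base_action.Action` is used.

```python
try:
    # Real base class when running inside an st2 Python runner environment.
    from st2common.runners.base_action import Action
except ImportError:
    # Stub so this sketch can run standalone (assumption: the real base
    # class only provides config plumbing that the sketch does not need).
    class Action(object):
        def __init__(self, config=None):
            self.config = config


class LargePayloadAction(Action):
    """Return roughly ``size_mb`` MB of data, mimicking the reported scenario."""

    def run(self, size_mb=4):
        payload = "x" * (size_mb * 1024 * 1024)
        return {"data": payload}
```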
Expected Results
Low CPU utilization and the result returning within a relatively short time (<5 seconds)
Actual Results
CPU spiked to 100% for 20-30 seconds
Adding some logs around st2actions.container.base.py:296, I'm getting log output like the following:
It takes ~10 seconds to update the liveaction and another ~5 seconds to update the execution.
Additionally, I've run cProfile against the actionrunner and I'm seeing the following:
It looks like the CPU spends most of its time building the mongoengine document object. This may be related to https://github.com/MongoEngine/mongoengine/issues/1230.
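For reference, a profiling run like the one described can be captured programmatically with the standard library. `profile_call` here is a generic helper, not part of st2 - you would pass it the write path you suspect (e.g. the function that saves the execution document).

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Profile a single call; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    # Show the top 10 entries by cumulative time, which is where a slow
    # serialization layer (like mongoengine document construction) shows up.
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()
```

Usage: `result, report = profile_call(suspect_write_function, document)` and then inspect `report` for the hot frames.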