While quickly looking at it and trying to get tests to pass with orjson, I noticed some issues with unnecessary (and, in the case of large requests, slower) bytes -> unicode -> bytes conversions.
Some of those were likely needed when we still had to support both Python 2 and Python 3, but now that we only support Python 3, we should be able to get rid of some of them and work directly with bytes / unicode (depending on what the specific code expects).
This change also exposed more places where we don't correctly encode / decode incoming and outgoing data, so the request / response contains strings with a b''
prefix.
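
As a small illustration (not StackStorm code) of how that b'' prefix sneaks in: orjson.dumps() returns bytes rather than str, so coercing the result with str() instead of .decode() embeds the prefix in the output.

```python
# Minimal illustration of the b'' prefix problem when bytes are coerced
# with str() instead of being decoded (or kept as bytes end to end).
import json

import orjson

data = {"stdout": "hello"}

# orjson.dumps() returns bytes, unlike json.dumps() which returns str.
raw = orjson.dumps(data)          # b'{"stdout":"hello"}'

# Wrong: str() on bytes embeds the b'' prefix in the resulting string.
broken = str(raw)                 # "b'{\"stdout\":\"hello\"}'"

# Right: decode explicitly, or pass the bytes straight to code that accepts them.
fixed = raw.decode("utf-8")       # '{"stdout":"hello"}'

assert json.loads(fixed) == data
assert orjson.loads(raw) == data  # orjson.loads() accepts bytes directly
```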
On a related note - does anyone happen to have some executions with a large result field lying around (maybe some from a CI/CD server) that I could use in the micro benchmarks?
Awesome stuff, +100! Python 3 should definitely open some new doors for perf optimizations. Thanks @Kami for the research and new energy here :+1:
You can find a few building blocks to start from here: https://github.com/StackStorm/st2/issues/4798. Someone did a good job preparing it for us.
@armab That issue is actually a different one - it's related to bad mongoengine performance when working with large datasets.
Using a different JSON library would likely still speed up that action, since the operation involves parsing JSON, but it wouldn't help with the mongoengine-related issue (aka storing the result in the database).
For the mongoengine-related issue, work would need to continue on those PRs - https://github.com/StackStorm/st2/pull/4838, https://github.com/StackStorm/st2/pull/4846.
Re mongoengine performance issues with large documents - one option we should also consider experimenting with and benchmarking is compression (it would likely require some changes to store the result as a string and handle that transparently inside the to_mongo and from_mongo functions).
Compression is not free (it costs CPU cycles), but executions with large results usually contain textual data which compresses well, so it's possible that trading some CPU cycles for compression would still result in overall better throughput and shorter execution-related DB operations, because we would be passing fewer bytes to mongoengine.
We should start with benchmarking (perhaps using the zstandard algorithm) and then decide if it's worth pursuing those efforts.
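
To make that idea concrete, here is a rough, hypothetical sketch of transparent compression as a custom mongoengine field. The class name is made up, and mongoengine's hook on the read path is to_python rather than from_mongo; this is a sketch under those assumptions, not a proposed implementation.

```python
# Hypothetical sketch: a mongoengine field that zstandard-compresses a
# JSON-serialized value on the way into MongoDB and decompresses it on the
# way out. Not the actual StackStorm implementation.
import orjson
import zstandard
from bson import Binary
from mongoengine.fields import BinaryField


class CompressedJSONField(BinaryField):
    """Stores an arbitrary JSON-serializable value as zstd-compressed bytes."""

    def to_mongo(self, value):
        # Serialize to JSON bytes, then compress before handing them to mongo.
        compressed = zstandard.ZstdCompressor().compress(orjson.dumps(value))
        return Binary(compressed)

    def to_python(self, value):
        # Reverse the transformation when the document is loaded.
        if isinstance(value, (bytes, Binary)):
            raw = zstandard.ZstdDecompressor().decompress(bytes(value))
            return orjson.loads(raw)
        return value
```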
@Kami I just meant there was an example in that issue that may help with composing a workflow which would pass some larger data along.
Implemented in https://github.com/StackStorm/st2/pull/4846/
Right now StackStorm code utilizes the native json library for JSON serialization / deserialization and also ujson for the fast_deepcopy function.
Now that we don't support Python 2 anymore, we should evaluate using orjson - https://github.com/ijl/orjson. Based on micro benchmarks in another project, it offers substantial improvements, even over ujson.
In the past there were some compatibility issues with the native json library, but I believe most of them have been resolved. Having said that, we should still do research specifically for this project (update the code, verify everything works + micro benchmarks).
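
As a starting point for those micro benchmarks, a minimal sketch comparing the three libraries on a made-up payload might look like the following; the payload shape and iteration counts are arbitrary placeholders, not representative of real execution results.

```python
# Minimal micro-benchmark sketch: timings depend heavily on payload shape
# and size, so treat this only as a template.
import json
import timeit

import orjson
import ujson

# Made-up stand-in for a large execution result.
payload = {"result": [{"id": i, "stdout": "x" * 200, "ok": True} for i in range(2000)]}


def bench(name, dumps, loads):
    serialized = dumps(payload)
    dump_t = timeit.timeit(lambda: dumps(payload), number=200)
    load_t = timeit.timeit(lambda: loads(serialized), number=200)
    print(f"{name:8s} dumps={dump_t:.3f}s loads={load_t:.3f}s")


bench("json", json.dumps, json.loads)
bench("ujson", ujson.dumps, ujson.loads)
bench("orjson", orjson.dumps, orjson.loads)  # note: orjson.dumps returns bytes
```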
TODO:
- Evaluate switching to orjson (including the fast_deepcopy function)
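
For reference, the JSON-round-trip idea behind a fast_deepcopy-style helper could be sketched with orjson as below; the actual helper in st2 may differ in name, signature, and fallback behavior.

```python
# Sketch of the JSON-round-trip "fast deepcopy" idea ported to orjson;
# not the real st2 fast_deepcopy implementation.
import copy

import orjson


def fast_deepcopy(value):
    # For JSON-serializable values, a dumps/loads round trip is typically much
    # faster than copy.deepcopy(); fall back to copy.deepcopy() for anything
    # orjson cannot serialize (orjson raises a TypeError subclass in that case).
    try:
        return orjson.loads(orjson.dumps(value))
    except TypeError:
        return copy.deepcopy(value)


original = {"a": [1, 2, {"b": "c"}]}
clone = fast_deepcopy(original)
assert clone == original and clone is not original
```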