TkTech / pysimdjson

Python bindings for the simdjson project.
https://pysimdjson.tkte.ch

Fairly high overhead on the boundary Python/C++ #73

Closed ateska closed 3 years ago

ateska commented 3 years ago

We are parsing a very high number of ~2KB JSON files in our Python-based application.

I also conducted a rather artificial test of how many "parser cycles" I can get with a basically empty JSON document ({}). The issue is quite visible here: the overhead of crossing the Python<->pysimdjson boundary is high relative to other possible implementations.

A "parser cycle" is defined as one call to parser.parse(json) on an existing parser instance.
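A minimal sketch of how such a cycles-per-second figure could be measured. This is not the benchmark script used in this thread; the helper name `cycles_per_second` and its parameters are hypothetical, and the standard library's `json.loads` stands in for the parser so the snippet runs without extra packages.

```python
import json
import time


def cycles_per_second(parse, payload, min_seconds=0.2, batch=1000):
    """Call parse(payload) in batches until min_seconds have elapsed.

    Returns the number of completed calls divided by the elapsed time.
    Timing per batch keeps the clock reads out of the hot loop.
    """
    cycles = 0
    start = time.perf_counter()
    while True:
        for _ in range(batch):
            parse(payload)
        cycles += batch
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return cycles / elapsed


# Stand-in parser: the standard library. With pysimdjson installed, the
# bound parse method of one existing simdjson.Parser() instance would be
# passed instead, matching the definition of a "parser cycle" above.
eps = cycles_per_second(json.loads, b"{}")
print(f"{eps:.2f} parser cycles per second")
```

With a near-empty document such as {}, almost all of the measured time is call overhead rather than parsing, which is what makes this test sensitive to the boundary cost.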

I'm not 100% sure if this is a priority of this library, so feel free to close this one as irrelevant.

TkTech commented 3 years ago

> The Cython-based PoC implementation (in-house, so far) delivers ~700k parser cycles per second (very close to C++ implementation).

I am... skeptical. This binding is naive - there's definitely room for improvement. That said, there was a Cython version, and the improvement was negligible.

The cost of creating a single Python object tends to be higher than the entire document parse. So if you're saying you're getting parity...

Happy to incorporate any improvements, but the general goal is to improve performance by avoiding working in Python land.

ateska commented 3 years ago

Here are some hard numbers:

----------------------------------------------------------------
# 'jsonexamples/test.json' 2397 bytes
----------------------------------------------------------------
* cysimdjson parse          539051.85 EPS (  1.00)  1292.11 MB/s
* libpy_simdjson loads      375380.33 EPS (  1.44)   899.79 MB/s
* pysimdjson parse          362136.78 EPS (  1.49)   868.04 MB/s
* orjson loads              112062.53 EPS (  4.81)   268.61 MB/s
* python json loads          72665.18 EPS (  7.42)   174.18 MB/s
----------------------------------------------------------------

^ This illustrates the impact of the call (cysimdjson is a Cython-based implementation). The native (C++) performance is 542339.05 EPS.
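The columns in these tables are not labelled. The numbers are consistent with the following reading, which is an inference from the figures rather than something stated in the thread: EPS is parse calls per second, the value in parentheses is the fastest EPS divided by that row's EPS, and MB/s is EPS times the file size with 1 MB taken as 1,000,000 bytes. A short check against the test.json rows, with hypothetical helper names:

```python
def throughput_mb_s(eps, size_bytes):
    """Throughput in MB/s, assuming 1 MB = 1,000,000 bytes."""
    return eps * size_bytes / 1_000_000


def slowdown(best_eps, eps):
    """How many times slower a row is than the fastest row."""
    return best_eps / eps


SIZE = 2397                    # bytes, jsonexamples/test.json
CYSIMDJSON_EPS = 539051.85     # fastest row in the table
LIBPY_SIMDJSON_EPS = 375380.33
NATIVE_EPS = 542339.05         # native C++ figure quoted above

mb_s = round(throughput_mb_s(CYSIMDJSON_EPS, SIZE), 2)
ratio = round(slowdown(CYSIMDJSON_EPS, LIBPY_SIMDJSON_EPS), 2)
share_of_native = CYSIMDJSON_EPS / NATIVE_EPS

print(mb_s)    # 1292.11, matches the MB/s column
print(ratio)   # 1.44, matches the value in parentheses
print(f"{share_of_native:.1%} of native throughput")
```

Under this reading, the cysimdjson row reaches roughly 99% of the native C++ figure on this file.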

----------------------------------------------------------------
# 'jsonexamples/verysmall.json' 7 bytes
----------------------------------------------------------------
* cysimdjson parse         4414474.38 EPS (  1.00)    30.90 MB/s
* orjson loads             3698816.51 EPS (  1.19)    25.89 MB/s
* libpy_simdjson loads     1839016.53 EPS (  2.40)    12.87 MB/s
* pysimdjson parse         1015434.93 EPS (  4.35)     7.11 MB/s
* python json loads         526388.08 EPS (  8.39)     3.68 MB/s
----------------------------------------------------------------

^ This one zooms in on the issue even more.

----------------------------------------------------------------
# 'jsonexamples/twitter.json' 631515 bytes
----------------------------------------------------------------
* cysimdjson parse            2651.49 EPS (  1.00)  1674.46 MB/s
* libpy_simdjson loads        2445.90 EPS (  1.08)  1544.63 MB/s
* pysimdjson parse            2423.09 EPS (  1.09)  1530.22 MB/s
* orjson loads                 386.69 EPS (  6.86)   244.20 MB/s
* python json loads            294.36 EPS (  9.01)   185.89 MB/s
----------------------------------------------------------------

----------------------------------------------------------------
# 'jsonexamples/canada.json' 2251051 bytes
----------------------------------------------------------------
* cysimdjson parse             289.98 EPS (  1.00)   652.76 MB/s
* pysimdjson parse             284.94 EPS (  1.02)   641.42 MB/s
* libpy_simdjson loads         278.46 EPS (  1.04)   626.82 MB/s
* orjson loads                  82.70 EPS (  3.51)   186.17 MB/s
* python json loads             22.69 EPS ( 12.78)    51.09 MB/s
----------------------------------------------------------------

----------------------------------------------------------------
# 'jsonexamples/gsoc-2018.json' 3327831 bytes
----------------------------------------------------------------
* cysimdjson parse             836.00 EPS (  1.00)  2782.05 MB/s
* pysimdjson parse             744.28 EPS (  1.12)  2476.84 MB/s
* libpy_simdjson loads         666.20 EPS (  1.25)  2217.00 MB/s
* orjson loads                 166.08 EPS (  5.03)   552.69 MB/s
* python json loads            113.87 EPS (  7.34)   378.93 MB/s
----------------------------------------------------------------

ateska commented 3 years ago

The related work has been released here: https://github.com/TeskaLabs/cysimdjson

TkTech commented 3 years ago

#74 is now at a state where it can be used for your benchmarks, although a few tests still fail.

TkTech commented 3 years ago

Re-running your cysimdjson tests, we're now often at parity or better. We can definitely do better, but I'm happy with this for now: in exchange for a small difference in speed, we are safer (preventing object reuse and memory issues) and more capable (e.g. buffer support).

----------------------------------------------------------------
# '/home/tktech/projects/cysimdjson/test/jsonexamples/test.json' 2397 bytes
----------------------------------------------------------------
* pysimdjson parse         1255476.29 EPS (  1.00)  3009.38 MB/s
* cysimdjson parse         1235306.40 EPS (  1.02)  2961.03 MB/s
* cysimdjson pad parse     1211152.53 EPS (  1.04)  2903.13 MB/s
* orjson loads              207861.87 EPS (  6.04)   498.24 MB/s
* python json loads         135765.75 EPS (  9.25)   325.43 MB/s
----------------------------------------------------------------

----------------------------------------------------------------
# '/home/tktech/projects/cysimdjson/test/jsonexamples/twitter.json' 631515 bytes
----------------------------------------------------------------
* cysimdjson pad parse        5947.56 EPS (  1.00)  3755.97 MB/s
* pysimdjson parse            5791.16 EPS (  1.03)  3657.20 MB/s
* cysimdjson parse            5568.33 EPS (  1.07)  3516.48 MB/s
* orjson loads                 764.81 EPS (  7.78)   482.99 MB/s
* python json loads            471.92 EPS ( 12.60)   298.02 MB/s
----------------------------------------------------------------

----------------------------------------------------------------
# '/home/tktech/projects/cysimdjson/test/jsonexamples/canada.json' 2251051 bytes
----------------------------------------------------------------
* cysimdjson pad parse         593.03 EPS (  1.00)  1334.94 MB/s
* cysimdjson parse             554.87 EPS (  1.07)  1249.04 MB/s
* pysimdjson parse             552.20 EPS (  1.07)  1243.04 MB/s
* orjson loads                 152.71 EPS (  3.88)   343.75 MB/s
* python json loads             45.87 EPS ( 12.93)   103.26 MB/s
----------------------------------------------------------------

----------------------------------------------------------------
# '/home/tktech/projects/cysimdjson/test/jsonexamples/gsoc-2018.json' 3327831 bytes
----------------------------------------------------------------
* cysimdjson pad parse        1611.95 EPS (  1.00)  5364.29 MB/s
* cysimdjson parse            1262.62 EPS (  1.28)  4201.79 MB/s
* pysimdjson parse            1250.95 EPS (  1.29)  4162.94 MB/s
* orjson loads                 290.58 EPS (  5.55)   967.01 MB/s
* python json loads            220.79 EPS (  7.30)   734.76 MB/s
----------------------------------------------------------------

----------------------------------------------------------------
# '/home/tktech/projects/cysimdjson/test/jsonexamples/verysmall.json' 7 bytes
----------------------------------------------------------------
* cysimdjson parse         8896208.10 EPS (  1.00)    62.27 MB/s
* pysimdjson parse         7945949.11 EPS (  1.12)    55.62 MB/s
* orjson loads             7735180.97 EPS (  1.15)    54.15 MB/s
* cysimdjson pad parse     6078851.00 EPS (  1.46)    42.55 MB/s
* python json loads        1102638.34 EPS (  8.07)     7.72 MB/s
----------------------------------------------------------------

ateska commented 3 years ago

I haven't studied your implementation in detail, but it would be super-useful (for us :-) ) if there were an official way to retrieve the simdjson C++ object (a reference or pointer to it) from a Python wrapper when it is passed to other Cython code outside of this library. We frequently use Cython for acceleration, and passing values from C++ in simdjson through pysimdjson and Python back to Cython represents an unnecessary yet significant performance hit. May I kindly ask if that has been part of the design in any form or shape?

It would be wonderful to see it in pysimdjson v4, because after that we can "merge/close" the cysimdjson implementation ;-)

TkTech commented 3 years ago

@ateska do you have any small usage examples? Helps when adding a feature to see how it'll be used. This should probably be a new issue.

We can definitely do this easily with PyCapsules and buffers.
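For readers unfamiliar with capsules: a PyCapsule is a CPython object that wraps a raw C pointer so it can travel through Python code and be unwrapped again by native code. The sketch below only illustrates that round trip. It is not pysimdjson's API, the capsule name is hypothetical, and `ctypes` is used here just to reach the CPython functions `PyCapsule_New` and `PyCapsule_GetPointer` without writing a C or Cython extension.

```python
import ctypes

# CPython C-API functions, reached through ctypes for illustration only.
capsule_new = ctypes.pythonapi.PyCapsule_New
capsule_new.restype = ctypes.py_object
capsule_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

capsule_get_pointer = ctypes.pythonapi.PyCapsule_GetPointer
capsule_get_pointer.restype = ctypes.c_void_p
capsule_get_pointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Hypothetical capsule name. The capsule stores this pointer without
# copying it, so the bytes object must outlive the capsule.
CAPSULE_NAME = b"example.document"

# Stand-in for a native object: a C buffer holding some JSON text.
payload = ctypes.create_string_buffer(b'{"a": 1}')

# Producer side: wrap the raw address in a capsule (no destructor).
capsule = capsule_new(ctypes.addressof(payload), CAPSULE_NAME, None)

# Consumer side: unwrap the address; the name must match or this raises.
address = capsule_get_pointer(capsule, CAPSULE_NAME)
recovered = ctypes.string_at(address, len(payload.value))

print(type(capsule).__name__)  # PyCapsule
print(recovered)               # b'{"a": 1}'
```

In a real binding the producer and consumer would both be native code (for example a C++ extension and a Cython module), and the capsule would carry a pointer to the parsed document rather than to a text buffer, so no Python objects need to be created for the hand-off.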

ateska commented 3 years ago

I'll try to provide some ... in the new issue ;-) Thanks.

ateska commented 3 years ago

FYI: I managed to run the benchmark again on the new branch of cysimdjson, and I got this:

% PYTHONPATH=. python3 ./perftest/test_benchmark.py
----------------------------------------------------------------
# 'perftest/jsonexamples/test.json' 2397 bytes
----------------------------------------------------------------
* cysimdjson parse          638926.23 EPS (  1.00)  1531.51 MB/s
* cysimdjson pad parse      606547.25 EPS (  1.05)  1453.89 MB/s
* pysimdjson parse          606379.25 EPS (  1.05)  1453.49 MB/s
* python json loads          41720.92 EPS ( 15.31)   100.01 MB/s
----------------------------------------------------------------

----------------------------------------------------------------
# 'perftest/jsonexamples/twitter.json' 631515 bytes
----------------------------------------------------------------
* cysimdjson pad parse        3304.32 EPS (  1.00)  2086.73 MB/s
* cysimdjson parse            2985.17 EPS (  1.11)  1885.18 MB/s
* pysimdjson parse            2906.61 EPS (  1.14)  1835.57 MB/s
* python json loads            204.97 EPS ( 16.12)   129.44 MB/s
----------------------------------------------------------------

----------------------------------------------------------------
# 'perftest/jsonexamples/canada.json' 2251051 bytes
----------------------------------------------------------------
* cysimdjson pad parse         289.50 EPS (  1.00)   651.68 MB/s
* cysimdjson parse             281.92 EPS (  1.03)   634.63 MB/s
* pysimdjson parse             262.74 EPS (  1.10)   591.45 MB/s
* python json loads             19.49 EPS ( 14.85)    43.87 MB/s
----------------------------------------------------------------

----------------------------------------------------------------
# 'perftest/jsonexamples/gsoc-2018.json' 3327831 bytes
----------------------------------------------------------------
* cysimdjson pad parse         781.42 EPS (  1.00)  2600.42 MB/s
* cysimdjson parse             637.85 EPS (  1.23)  2122.65 MB/s
* pysimdjson parse             536.78 EPS (  1.46)  1786.31 MB/s
* python json loads             69.80 EPS ( 11.19)   232.30 MB/s
----------------------------------------------------------------

----------------------------------------------------------------
# 'perftest/jsonexamples/verysmall.json' 7 bytes
----------------------------------------------------------------
* cysimdjson parse         2605313.38 EPS (  1.00)    18.24 MB/s
* cysimdjson pad parse     2571813.54 EPS (  1.01)    18.00 MB/s
* pysimdjson parse         2312177.01 EPS (  1.13)    16.19 MB/s
* python json loads         436467.51 EPS (  5.97)     3.06 MB/s
----------------------------------------------------------------

I will dig a bit deeper and update this.