TkTech / pysimdjson

Python bindings for the simdjson project.
https://pysimdjson.tkte.ch

simdjson.Parser() gets slow when iterating over the parsed JSON object #72

Closed: smjure closed this issue 3 years ago

smjure commented 3 years ago

First of all, thank you for the great module. I'm new to JSON and am looking for the fastest API, because I have large amounts of data to parse. I found that simdjson.Parser() is very fast at parsing, but iterating over the resulting object is quite slow. Here is an example of my task, with results:


import time
import orjson
import simdjson

parser = simdjson.Parser()

m = b'[{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1}]'

def benchmark0(name, loads):
    start = time.time()
    for _ in range(100_000):
        jsn = loads(m)
    print(name, F"{time.time() - start:2.3f}")

def benchmark1(name, loads):
    start = time.time()
    for _ in range(100_000):
        jsn = loads(m)
        for js in jsn:
            a = js['evl']
            # do something with a, e.g. print(a)
    print(name, F"{time.time() - start:2.3f}")

print('\n======== deserialization only ==================')
benchmark0(F"{'orjson':>15}", orjson.loads)
benchmark0(F"{'simdjson_loads':>15}", simdjson.loads)
benchmark0(F"{'simdjson_parser':>15}", parser.parse)

print('\n======== deserialization + iteration over json obj (my type of task) ==============')
benchmark1(F"{'orjson':>15}", orjson.loads)
benchmark1(F"{'simdjson_loads':>15}", simdjson.loads)
# NOTE: I also tested this in a trading on_message() handler, where simdjson was slower than orjson,
# mostly in `for js in jsn`, where jsn is a simdjson object that is read very slowly.
benchmark1(F"{'simdjson_parser':>15}", parser.parse)

And the results (wall-clock seconds for 100,000 iterations):

======== deserialization only ==================
         orjson 0.577
 simdjson_loads 0.713
simdjson_parser 0.172

======== deserialization + iteration over json obj (my type of task) ==============
         orjson 0.597
 simdjson_loads 0.741
simdjson_parser 1.364
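A rough back-of-the-envelope split of these numbers, as a sketch. It assumes the parse cost is the same in both runs, so the whole difference between the two tables is iteration plus the `js['evl']` lookup. Each figure comes from a single run, so treat the results as approximate. The helper name `per_record_overhead_us` is hypothetical.

```python
# Reported timings in seconds for 100,000 iterations, copied from the results above:
# (deserialization only, deserialization + iteration)
RESULTS = {
    "orjson": (0.577, 0.597),
    "simdjson_loads": (0.713, 0.741),
    "simdjson_parser": (0.172, 1.364),
}
ITERATIONS = 100_000
RECORDS_PER_DOC = 8  # the payload `m` holds 8 identical objects


def per_record_overhead_us(parse_only: float, parse_and_iterate: float) -> float:
    """Extra microseconds per record, assuming parse cost is identical in both runs."""
    extra_seconds = parse_and_iterate - parse_only
    return extra_seconds / (ITERATIONS * RECORDS_PER_DOC) * 1e6


for name, (parse_s, iterate_s) in RESULTS.items():
    parse_us_per_doc = parse_s / ITERATIONS * 1e6
    print(f"{name:>15}: parse {parse_us_per_doc:.2f} us/doc, "
          f"iteration overhead {per_record_overhead_us(parse_s, iterate_s):.3f} us/record")
```

Under that assumption, each record costs roughly 1.5 µs extra through the Parser's objects, compared with about 0.025 µs through orjson's plain dicts. Eight records therefore add about 12 µs per document, which more than cancels the Parser's parsing advantage of about 4 µs per document (1.72 µs vs 5.77 µs).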

In the first case the parser is much faster than orjson (0.172 s vs 0.577 s). Once we iterate over the parsed JSON, however, it becomes much slower (1.364 s vs 0.597 s). Is there any workaround that preserves the speed of simdjson.Parser()? Thank you.

TkTech commented 3 years ago