First of all, thank you for the great module. I'm new to JSON and am looking for the fastest API because of the large amount of data I need to parse. I found that simdjson.Parser() is super fast, but when iterating over the parsed object it gets pretty slow. Here is an example of my task with results:
```python
import time

import orjson
import simdjson

parser = simdjson.Parser()
m = b'[{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1}]'

def benchmark0(name, loads):
    # deserialization only
    start = time.time()
    for _ in range(100_000):
        jsn = loads(m)
    print(name, f"{time.time() - start:2.3f}")

def benchmark1(name, loads):
    # deserialization + iteration over the result
    start = time.time()
    for _ in range(100_000):
        jsn = loads(m)
        for js in jsn:
            a = js['evl']  # do something with the value
    print(name, f"{time.time() - start:2.3f}")

print('\n======== deserialization only ==================')
benchmark0(f"{'orjson':>15}", orjson.loads)
benchmark0(f"{'simdjson_loads':>15}", simdjson.loads)
benchmark0(f"{'simdjson_parser':>15}", parser.parse)

print('\n======== deserialization + iteration over json obj (my type of task) ==============')
benchmark1(f"{'orjson':>15}", orjson.loads)
benchmark1(f"{'simdjson_loads':>15}", simdjson.loads)
benchmark1(f"{'simdjson_parser':>15}", parser.parse)
```

Note: I also tested this in a trading `on_message()` handler, and simdjson was slower than orjson there too, especially at `for js in jsn`, where `jsn` is a simdjson object that is read very slowly.
and the results:

```text
======== deserialization only ==================
         orjson 0.577
 simdjson_loads 0.713
simdjson_parser 0.172

======== deserialization + iteration over json obj (my type of task) ==============
         orjson 0.597
 simdjson_loads 0.741
simdjson_parser 1.364
```
What we see is that in the first case the parser is super fast (0.172 vs 0.577), but when we need to iterate over the JSON object it gets very slow (1.364 vs 0.597). Is there any workaround that preserves the speed of simdjson.Parser()? Thank you.
The Parser() interface exists to avoid work. If you're just going to loop over everything, it will always be slower, since it has to create proxy objects for every element you touch.
orjson is sometimes just faster than pysimdjson, probably because of how horrible element_to_primitive is.
Look at the timeit module; don't try writing these microbenchmarks yourself.
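For reference, a minimal timeit version of the same measurement, using the stdlib `json` module as a stand-in (substitute `orjson.loads` or `parser.parse` as the callable under test):

```python
import json
import timeit

payload = b'[{"evl":"Quest","sub1":10.23},{"evl":"Quest","sub1":10.23}]'

# timeit handles loop overhead and timer selection for you;
# `number` is the iteration count, analogous to range(100_000) above.
elapsed = timeit.timeit(lambda: json.loads(payload), number=10_000)
print(f"json.loads: {elapsed:2.3f}s")
```

`timeit.repeat()` is also worth a look: taking the minimum of several runs reduces noise from other processes.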