exasol / pyexasol

Exasol Python driver with low overhead, fast HTTP transport and compression
MIT License

Allow orjson as a serialization framework #80

Closed (hendrik-teuber-by closed this issue 2 years ago)

hendrik-teuber-by commented 3 years ago

orjson (https://github.com/ijl/orjson) is probably the fastest JSON library for Python and might be a good addition to this driver.

littleK0i commented 3 years ago

Yes, it looks promising and well thought out. I'll add it in the next update, which should happen within the next 1-3 weeks.

Interestingly enough, I was thinking about removing the json_lib option altogether, but kept it for backwards compatibility. Regardless of the JSON library, "normal" fetching is still relatively slow compared to HTTP transport.
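For illustration, here is a minimal sketch of how a json_lib-style option can dispatch to different JSON libraries behind one interface. The helper `load_json_lib` is hypothetical and is not pyexasol's internal code; it assumes orjson is installed only when `"orjson"` is requested.

```python
import json
from typing import Any, Callable, Tuple


def load_json_lib(name: str = "json") -> Tuple[Callable[[Any], str], Callable[[Any], Any]]:
    """Return (dumps, loads) for the requested JSON library (hypothetical helper)."""
    if name == "orjson":
        import orjson  # third-party: pip install orjson

        # orjson.dumps returns bytes, so decode to keep a str-based interface
        return (lambda obj: orjson.dumps(obj).decode("utf-8")), orjson.loads
    if name == "json":
        # Standard library fallback, always available
        return json.dumps, json.loads
    raise ValueError(f"unsupported json_lib: {name!r}")


dumps, loads = load_json_lib("json")
print(loads(dumps({"rows": [[1, "a"], [2, "b"]]})))
```

Keeping the choice behind a single option means callers never import a specific JSON library directly, which is what makes it cheap to keep the option for backwards compatibility.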

In order to make "normal" fetching reasonably fast for large datasets, the following changes are required (in this order):

  1. A better protocol to transfer data from the Exasol server (e.g. Arrow). JSON is very expensive for "big data".
  2. A parallel thread that fetches a few blocks of data in advance. Currently the client fetches the next block only after the previous block has been fully depleted.
  3. A better JSON parsing library.
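Item 2 can be sketched as a bounded producer/consumer: a background thread keeps up to `depth` blocks queued while the caller consumes the current one. This is a minimal sketch, not pyexasol's implementation; `fetch_block` is a hypothetical stand-in for one round trip to the server, returning `None` when no rows are left.

```python
import queue
import threading
from typing import Callable, Iterator, List, Optional

_DONE = object()  # sentinel marking the end of the result set


def prefetch_blocks(fetch_block: Callable[[int], Optional[List]], depth: int = 3) -> Iterator[List]:
    """Yield blocks while a background thread fetches up to `depth` blocks ahead."""
    buf: queue.Queue = queue.Queue(maxsize=depth)

    def producer() -> None:
        try:
            i = 0
            # fetch_block(i) is hypothetical: the i-th block, or None when exhausted
            while (block := fetch_block(i)) is not None:
                buf.put(block)  # waits while `depth` blocks are already queued
                i += 1
            buf.put(_DONE)
        except Exception as exc:  # hand fetch errors over to the consumer
            buf.put(exc)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item


data = [[1, 2], [3], [4, 5, 6]]
for block in prefetch_blocks(lambda i: data[i] if i < len(data) else None):
    print(block)
```

Network waits release the GIL, so the fetch of block N+1 can overlap with processing of block N. If the consumer stops early, the daemon producer may stay blocked on `put`, which a real implementation would need to handle.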

So a faster JSON library definitely helps, but not as much as the other two changes.

littleK0i commented 3 years ago

@hendrik-teuber-by, please check the json_lib='orjson' connection option. It is now available in the latest version.

Changelog entries: https://github.com/exasol/pyexasol/blob/master/CHANGELOG.md#0230---2021-11-19
Commit: https://github.com/exasol/pyexasol/commit/450843d7c9b2652f811d3a218cfbcb95439fca8f
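A minimal usage sketch of the json_lib='orjson' option, assuming orjson is installed next to pyexasol (e.g. `pip install pyexasol orjson`). The environment variable names `EXA_DSN`, `EXA_USER`, `EXA_PASSWORD` and the helper `exasol_connect_options` are hypothetical; the script only opens a real connection when `EXA_DSN` is set.

```python
import os


def exasol_connect_options(json_lib: str = "orjson") -> dict:
    # EXA_DSN / EXA_USER / EXA_PASSWORD are hypothetical variable names for this sketch
    return {
        "dsn": os.environ.get("EXA_DSN", ""),
        "user": os.environ.get("EXA_USER", ""),
        "password": os.environ.get("EXA_PASSWORD", ""),
        # Selects the JSON library pyexasol uses for protocol messages
        "json_lib": json_lib,
    }


if os.environ.get("EXA_DSN"):
    # Only attempt a real connection when a DSN is configured
    import pyexasol

    conn = pyexasol.connect(**exasol_connect_options())
    try:
        print(conn.execute("SELECT 1").fetchval())
    finally:
        conn.close()
```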