MrPigss / BetterJSONStorage

Better JSONStorage for tinyDB
https://pypi.org/project/BetterJSONStorage/
MIT License

.. image:: https://raw.githubusercontent.com/MrPigss/BetterJSONStorage/master/img/logo.png

Introduction


.. image:: https://codecov.io/gh/MrPigss/BetterJSONStorage/branch/master/graph/badge.svg?token=JN69A9GD3D
   :target: https://codecov.io/gh/MrPigss/BetterJSONStorage
.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black
.. image:: https://badge.fury.io/py/BetterJSONStorage.svg
   :target: https://badge.fury.io/py/BetterJSONStorage

BetterJSONStorage is a faster storage type for TinyDB. It uses the Orjson library to parse and serialize JSON and BLOSC2_ for compression.
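As a rough illustration of the serialize-then-compress idea, the sketch below uses the standard library's ``json`` and ``zlib`` modules as stand-ins for Orjson and BLOSC2, so it runs without extra dependencies. The helper names ``pack`` and ``unpack`` are hypothetical and are not part of BetterJSONStorage, and the actual on-disk format of the library is not shown here.

```python
import json
import zlib


def pack(data: dict) -> bytes:
    """Serialize to compact JSON bytes, then compress (zlib stands in for BLOSC2)."""
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw)


def unpack(blob: bytes) -> dict:
    """Reverse of pack: decompress, then parse the JSON bytes."""
    return json.loads(zlib.decompress(blob))


# Repetitive, table-like data (typical for a small document store) compresses well.
db = {"_default": {str(i): {"int": i, "char": "a"} for i in range(1, 501)}}
raw_size = len(json.dumps(db, separators=(",", ":")).encode("utf-8"))
blob = pack(db)

print(f"raw JSON: {raw_size} bytes, compressed: {len(blob)} bytes")
```

Because the stored bytes are compressed, a file written this way cannot be opened as plain JSON, which is consistent with the incompatibility with other storage types noted in this readme.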

Parsing, compressing, and writing to the file are done by a separate thread, so reads are not blocked by slow file I/O. Smaller files also mean less disk I/O, which speeds up both reading and writing. All reads are served from memory.
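The pattern described here (reads from memory, writes handed to a worker thread) can be sketched with the standard library alone. This is a minimal illustration under assumptions, not the library's actual implementation: the class name ``BackgroundWriter`` is hypothetical, and plain ``json`` replaces the Orjson and BLOSC2 steps.

```python
import json
import threading
from pathlib import Path
from tempfile import TemporaryDirectory


class BackgroundWriter:
    """Hypothetical sketch: keep the latest data in memory, persist it from a worker thread."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._data: dict = {}
        self._lock = threading.Lock()
        self._dirty = threading.Event()  # set whenever there is something to flush
        self._closing = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def read(self) -> dict:
        # Reads never touch the disk.
        with self._lock:
            return self._data

    def write(self, data: dict) -> None:
        # Update memory and wake the worker; the caller does not wait for file I/O.
        with self._lock:
            self._data = data
        self._dirty.set()

    def _run(self) -> None:
        while True:
            self._dirty.wait()
            self._dirty.clear()
            with self._lock:
                snapshot = json.dumps(self._data)
                closing = self._closing
            # Several write() calls may be coalesced into this one disk write.
            self._path.write_text(snapshot)
            if closing:
                return

    def close(self) -> None:
        # Ask the worker for one final flush, then wait for it to finish.
        with self._lock:
            self._closing = True
        self._dirty.set()
        self._thread.join()


with TemporaryDirectory() as tmp:
    target = Path(tmp) / "db.json"
    store = BackgroundWriter(target)
    for i in range(100):
        store.write({"count": i})  # returns immediately
    latest = store.read()          # served from memory
    store.close()                  # final state is flushed to disk
    on_disk = json.loads(target.read_text())
```

In this sketch the disk copy can lag behind memory until ``close()`` is called, which is why closing the storage (for example by using a context manager) matters with this kind of design.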

These optimizations result in much faster reading and writing without loss of functionality.

A goal of the BetterJSONStorage project is to provide a drop-in replacement for the default JSONStorage.

An example of how to use BetterJSONStorage can be found below. Everything else is covered in the `TinyDB docs <https://tinydb.readthedocs.io/>`_.

Database or JSON files created with other storage types (including the default JSONStorage) are incompatible with BetterJSONStorage.

Installing BetterJSONStorage


Install BetterJSONStorage from `PyPI <https://pypi.org/project/BetterJSONStorage/>`_.

.. code-block:: PowerShell

    pip install BetterJSONStorage

Usage


Context Manager

.. code-block:: python

    from pathlib import Path
    from tinydb import TinyDB
    from BetterJSONStorage import BetterJSONStorage

    path = Path('relative/path/to/file.db')

    with TinyDB(path, access_mode="r+", storage=BetterJSONStorage) as db:
        db.insert({'int': 1, 'char': 'a'})
        db.insert({'int': 1, 'char': 'b'})

.. _TinyDB: https://github.com/msiemens/tinydb
.. _Orjson: https://github.com/ijl/orjson
.. _BLOSC2: https://github.com/Blosc/python-blosc2

Extra

One difference from TinyDB's default JSONStorage is that BetterJSONStorage is read-only by default. Use access_mode='r+' if you want to write as well.

All arguments except storage and access_mode are forwarded to the underlying storage. You can use this to pass additional keyword arguments to the orjson.dumps(…) method.

For all options, see the `orjson documentation <https://github.com/ijl/orjson#option>`_.

.. code-block:: python

    import orjson

    with TinyDB('file.db', option=orjson.OPT_NAIVE_UTC, storage=BetterJSONStorage) as db:
        ...  # work with db here

Performance


Newer performance numbers are listed at the bottom. The entire test suite will be redone to bring it up to date; until then, both the old numbers (which are more complete) and the new numbers (which better reflect modern hardware) are kept in the readme.

The benchmarks are run on fixtures of real data:

The data can be found `here <https://github.com/serde-rs/json-benchmark/tree/master/data>`_.

The exact same code is used for both BetterJSONStorage and the default JSONStorage. BetterJSONStorage is faster in almost all situations and uses significantly less space on disk.

citm_catalog.json

.. list-table:: storage used :widths: 25 25 25 :header-rows: 1

canada.json

.. list-table:: storage used :widths: 25 25 25 :header-rows: 1

twitter.json

.. list-table:: storage used :widths: 25 25 25 :header-rows: 1

Randomly generated JSON

The JSON was generated with `json-generator <https://app.json-generator.com/6R7FY2v7Bqvc>`_. The generated JSON contains 140 items of about 0.7 kB each (100 kB total). Every test was run 10 times and the average was taken.
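The "run 10 times and take the average" procedure can be sketched as a small timing helper. This is only an illustration of the measurement approach, not the project's benchmark code: ``average_ms`` and the stand-in workload ``build_140_items`` are hypothetical names, and the workload merely builds records of roughly the stated size rather than touching a database.

```python
import time
from statistics import mean


def average_ms(func, runs: int = 10) -> float:
    """Call func `runs` times and return the mean wall-clock time in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append((time.perf_counter() - start) * 1000)
    return mean(timings)


def build_140_items() -> list:
    """Hypothetical workload: build 140 records of roughly 0.7 kB each."""
    return [{"id": i, "payload": "x" * 700} for i in range(140)]


avg = average_ms(build_140_items)
print(f"avg over 10 runs: {avg:.3f} ms")
```

To compare two storages, the same callable would be timed once per storage class, so that only the storage differs between measurements.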

init times: the time it takes to instantiate the db and storage:

| BetterJSONStorage takes a bit longer to start, but this only has to happen once at the beginning.
| This tradeoff is what makes the fast reads and writes of BetterJSONStorage possible.

.. list-table:: avg init times :widths: 25 25 :header-rows: 1

insert time: the time it took to insert 140 items of around 0.7 kB each:

| Because BetterJSONStorage uses a separate thread for writing, the main thread is not blocked.
| This means there is no waiting for file I/O between subsequent writes.
| BetterJSONStorage makes sure everything is written correctly.

.. list-table:: avg 140x 0.7kb insert :widths: 25 25 :header-rows: 1

read times: the time it took to read 140 items of around 0.7 kB each:

| All reading is done from memory and not from disk, which makes reading extremely fast.
| This means working with very large files can be an issue,
| but if you are working with extremely large datasets, TinyDB might not be the right solution for you anyway.
| Data in memory and on disk is always synced in the background, so there should be no slowdown even with heavy writing in between reads.

.. list-table:: avg 140x 0.7kb reads :widths: 25 25 :header-rows: 1

Graph

This is the same data as used above, plotted in an Excel graph.

.. image:: ./img/diff.png
   :width: 60%

New Performance Numbers

New tests were run on a 2021 MacBook Pro running macOS Ventura 13.0.1 and Python 3.10.9.

The read and write tests are identical for BetterJSONStorage and the default JSONStorage.

.. code-block:: bash

    BetterJSONStorage:
        writing took: 71.001375ms
        reading took: 29.283583ms
    Default JSONStorage:
        writing took: 7825.321125ms
        reading took: 19438.65975ms

    Total:
        BetterJsonStorage: 240.7505ms
        default jsonStorage: 27264.555167ms

    relative time vs BetterJSONStorage:
        BetterJSONStorage: 1x
        JSONStorage: 113.25x

The benchmark shows that the default JSONStorage takes about 113 times as long to finish as BetterJSONStorage. Files are also much larger: 8.5 MB for the default JSONStorage versus only 373 KB for BetterJSONStorage.
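The ratios quoted here follow directly from the reported totals; the short calculation below reproduces them. The size comparison assumes 1 MB = 1000 KB, which the readme does not specify.

```python
# Totals reported in the benchmark output above (milliseconds).
better_total_ms = 240.7505
default_total_ms = 27264.555167

time_ratio = default_total_ms / better_total_ms
print(f"default JSONStorage takes {time_ratio:.2f}x as long")

# File sizes from the text; assumes 1 MB = 1000 KB.
default_size_kb = 8.5 * 1000
better_size_kb = 373

size_ratio = default_size_kb / better_size_kb
print(f"default JSONStorage file is about {size_ratio:.1f}x larger")
```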

To test the performance for yourself, run the tests/benchmark/citm.py file.