aviramha / ormsgpack

Msgpack serialization/deserialization library for Python, written in Rust using PyO3 and rust-msgpack. Reboot of orjson.

High memory footprint in comparison to `msgpack` #279

Closed mspi92 closed 3 weeks ago

mspi92 commented 1 month ago

Issue

I switched to ormsgpack because the `OPT_NON_STR_KEYS` feature came in very handy. However, I noticed that my application needs much more memory when loading data with `unpackb`. This is especially bad (about 300%) when using `OPT_NON_STR_KEYS`. In my opinion some memory overhead is fine (especially a fixed offset), but here it is a linear factor of roughly 3, which is significant.

Code Example

```python
import os
import psutil

import msgpack
import ormsgpack

process = psutil.Process(os.getpid())

def get_rss_mem():
    rss = process.memory_info().rss
    return rss

def get_test_data():
    return {
        str(i): {
            str(j): {
                str(k): k for k in range(100)
            } for j in range(100)
        }
        for i in range(100)
    }

def run_test_case(dump_fn, load_fn):
    # initial memory
    rss_start = get_rss_mem()
    # test object
    obj = get_test_data()
    rss_obj = get_rss_mem()
    # dump object to msgpack
    val = dump_fn(obj)
    rss_dump = get_rss_mem()
    # load object from msgpack
    loaded_obj = load_fn(val)
    rss_load = get_rss_mem()
    # check if loaded object is the same as the original object
    assert loaded_obj == obj

    obj_size = rss_obj - rss_start
    dump_size = rss_dump - rss_obj
    load_size = rss_load - rss_dump

    print(f"Object size: {obj_size / 1024 / 1024:.2f} MB")
    print(f"Dump size: {dump_size / 1024 / 1024:.2f} MB")
    print(f"Load size: {load_size / 1024 / 1024:.2f} MB")

if __name__ == "__main__":
    """ run test cases individually to not cross pollute results """
    run_test_case(ormsgpack.packb, ormsgpack.unpackb)
    # > run_test_case(ormsgpack.packb, ormsgpack.unpackb)
    # > Object size: 94.12 MB
    # > Dump size: 4.17 MB
    # > Load size: 56.12 MB

    run_test_case(ormsgpack.packb, lambda val: ormsgpack.unpackb(val, option=ormsgpack.OPT_NON_STR_KEYS))
    # > run_test_case(ormsgpack.packb, lambda val: ormsgpack.unpackb(val, option=ormsgpack.OPT_NON_STR_KEYS))
    # > Object size: 94.12 MB
    # > Dump size: 4.16 MB
    # > Load size: 107.00 MB

    run_test_case(msgpack.packb, msgpack.unpackb)
    # > run_test_case(msgpack.packb, msgpack.unpackb)
    # > Object size: 94.12 MB
    # > Dump size: 3.79 MB
    # > Load size: 32.12 MB
```

Info

The issue occurs with Python 3.11.9 and ormsgpack==1.5.0.

exg commented 4 weeks ago

The difference in memory usage is due to dictionary keys. Your test data has 100^3 dictionary keys drawn from a domain of 100 elements (the strings "0", "1", ..., "99"). msgpack interns dictionary keys and is therefore able to reuse every key string that occurs multiple times; with your test data it creates only 100 key strings in total, which is optimal. ormsgpack does not intern strings. Instead, when `OPT_NON_STR_KEYS` is not specified, it maintains a fixed-size in-memory cache of map keys in order to reuse key strings; due to hash collisions, however, the reuse is not 100%. When `OPT_NON_STR_KEYS` is specified, the cache is not used and there is no reuse at all. This explains your results. Is the test data artificial, or does your application process maps with similar characteristics?

I will look into improving the cache and enabling it also when OPT_NON_STR_KEYS is specified.
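
A minimal sketch (not part of the original exchange) to observe this key reuse from Python; the expected outcomes in the comments follow the explanation above rather than any documented guarantee, and a multi-character key is used so CPython's own caching of one-character strings does not interfere:

```python
# Sketch: check whether a decoder reuses key-string objects across maps.
# Assumes msgpack and ormsgpack are installed; expectations are based on
# the explanation above, not on documented behavior.
import msgpack
import ormsgpack

# Two nested maps that share the key "shared_key" (multi-character on
# purpose, because CPython caches one-character strings itself).
payload = ormsgpack.packb({"a": {"shared_key": 1}, "b": {"shared_key": 2}})

def keys_are_same_object(decoded):
    # If the decoder reuses key strings, both occurrences of "shared_key"
    # point to the same Python object.
    (k1,) = decoded["a"].keys()
    (k2,) = decoded["b"].keys()
    return k1 is k2

print(keys_are_same_object(msgpack.unpackb(payload)))    # expected True: keys interned
print(keys_are_same_object(ormsgpack.unpackb(payload)))  # usually True: key cache hit
print(keys_are_same_object(
    ormsgpack.unpackb(payload, option=ormsgpack.OPT_NON_STR_KEYS)
))                                                        # expected False: cache not used
```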

mspi92 commented 3 weeks ago

Hello, thanks for the quick reply. The test data is artificial, but it shows behavior similar to our production code. Our production code actually uses integer keys at the second level (str(j)), but to allow a fair comparison between msgpack and ormsgpack I cast them to strings. We had encountered a performance issue in msgpack caused by the (int -> str, str -> int) key casts, and strict_map_key=False in msgpack doesn't do the trick here. We dropped in ormsgpack, which removes the performance issue by removing that extra cast step; however, we traded performance for memory consumption. With your fix the memory usage halved and it is far more reasonable :) Thank you for that +1
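
For context, a hedged sketch (not from the original thread) of the integer-key round trip mentioned above; the data shape and the key 42 are invented for illustration, and it relies on `strict_map_key` in msgpack and `OPT_NON_STR_KEYS` in ormsgpack, the usual flags each library provides for non-string map keys:

```python
# Sketch of the integer-key round trip described above (illustrative data).
import msgpack
import ormsgpack

data = {"outer": {42: "value"}}  # integer key at the second level

# msgpack: integer keys pack fine, but unpacking them requires
# strict_map_key=False (which, per the comment above, did not solve the
# reporter's performance problem on its own).
packed = msgpack.packb(data)
assert msgpack.unpackb(packed, strict_map_key=False) == data

# ormsgpack: OPT_NON_STR_KEYS lets maps with non-str keys be packed and
# unpacked directly, so no int<->str cast step is needed.
packed = ormsgpack.packb(data, option=ormsgpack.OPT_NON_STR_KEYS)
assert ormsgpack.unpackb(packed, option=ormsgpack.OPT_NON_STR_KEYS) == data
```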