Closed: mspi92 closed this issue 3 weeks ago
The difference in memory usage comes from dictionary keys. Your test data has 100^3 dictionary keys drawn from a domain of 100 elements (the strings "0", "1", ..., "99"). msgpack interns dictionary keys and can therefore reuse every key string that occurs multiple times; with your test data it creates only 100 key strings in total, which is optimal. ormsgpack does not intern strings. Instead, when OPT_NON_STR_KEYS is not specified, it maintains a fixed-size in-memory cache of map keys in order to reuse key strings; due to hash collisions, however, the reuse is not 100%. When OPT_NON_STR_KEYS is specified, the cache is not used at all and there is no reuse. This explains your results. Is the test data artificial, or does your application process maps with similar characteristics?
I will look into improving the cache and enabling it also when OPT_NON_STR_KEYS is specified.
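To illustrate the fixed-size key cache described above, here is a minimal pure-Python sketch. This is a hypothetical model, not ormsgpack's actual implementation: each slot holds one string, indexed by hash, so a colliding key simply evicts the previous occupant and forces a fresh allocation.

```python
# Hypothetical sketch of a fixed-size map-key cache (not ormsgpack's code).
CACHE_SIZE = 512

class KeyCache:
    def __init__(self):
        # One slot per hash bucket; a collision overwrites the occupant.
        self.slots = [None] * CACHE_SIZE

    def get(self, raw: bytes) -> str:
        idx = hash(raw) % CACHE_SIZE
        cached = self.slots[idx]
        if cached is not None and cached.encode() == raw:
            return cached          # hit: reuse the existing str object
        s = raw.decode()           # miss or collision: allocate a new str
        self.slots[idx] = s
        return s

cache = KeyCache()
a = cache.get(b"temperature")
b = cache.get(b"temperature")
assert a is b   # the second lookup returns the same object, no new allocation
```

With many distinct keys hashing into the same slots, the hit rate drops, which matches the partial reuse the maintainer describes.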
Hello,
Thanks for the quick reply. The test data is artificial, but it showed the same behavior as our production code.
Our production code actually uses integer keys at the second level (hence the str(j)), but to allow a fair comparison between msgpack and ormsgpack I cast them to strings.
We had encountered a performance issue in msgpack with the (int->str, str->int) key casts; strict_map_key=False in msgpack does not help here.
We dropped in ormsgpack, which removes the performance issue by removing this extra cast step. However, we traded performance for memory consumption. With your fix the memory usage halved, and it is far more reasonable :)
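The round-trip key cast mentioned above can be sketched as follows. This is an illustrative example only, using json as a stand-in for a serializer that requires string map keys; the actual application uses msgpack/ormsgpack.

```python
import json

data = {i: {"v": i * i} for i in range(5)}   # integer keys

# A serializer that only accepts str keys forces a cast on the way in...
encoded = json.dumps({str(k): v for k, v in data.items()})

# ...and another cast on the way out, costing time and allocations on
# every (de)serialization. OPT_NON_STR_KEYS makes both casts unnecessary.
decoded = {int(k): v for k, v in json.loads(encoded).items()}

assert decoded == data
```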
Thank you for that +1
Issue
I switched to ormsgpack because the OPT_NON_STR_KEYS feature came in very handy. However, I noticed that my application needs far more memory when using loads. This is especially bad (300%) when using OPT_NON_STR_KEYS. IMO some memory overhead is fine (especially a fixed offset), but a linear factor of 3 is significant.
Code Example
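The original reproduction snippet is not included here. The following is a hypothetical reconstruction based on the maintainer's description of the test data (100^3 keys drawn from the 100 strings "0".."99"); the ormsgpack calls in the trailing comment are the ones named in this thread.

```python
# Hypothetical reconstruction of the test data: three levels of nested
# maps whose 100**3 leaf keys all come from the strings "0" .. "99".
N = 100
data = {
    str(i): {str(j): {str(k): 0 for k in range(N)} for j in range(N)}
    for i in range(N)
}

# ormsgpack usage as discussed in the thread (requires `pip install ormsgpack`):
# packed = ormsgpack.packb(data)
# unpacked = ormsgpack.loads(packed, option=ormsgpack.OPT_NON_STR_KEYS)
```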
Infos
Issue occurs with Python 3.11.9 and ormsgpack==1.5.0.