@thetumbled Please add the following content to your PR description and select a checkbox:
- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->
> `HashMap<LongLongPair, Long>`

btw. In Fastutil there is also an `Object2LongMap` interface which would be applicable here, since the value is a `long`, for example via the `Object2LongOpenHashMap` implementation. `Object2LongOpenHashMap` has a `trim` method to reduce its size. I guess the benefit of `ConcurrentLongLongPairHashMap` is that it has the auto-shrink feature.
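For context, a minimal sketch of that suggestion might look like the following (assuming fastutil 8.5+ on the classpath for `LongLongPair`; the `ledgerId`/`entryId` variables are placeholders, not the actual tracker code):

```java
import it.unimi.dsi.fastutil.longs.LongLongPair;
import it.unimi.dsi.fastutil.objects.Object2LongOpenHashMap;

public class Object2LongSketch {
    public static void main(String[] args) {
        // Values are stored as primitive longs, so only the keys are boxed objects.
        Object2LongOpenHashMap<LongLongPair> nackTimestamps = new Object2LongOpenHashMap<>();

        long ledgerId = 10L, entryId = 7L; // placeholder message-id components
        nackTimestamps.put(LongLongPair.of(ledgerId, entryId), System.currentTimeMillis());

        long nackedAt = nackTimestamps.getLong(LongLongPair.of(ledgerId, entryId));
        System.out.println("nacked at " + nackedAt);

        // After removing many entries the backing table keeps its old capacity;
        // trim() rehashes it down to the smallest size that fits the remaining
        // entries. It must be called explicitly - there is no automatic shrink,
        // unlike ConcurrentLongLongPairHashMap.
        nackTimestamps.removeLong(LongLongPair.of(ledgerId, entryId));
        nackTimestamps.trim();
    }
}
```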
No, there is no shrink logic triggered in the test code, as I only add new items to the map without any deletions; shrinking is only triggered by item deletion.

The reason `ConcurrentLongLongPairHashMap` is space efficient is that it uses open addressing with linear probing, which needs less space: keys and values are stored as primitive `long`s, so there is no wrapper object per entry, while `HashMap` needs more space because each entry is a separate node object holding boxed key and value objects.

As for `Object2LongOpenHashMap`, I guess it also takes up more space than `ConcurrentLongLongPairHashMap`, since its keys are still wrapper objects, whereas `ConcurrentLongLongPairHashMap` has no wrappers at all. A simplified sketch of that storage layout is shown below.
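To make the argument concrete, here is a deliberately simplified, single-threaded sketch of that layout (assumptions of the sketch: negative keys serve as the empty marker and the table never fills up; the real class additionally handles concurrency, rehashing, removal and auto-shrink):

```java
import java.util.Arrays;

/**
 * Simplified sketch of the idea behind ConcurrentLongLongPairHashMap's layout:
 * every entry occupies four consecutive slots (key1, key2, value1, value2) in
 * one flat long[] resolved by linear probing, so there are no per-entry node
 * objects and no boxed Longs.
 */
class LongLongPairTableSketch {
    private static final long EMPTY = -1L; // simplification: keys must be non-negative
    private final long[] table;            // capacity * 4 slots
    private final int capacity;            // must be a power of two

    LongLongPairTableSketch(int capacity) {
        this.capacity = capacity;
        this.table = new long[capacity * 4];
        Arrays.fill(table, EMPTY);
    }

    void put(long key1, long key2, long value1, long value2) {
        int bucket = bucketOf(key1, key2);
        while (true) {
            int idx = bucket * 4;
            if (table[idx] == EMPTY || (table[idx] == key1 && table[idx + 1] == key2)) {
                table[idx] = key1;
                table[idx + 1] = key2;
                table[idx + 2] = value1;
                table[idx + 3] = value2;
                return;
            }
            bucket = (bucket + 1) & (capacity - 1); // linear probing: try the next bucket
        }
    }

    /** Returns value1 for the key pair, or -1 if the pair is absent. */
    long getFirstValue(long key1, long key2) {
        int bucket = bucketOf(key1, key2);
        while (true) {
            int idx = bucket * 4;
            if (table[idx] == EMPTY) {
                return -1L;
            }
            if (table[idx] == key1 && table[idx + 1] == key2) {
                return table[idx + 2];
            }
            bucket = (bucket + 1) & (capacity - 1);
        }
    }

    private int bucketOf(long key1, long key2) {
        return (int) ((key1 * 31 + key2) & (capacity - 1));
    }
}
```

By contrast, each `java.util.HashMap` entry needs a separate node object plus boxed key and value objects, each carrying an object header and references, which is where the extra per-entry overhead comes from.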
Motivation
The negative ack feature needs to retain the message id and timestamp of every negatively acked message in the memory of the consumer client, which leads to significant memory consumption. This PR aims to replace the `HashMap` with the internal map implementation `ConcurrentLongLongPairHashMap` to reduce that memory consumption. Although `HashMap` is faster than `ConcurrentLongLongPairHashMap` in some cases, the most important concern here is memory consumption rather than speed. Some test data is listed below:
Experiment 1

- `HashMap`: 178 MB vs `ConcurrentLongLongPairHashMap`: 64 MB
- `HashMap`: 566 MB vs `ConcurrentLongLongPairHashMap`: 256 MB
- `HashMap`: 1132 MB, i.e. roughly 1132 MB / 10,000,000 ≈ 118 bytes per entry
- `ConcurrentLongLongPairHashMap`: 512 MB, i.e. roughly 512 MB / 10,000,000 ≈ 53 bytes per entry

With this improvement, memory consumption drops by more than 50%!
Experiment 2

Test three candidate data structures:

- `HashMap<LongPair, Long>`
- `HashMap<LongLongPair, Long>`
- `ConcurrentLongLongPairHashMap`
Test code:
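The original test code is not reproduced above; the following is only a rough sketch of a comparable fill-and-measure run, not the author's code (two of the three candidates shown). It assumes fastutil on the classpath and the `ConcurrentLongLongPairHashMap.newBuilder()` API available in recent Pulsar releases; heap usage read via `Runtime` is approximate.

```java
import it.unimi.dsi.fastutil.longs.LongLongPair;
import java.util.HashMap;
import java.util.Map;
import org.apache.pulsar.common.util.collections.ConcurrentLongLongPairHashMap;

public class NackMapMemorySketch {
    private static final int ENTRIES = 10_000_000;

    // Best-effort heap measurement; a heap dump or profiler is more reliable.
    private static long usedMb() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        long before = usedMb();

        // Candidate: java.util.HashMap with a boxed pair key and boxed Long value.
        Map<LongLongPair, Long> boxedMap = new HashMap<>();
        for (long i = 0; i < ENTRIES; i++) {
            boxedMap.put(LongLongPair.of(i, i), System.currentTimeMillis());
        }
        System.out.println("HashMap<LongLongPair, Long>: " + (usedMb() - before) + " MB");
        boxedMap = null; // let the GC reclaim it before the next measurement

        // Candidate: Pulsar's primitive open-addressing map.
        long before2 = usedMb();
        ConcurrentLongLongPairHashMap primitiveMap = ConcurrentLongLongPairHashMap.newBuilder()
                .expectedItems(ENTRIES)
                .autoShrink(true)
                .build();
        for (long i = 0; i < ENTRIES; i++) {
            primitiveMap.put(i, i, System.currentTimeMillis(), 0L);
        }
        System.out.println("ConcurrentLongLongPairHashMap: " + (usedMb() - before2) + " MB");
    }
}
```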
The results are as follows:

| Data structure | Memory usage |
| --- | --- |
| `HashMap<LongPair, Long>` | 91 MB |
| `HashMap<LongLongPair, Long>` | 114 MB |
| `ConcurrentLongLongPairHashMap` | 64 MB |

Conclusion:
It shows that `ConcurrentLongLongPairHashMap` is still the best option for storing an enormous number of entries.

Modifications
Replace `HashMap` with `ConcurrentLongLongPairHashMap` in the negative ack tracker; a hedged sketch of the intended usage is shown below.
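A minimal sketch of what the replacement could look like (illustrative only: the class, field, and method names below are placeholders rather than the actual NegativeAcksTracker code, and the builder and `LongPair` APIs are assumed from recent Pulsar releases):

```java
import org.apache.pulsar.common.util.collections.ConcurrentLongLongPairHashMap;
import org.apache.pulsar.common.util.collections.ConcurrentLongLongPairHashMap.LongPair;

class NegativeAcksTrackerSketch {
    // Replaces a HashMap keyed by message id: keys are (ledgerId, entryId) and the
    // value slots hold the redelivery timestamp and the partition index, all as
    // primitive longs, with auto-shrink enabled so the table contracts after removals.
    private final ConcurrentLongLongPairHashMap nackedMessages = ConcurrentLongLongPairHashMap.newBuilder()
            .autoShrink(true)
            .expectedItems(128)
            .build();

    void add(long ledgerId, long entryId, long partitionIndex, long nackDelayMs) {
        nackedMessages.put(ledgerId, entryId, System.currentTimeMillis() + nackDelayMs, partitionIndex);
    }

    boolean isExpired(long ledgerId, long entryId) {
        LongPair entry = nackedMessages.get(ledgerId, entryId);
        return entry != null && entry.first <= System.currentTimeMillis();
    }

    void remove(long ledgerId, long entryId) {
        nackedMessages.remove(ledgerId, entryId);
    }
}
```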
Verifying this change
(Please pick either of the following options)
This change is already covered by existing tests, such as (please describe tests).
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
- `doc`
- `doc-required`
- `doc-not-needed`
- `doc-complete`
Matching PR in forked repository
PR in forked repository: https://github.com/thetumbled/pulsar/pull/63