eclipse / eclipse-collections

Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.
https://eclipse.dev/collections/
2.44k stars 615 forks source link

Compact() method for maps doesn't check if a rehash is needed #1668

Open brunnsbe opened 2 months ago

brunnsbe commented 2 months ago

In for example LongDoubleHashMap the compact() method directly calls: this.rehash(this.smallestPowerOfTwoGreaterThan(this.size()));

This rehashes also the maps in those cases where the map cannot be compacted. It would be better to first check if this.smallestPowerOfTwoGreaterThan(this.size()) != getTableSize() too quickly see if we can compact the map or not.

mohrezaei commented 2 months ago

The only reason to call compact() is because the map has undergone some removal. In other words, the code is expected to look like:

map.remove(k1);
map.remove(k2);
... // maybe a loop to remove stuff
//at the end:
map.compact();

If you're calling compact otherwise, it would be odd.

If a map has undergone removal, compact can not only change the size, but also remove sentinels (aka "tombstones").

This is how something is removed:

        this.keysValues[index] = REMOVED_KEY;
        this.keysValues[index + 1] = EMPTY_VALUE;

The map does keep track of occupiedWithData and occupiedWithSentinels, which in theory could be used to make compact a bit smarter (your check with size is insufficient), but it would be good to understand how you're trying to use compact that this in an issue.

brunnsbe commented 2 months ago

Thank you for your swift reply! In my use case I'm using compact after removing items from a map, my idea was that it's unnecessary work to rehash everything an recreate the internal arrays unless their length change. The maps I'm building can be quite enormous, with over one billion of elements so I'm interested in compacting them if possible, but if the map's internal arrays still stay the same length the compact call doesn't give much of a value. 🤔

brunnsbe commented 2 months ago

After investigating the compact() behavior further it seems that it does unnecessary work. Let's say that I have a map with the size 2916 and the current length of its internal arrays as 16384. In compact() it first calls:

this.rehash(this.smallestPowerOfTwoGreaterThan(this.size()));

The this.smallestPowerOfTwoGreaterThan(this.size()) returns 4096. Inside rehash(int newCapacity) it calls this.smallestPowerOfTwoGreaterThan(this.size()) which changes the internal arrays to be of size 4096. Then the code starts copying the values to the new arrays and inside addKeyValueAtIndex(long key, V value, int index) there is a call to maxOccupiedWithData(), which for 4096 returns 2048. This leads to the addKeyValueAtIndex() eventually calling this.rehashAndGrow(); which grows the internal arrays to the length of 8192.

Would it be possible that with the compact() call not do the extra grow from 4096 to 8192 as one could assume that after calling compact() that the map won't be modified anymore. The other option would be to initially in compact set it to 8192 (using e.g. smallestPowerOfTwoGreaterThan(this.fastCeil(size << 1))) and avoid the extra rehash.

mohrezaei commented 2 months ago

Good catch. That should be fixed, and the optimization regarding compact only if size changes or there are sentinels can be applied as well.

Do you still need another function that only compacts if there is a memory benefit?

brunnsbe commented 2 months ago

Thank you!

Do you still need another function that only compacts if there is a memory benefit?

How would that differ from "and the optimization regarding compact only if size changes or there are sentinels can be applied as well"? Or do you mean that compact() always would try to compact() and we would have a separate isCompactable() (horrible name) or something similar? Yes, a method like that would be useful unless compact internally would call it.