goldmansachs / gs-collections

GS Collections has been migrated to the Eclipse Foundation, re-branded as Eclipse Collections. https://www.eclipse.org/collections/
https://www.eclipse.org/collections/
1.81k stars 275 forks source link

load factor support in the map #22

Closed sushantk closed 9 years ago

sushantk commented 9 years ago

Hi,

We have been using gs-collection in our project and wanting to control the size requirement for several small maps. Apparently all the primitive maps don't support the load factor and always assumes 0.5. Could you clarify if this is something intended or can be provided?

Regards, Sushant

goldmansachs commented 9 years ago

Based on performance and memory tests for primitive maps and sets, we have consistently seen that 0.5 is a good load factor.

The memory footprint of the map per entry is given by the formula: (size of pay load) * 1.5/L + C / (# of entries) where L is load factor and C is a constant. The 1.5 / L term comes from the fact that the cost per entry is 1 / L just before resizing and 2 / L just after resizing so (1/L + 2/L) * 1/2 is the average cost per entry.

In the large entry limit, the final term (C / # of entries) is ignorable, which leads us to (size of payload) * 1.5/L. This is 12 bytes per int in an IntHashSet, which compares very favorably with say, java.util.HashSet<Integer>, which has an average entry size of 56 bytes.

In the small entry limit, which is where your concern is coming from, the second term dominates. So changing L won’t help.

In the small limit, there are two possible solutions to achieving lower memory footprint.