eclipse / eclipse-collections

Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.
https://eclipse.dev/collections/
2.44k stars 612 forks source link

Add ensureCapacity() to mutable HashSets and HashMaps #1197

Open syskin345 opened 2 years ago

syskin345 commented 2 years ago

Initial capacity can be set when creating a new map or set, and for good reason. But sometimes the required capacity is only known later. Proposal: add a method ensureCapacity, similar to what Lists already have, which resizes if necessary.

Additionally, use those methods internally in putAll implementations, as they're a perfect example of when it's useful to have them (when adding N entries, it's guaranteed we'll need the capacity of at least N).

Additionally, HashSet:addAll could use it if the Collection added is also a Set.

mohrezaei commented 2 years ago

ensureCapacity on map and set are a bit tricky to use, because the keys are deduped. In fact, I see this kind of code all the time:

public List dedupe(List list)
{
    Set s = new UnifiedSet();
    s.addAll(list);
    return FastList.newList(s);
}

If we change addAll to ensureCapacity(list.size()), it can potentially over allocate by a lot, depending on the duplication in the list.

The constructors, as you mentioned do pre-size. In the above code, using new UnifiedSet(list) would potentially cause too much allocation. So currently, a knowledgeable user can do both: use constructor to presize and addAll not to.

I'm not saying the methods shouldn't be added, but their usage is limited to fewer scenarios than list.ensureCapacity. I've often wondered whether I would use set.ensureCapacity for a set that was not empty, and on balance, opted it wasn't sensible.

syskin345 commented 2 years ago

Agreed, as mentioned in the original proposal, addAll() should only internally used it if the incoming collection is a Set.

For any scenario when we're not 100% sure, we just let the user decide yo all it - maybe they expect the data to be unique, maybe not.

I agree those methods are less useful than for List, but I actually have a performance-sensitive use case where, if I use Koloboke's primitive map, it's massively faster thanks to its ensureCapacity()...