clauchiorean / google-collections

Automatically exported from code.google.com/p/google-collections
Apache License 2.0
0 stars 0 forks source link

Lightweight Map/Set #189

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
A new map that is backed by a list only.

The contains method would just iterate over all entries (same for get/put).
It's not efficient for larger maps - but saves memory for maps that are
really small (< 10 entries) - which is very often the case.
The same holds for Sets.

Ideally, one would like to have method

Sets.newImmutableSet(Iterable<? extends E> elements) that would pick a type
for you - either lightweight set or hashset (same for map)

Original issue reported on code.google.com by zeitlin...@gmail.com on 10 Jun 2009 at 7:58

GoogleCodeExporter commented 9 years ago
Is your application choking on the memory usage of too many small maps and sets?

Original comment by kevin...@gmail.com on 10 Jun 2009 at 3:41

GoogleCodeExporter commented 9 years ago
I haven't really profiled for that - that's a valid point.
But often I use a List instead of a Set if I have only a couple of values - 
because
it's more lightweight - maybe that's premature optimization.
Having it in a library is a very seemless optimization, though.

Original comment by zeitlin...@gmail.com on 10 Jun 2009 at 6:54

GoogleCodeExporter commented 9 years ago
My app generates heaps of micro (1-3 key-value pair) maps and throws them away 
soon.
Switching to lightweight versions did cut on GC frequency quite a bit.

Original comment by earwin@gmail.com on 18 Jul 2009 at 7:54

GoogleCodeExporter commented 9 years ago
Marking "Accepted/post-1.0", but that's not a promise. It remains to be seen if 
this 
is beneficial enough and important enough, and would be widely-enough used.

What about the names LinearSet and LinearMap?  Should drive home what the 
devil's 
bargain you're making really is.

Any API that chooses for you whether to use this or a conventional set sounds 
infeasible to me; it won't know your memory situation or how slow your 
hashCode() 
method is, etc.

Original comment by kevin...@gmail.com on 12 Aug 2009 at 6:17

GoogleCodeExporter commented 9 years ago
We here have MicroMap, which stores a single key/value right on itself, and 
MiniMap -
which keeps a single array split in two for keys and values.
MicroMap was later extended with template-generated versions for more than one
key/value pair. Which one is instantiated is determined in factory method.

Original comment by earwin@gmail.com on 12 Aug 2009 at 11:46

GoogleCodeExporter commented 9 years ago
I don't think you need to know the memory situation - the default capacicty for
HashMap is fixed to 10, too. People will just know that this is the behaviour. 
You
could also choose a limit, such as 

Map Maps.newMap(Map map);
Map Maps.newMap(Map map, int limit);
Set Sets.newSet(Set set);
Set Sets.newSet(Set set, int limit);
(type parameters left out)

I was hoping that you wouldn't need an exposed type name - as in
Collections.singleton. MiniMap would be fine - except if it would be immutable -
which would be good.

Original comment by zeitlin...@gmail.com on 15 Aug 2009 at 8:40

GoogleCodeExporter commented 9 years ago
For immutable collections, we can improve the performance of the existing 
Immutable*
classes without defining any new public methods or classes. For example, 
ImmutableSet
and ImmutableMap already have separate implementations for sizes 0 and 1; we 
could
add size-2 and size-3 implementations. 

Then performance benefits will appear automatically, without any modifications 
to the
calling code.

Original comment by jared.l....@gmail.com on 20 Aug 2009 at 12:16

GoogleCodeExporter commented 9 years ago
Yes, that would be a good idea.
At first I thought that the user should make a deliberate choice to use the 
lightweight version or at least that it 
can be possible (Maps.newMap) - but it's probably safe to use a lightweight 
version up to 5 or 10.

Original comment by zeitlin...@gmail.com on 21 Aug 2009 at 7:46

GoogleCodeExporter commented 9 years ago
The problem with immutables is - you have two ways of creating them, and both 
are
flawed in relation to micromaps/sets:
ImmutableXXX.of() - you can't decide the size of your XXX at runtime
ImmutableXXX.builder() - a temporary collection is created inside the builder 
to hold
the future immutable's contents, so you're bumping up your GC traffic instead of
lowering it.

Okay, if you keep these collections around for extended time, you still win
memory-wise, but that's not covering my usecase. Probably can't be covered in a 
clean
way.

Original comment by earwin@gmail.com on 21 Aug 2009 at 8:04

GoogleCodeExporter commented 9 years ago
zeitlin, once you reach 5 or more elements, the performance impact becomes 
ambiguous.
A hash lookup would probably be faster than a linear search, but the collection 
would
take longer to generate and would require more memory. 

earwin, I don't understand your objection. No matter what the implementation, 
if you
want to decide the collection size at runtime, the code will need to either 
create a
temporary collection or receive the entire collection as an input parameter 
(like the
copyOf methods already do).

Original comment by jared.l....@gmail.com on 21 Aug 2009 at 8:12

GoogleCodeExporter commented 9 years ago
You can create a collection of certain size and then fill it up. Need to track
current position somehow - either in the builder or inside collection itself. 
And
it's not quite immutable - I said it can't be done in a clean manner.

Original comment by earwin@gmail.com on 22 Aug 2009 at 6:09

GoogleCodeExporter commented 9 years ago
agreed - more than 5 elements is too much.

Regarding the builder: the problem could be solved if you could give the 
expected
size to the builder. Then, no set/map would need to be copied.

Original comment by zeitlin...@gmail.com on 22 Aug 2009 at 6:20

GoogleCodeExporter commented 9 years ago
zeitlin, your suggestion would work. The question is whether we should add a 
new public 
method just to improve performance slightly. In any case, we should tune the 
existing 
APIs first.

Original comment by jared.l....@gmail.com on 22 Aug 2009 at 9:13

GoogleCodeExporter commented 9 years ago
that's a good question - I don't know.
Some people claim that it saved them hundres of megabytes (using lists lists 
with
defined capacity, lists instead of maps, etc).

Original comment by zeitlin...@gmail.com on 24 Aug 2009 at 7:28

GoogleCodeExporter commented 9 years ago
Such cases exist, but are almost always nontrivial. I bet every person hitting 
it
wants to be absolutely sure proper minicollections are used. If you hide
minicollections behind a general-use factory, and won't provide guarantees in
JavaDocs that things will always remain that way (optimized for size), they can 
no
longer be 'sure'.

Original comment by earwin@gmail.com on 24 Aug 2009 at 10:22

GoogleCodeExporter commented 9 years ago

Original comment by kevin...@gmail.com on 17 Sep 2009 at 5:45

GoogleCodeExporter commented 9 years ago

Original comment by kevin...@gmail.com on 17 Sep 2009 at 5:57

GoogleCodeExporter commented 9 years ago
This issue has been moved to the Guava project (keeping the same id number). 
Simply replace 'google-collections' with 'guava-libraries' in your address 
bar and it should take you there.

Original comment by kevinb@google.com on 5 Jan 2010 at 11:09