Possible optimization of FluentIterable.transform

GoogleCodeExporter commented 9 years ago

FluentIterable.transform delegates its internal iterable to the 
Iterables.transform method. This method has the disadvantage that it wraps any 
kind of collection as an iterable. So when the following is invoked:

FluentIterable.from(someList).transform(someFunction).toList()

ImmutableList.copyOf(iterable) falls back to less efficient creation by 
iteration - instead of creating a fixed-size array as it would do if the 
transformed internal iterable was still a collection.

This would easily be changed if FluentIterable.transform just checks if its a 
Collection first and then invokes 
FluentIterable.from(Collections2.transform(iterable, function).

This optimization applies only to FluentIterables which are just transformed 
and not filtered. But its a minimalistic change which should not have any real 
disadvantages except one more line of code and allows for more efficient 
creation of the resulting datastructures.

Original issue reported on code.google.com by Eric6i...@googlemail.com on 14 May 2013 at 6:53

GoogleCodeExporter commented 9 years ago

It might also be worthwhile to check for List and then use Lists.transform.
In the common case of a RandomAccess List, toArray will skip using the iterator.

Original comment by Eric6i...@googlemail.com on 14 May 2013 at 7:31

GoogleCodeExporter commented 9 years ago

"In the common case of a RandomAccess List, toArray will skip using the 
iterator."

Well, that one does not work out, since AbstractList does not optimize toArray 
in the case of RandomAcessLists. Therefore, just checking for collection should 
be sufficient.

Original comment by Eric6i...@googlemail.com on 15 May 2013 at 9:01

GoogleCodeExporter commented 9 years ago

I know there is API documentation for Iterables#transform and 
Iterables#partition, but I have run into several places where this optimization 
would be useful to avoid copying where I have an API that accepts Iterable and 
internally apply Iterables.transform, but is frequently given a collection 
(often an Immutable*).

It might also be beneficial to have Iterables.transform dynamically dispatch 
transform to the most appropriate and optimized implementation for the 
specified iterable:

    public static <F, T> Iterable<T> transform(final Iterable<F> fromIterable,
                                               final Function<? super F, ? extends T> function) {
        checkNotNull(fromIterable);
        checkNotNull(function);
        if (fromIterable instanceof List) {
            return Lists.transform((List<F>) fromIterable, function);
        } else if (fromIterable instanceof Collection) {
            return Iterables2.cast(Collections2.transform((Collection<F>) fromIterable, function));
        } else {
            return Iterables.transform(fromIterable, function);
        }
    };

Original comment by schlo...@gmail.com on 12 Oct 2013 at 6:13

GoogleCodeExporter commented 9 years ago

There was some previous discussion of this in 
http://code.google.com/p/google-collections/issues/detail?id=207 ... but that 
focused on Iterables.size, not ImmutableList.copyOf and FluentIterable.toList, 
more commonly used methods that I find more compelling.

Note that this request is (I think) distinct from changing Iterables.transform 
itself (as described in comment #3), as the FluentIterable would remain a 
"plain" Iterable and not a Collection or List, so there is no change to its 
equals() behavior. Here's the (irrelevant here, again, I think) comment that 
Kevin once made about changing Iterables.transform (*not* 
FluentIterable.transform) to delegate to Lists.transform when appropriate:

"""
So, right now every method of Iterables that returns an Iterable consistently 
returns a "plain Iterable", and every method of Collections2 returning a 
Collection returns a "plain Collection."  I would prefer not to step away from 
that without having a clearer picture of where the ultimate destination we want 
to end up at is... and clear evidence that this is all definitely worthwhile.
"""

Also, I've split the AbstractList.toArray issue off into 
http://code.google.com/p/guava-libraries/issues/detail?id=1556

Original comment by cpov...@google.com on 16 Oct 2013 at 7:05

Added labels: Package-Collect, Type-Performance

GoogleCodeExporter commented 9 years ago

I'd strongly agree to change only FluentIterable and NOT Iterables, as the 
latter clearly suggests to return an Iterable and nothing else. Type-Dispatch 
as a means of internal optimization should not pollute the expected result of a 
method.

FluentIterable, however, keeps the base collection on the inside (brilliant 
design choice!), so here a full dispatch for Collection and List can be done 
without changing anything on the outside.

Obviously I'm also in favor for the other ticket.

Original comment by Eric6i...@googlemail.com on 17 Oct 2013 at 10:41

GoogleCodeExporter commented 9 years ago

I'd say that the semantics should not change at all; the only user-visible 
change should be the speed in case the underlying Iterable happens to be a 
Collection. This seems to be doable for many methods including size, toArray, 
and toList. As Iterables uses FluentIterable internally, it should profit from 
this too. I've got some proof of concept already and will post it when tested 
well.

Original comment by Maaarti...@gmail.com on 17 Oct 2013 at 3:33

GoogleCodeExporter commented 9 years ago

I added a method using the size of the underlying iterable to the result of 
Iterables.unmodifiable, .transform, .skip, and .limit. In case they're based on 
a Collection (or another such size-aware Iterable), `Iterables.size` is nearly 
as fast as `size` of the underlying collection. This also allows to avoid 
resizing in toArray and toList.

The speed up is nice:

https://microbenchmarks.appspot.com/runs/b0538106-18fa-46ea-a8b4-18424a4707b8,48
49c48d-4403-46e0-b895-2b000d194f74#r:scenario.benchmarkSpec.parameters.iterableS
etup,scenario.benchmarkSpec.parameters.baseSetup,scenario.benchmarkSpec.paramete
rs.state&c:scenario.benchmarkSpec.methodName

Original comment by Maaarti...@gmail.com on 20 Oct 2013 at 1:36

Attachments:

GoogleCodeExporter commented 9 years ago

Wouldn't it be simpler to put the size implementation of Iterables into the 
FluentIterable and implement Iterables.size by
FluentIterable.from(iterable).size()?

That could be done with most of the iterables methods.
The advantage of re-routing most implementations through fluentiterable is that 
it automatically checks for fluentiterable and "unpacks" the target collection 
whenever needed by the next method. However, the additional class-checks might 
make this optimization slower in many usecases:
All optimizations are worthless as soon as the first filtering Predicate is 
used in the chain, which enforces the use of an iterable. And a filtering might 
be the most common operation of them all.

Original comment by Eric6i...@googlemail.com on 20 Oct 2013 at 4:32

GoogleCodeExporter commented 9 years ago

This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<issue id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:12

Added labels: MigratedToGitHub

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:17

Changed state: Migrated

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:08

Added labels: Restrict-AddIssueComment-Commit

DaveAKing / guava-libraries

Possible optimization of FluentIterable.transform #1413