TimurMahammadov / google-collections

Automatically exported from code.google.com/p/google-collections
Apache License 2.0

Python-style zip and enumerate methods #35

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Two Iterator/Iterable/Collection related methods I seem to be
reimplementing all the time in various languages are zip and enumerate from
Python.

Here's their documentation from CPython 2.5. zip:

    zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]

    Return a list of tuples, where each tuple contains the i-th element
    from each of the argument sequences.  The returned list is truncated
    in length to the length of the shortest argument sequence.

And enumerate:

   enumerate(iterable) -> iterator for index, value of iterable

   Return an enumerate object.  iterable must be an other object that supports
   iteration.  The enumerate object yields pairs containing a count (from
   zero) and a value yielded by the iterable argument.  enumerate is useful
   for obtaining an indexed list: (0, seq[0]), (1, seq[1]), (2, seq[2]), ...

I've implemented these in an MIT-licensed library for Java, JIterTools, at
http://www.juripakaste.fi/jitertools/ . They operate on Iterators and
Iterables. There's also a variant of zip called zipFill that keeps going as
long as any of the Iterables/Iterators still has items, reading the extra
values for exhausted Iterators from an associated function.
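
To make the request concrete, here is a minimal sketch of a Python-style zip over
Iterables of one element type. It is not the JIterTools code; the class name
ZipSketch and the choice to take a List of sources and yield a List per position
are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical holder class; not part of JIterTools or Google Collections.
final class ZipSketch {
    private ZipSketch() {}

    /** Lazily yields one List per position, stopping at the shortest source. */
    static <T> Iterable<List<T>> zip(final List<? extends Iterable<? extends T>> sources) {
        return new Iterable<List<T>>() {
            @Override
            public Iterator<List<T>> iterator() {
                // Open a fresh iterator on every source for each traversal.
                final List<Iterator<? extends T>> its = new ArrayList<Iterator<? extends T>>();
                for (Iterable<? extends T> source : sources) {
                    its.add(source.iterator());
                }
                return new Iterator<List<T>>() {
                    @Override
                    public boolean hasNext() {
                        if (its.isEmpty()) {
                            return false; // zip of nothing is empty, as in Python
                        }
                        for (Iterator<? extends T> it : its) {
                            if (!it.hasNext()) {
                                return false; // truncate at the shortest source
                            }
                        }
                        return true;
                    }

                    @Override
                    public List<T> next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        List<T> tuple = new ArrayList<T>(its.size());
                        for (Iterator<? extends T> it : its) {
                            tuple.add(it.next());
                        }
                        return tuple;
                    }

                    @Override
                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}
```

Zipping [1, 2, 3] with [10, 20] this way yields [1, 10] and then [2, 20]; the
unmatched 3 is dropped.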

I'd love to see all of them/some of them/something like them included in a
well-maintained library of various Collection related things and Google
Collections looks like a prime candidate as long as Commons Collections is
inactive. I'm not particular about the exact details, though. Of the
methods I've implemented, zip is the cleanest with no extra classes needed
for parameters or return values. Both enumerate and zipFill need helper
classes.

Original issue reported on code.google.com by juri.pak...@gmail.com on 4 Nov 2007 at 12:27

GoogleCodeExporter commented 9 years ago
Thanks for the suggestion! They both seem like plausible additions to the library.

The methods Iterators.pairUp() and Iterables.pairUp() provide the same functionality
as zip, for the case when you have two iterators or iterables. How often would people
want to zip more than two?

There would definitely be uses for enumerate(). In my code, at least, there are many
cases where I increment a counter within a for loop. An enumerate() method would make
the code cleaner and would eliminate a possible bug if someone calls continue
inside the loop.
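
The continue hazard mentioned above can be illustrated with a small sketch. The
class and method names are hypothetical, and the intent assumed here is that each
kept line is labelled with its position in the input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of the hand-maintained counter pattern.
final class CounterBugSketch {
    private CounterBugSketch() {}

    /** Intended to label each non-empty line with its position in the input. */
    static List<String> labelLines(List<String> lines) {
        List<String> out = new ArrayList<String>();
        int index = 0;
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // BUG: jumps past index++, so later labels drift
            }
            out.add(index + ": " + line);
            index++;
        }
        return out;
    }
}
```

For the input ["a", "", "b"] this returns "0: a" and "1: b", although "b" sits at
position 2. An index supplied by the iteration itself cannot drift this way.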

Original comment by jared.l....@gmail.com on 4 Nov 2007 at 4:31

GoogleCodeExporter commented 9 years ago
It looks to me like we aren't supplying Iterators.pairUp() and Iterables.pairUp(),
presumably because of the cleanup that Pair still needs.  I hope our goal is to
eventually work them in.  To me, the case of two objects of different types seems
more useful than the case of an arbitrary number of objects of the same type, so I'm
not sure I'd use zip() in addition to pairUp().

enumerate() I do believe would be useful.  It looks like we aren't including
CountingIterator, either, which is fine with me, since it requires you to keep a
reference to the Iterator, meaning that it won't work in a foreach.  Your
enumerate(), of course, doesn't have this problem.  I've written versions of
enumerate() before using a Pair<Integer, T>, but I think I may like your approach of
a special object better.  getIndex() is clearer than getFirst() (or is it
getSecond()?  Users shouldn't have to worry about ordering like they do with the Pair
approach).
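
A rough sketch of the special-object approach discussed above follows. Only the
accessor name getIndex() comes from this thread; Indexed, getValue() and
EnumerateSketch are names assumed for illustration and may not match JIterTools.

```java
import java.util.Iterator;

/** Hypothetical holder with named accessors in place of Pair's getFirst()/getSecond(). */
final class Indexed<T> {
    private final int index;
    private final T value;

    Indexed(int index, T value) {
        this.index = index;
        this.value = value;
    }

    int getIndex() {
        return index;
    }

    T getValue() {
        return value;
    }
}

// Hypothetical holder class for the static method.
final class EnumerateSketch {
    private EnumerateSketch() {}

    /** Wraps an Iterable so that each element arrives with its zero-based index. */
    static <T> Iterable<Indexed<T>> enumerate(final Iterable<? extends T> source) {
        return new Iterable<Indexed<T>>() {
            @Override
            public Iterator<Indexed<T>> iterator() {
                final Iterator<? extends T> it = source.iterator();
                return new Iterator<Indexed<T>>() {
                    private int nextIndex = 0;

                    @Override
                    public boolean hasNext() {
                        return it.hasNext();
                    }

                    @Override
                    public Indexed<T> next() {
                        // it.next() throws NoSuchElementException when exhausted,
                        // before the counter is touched.
                        T value = it.next();
                        return new Indexed<T>(nextIndex++, value);
                    }

                    @Override
                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}
```

Because enumerate() returns an Iterable, it can be used directly in a foreach:
for (Indexed<String> e : EnumerateSketch.enumerate(names)) { ... }, with
e.getIndex() and e.getValue() inside the loop body.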

Original comment by cpov...@google.com on 4 Nov 2007 at 5:43

GoogleCodeExporter commented 9 years ago
I didn't see pairUp in the API docs, I guess it's new?

Looking at my own code and the Python libraries I've installed here, using just two
arguments for zip looks like the common case, although using more than two does
happen. There's attractive genericity to supporting an arbitrary number of iterators,
but supporting just two does free you from having to use Collections instead of a
more specialized object with named members. I'd go (and have gone) for the more
generic approach, but it's not a clear-cut case. You could always construct a tree of
paired up iterators, but that might get hairy rather quickly, with an unreadable
forest of <<>>s and type names miles long.
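
To show what such a tree of pairs looks like in practice, here is a sketch that
combines three lists by pairing twice. Pair and pairUp() here are simplified
stand-ins written for this example, not the internal Google classes mentioned
above, and pairUp() is eager for brevity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical minimal pair type, for illustration only. */
final class Pair<A, B> {
    final A first;
    final B second;

    Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }
}

// Hypothetical holder class for the static methods.
final class PairUpSketch {
    private PairUpSketch() {}

    /** Pairs elements position by position, stopping at the shorter input. */
    static <A, B> List<Pair<A, B>> pairUp(Iterable<? extends A> as, Iterable<? extends B> bs) {
        List<Pair<A, B>> out = new ArrayList<Pair<A, B>>();
        Iterator<? extends A> ia = as.iterator();
        Iterator<? extends B> ib = bs.iterator();
        while (ia.hasNext() && ib.hasNext()) {
            out.add(new Pair<A, B>(ia.next(), ib.next()));
        }
        return out;
    }

    /** Three inputs need a nested pair type, which is where the angle brackets pile up. */
    static List<Pair<Pair<String, Integer>, Boolean>> threeWay(
            List<String> names, List<Integer> counts, List<Boolean> flags) {
        List<Pair<String, Integer>> firstTwo = pairUp(names, counts);
        return pairUp(firstTwo, flags);
    }
}
```

Reading a row back then takes row.first.first, row.first.second and row.second,
and each further input adds another level of nesting to both the type and the
accessors.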

As for Enumerate, yes, the main thing I dislike about Python's enumerate is that I
always have to check the documentation about the order of the things in the tuple.
Giving the parts clear names beats that.

One issue I have with my enumerate method is the name - there's potential for name
confusion with the old school Enumerations. It does feel like the obvious name,
though, at least to me.

How do you guys feel about zipFill? It's probably less commonly useful, but I did use
it just last week when I was outputting side by side sets of objects and wanted to
consume all the data from both iterators with just one foreach.
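
For readers who haven't seen zipFill, a minimal sketch of the idea follows. It is
not the JIterTools implementation: the class name ZipFillSketch, the parallel list
of fillers, and the use of java.util.function.Supplier as the associated function
are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical holder class; not part of JIterTools or Google Collections.
final class ZipFillSketch {
    private ZipFillSketch() {}

    /**
     * Like zip, but continues until every source is exhausted. Once source i
     * runs dry, its slot is filled from fillers.get(i).
     */
    static <T> Iterable<List<T>> zipFill(
            final List<? extends Iterable<? extends T>> sources,
            final List<? extends Supplier<? extends T>> fillers) {
        if (sources.size() != fillers.size()) {
            throw new IllegalArgumentException("one filler per source is required");
        }
        return new Iterable<List<T>>() {
            @Override
            public Iterator<List<T>> iterator() {
                final List<Iterator<? extends T>> its = new ArrayList<Iterator<? extends T>>();
                for (Iterable<? extends T> source : sources) {
                    its.add(source.iterator());
                }
                return new Iterator<List<T>>() {
                    @Override
                    public boolean hasNext() {
                        // Keep going while at least one source still has items.
                        for (Iterator<? extends T> it : its) {
                            if (it.hasNext()) {
                                return true;
                            }
                        }
                        return false;
                    }

                    @Override
                    public List<T> next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        List<T> tuple = new ArrayList<T>(its.size());
                        for (int i = 0; i < its.size(); i++) {
                            Iterator<? extends T> it = its.get(i);
                            T value;
                            if (it.hasNext()) {
                                value = it.next();
                            } else {
                                value = fillers.get(i).get(); // exhausted: ask the filler
                            }
                            tuple.add(value);
                        }
                        return tuple;
                    }

                    @Override
                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}
```

With sources ["a", "b", "c"] and ["x"], and a second filler that always returns
"?", a single foreach sees [a, x], [b, ?] and [c, ?], which matches the
side-by-side output case described above.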

Original comment by juri.pak...@gmail.com on 5 Nov 2007 at 8:25

GoogleCodeExporter commented 9 years ago
yeah, Pair and Itera*s.pairUp() need to get integrated out from our internal codebase
still.

will comment on the rest of the stuff in this bug tomorrow.

Original comment by kevin...@gmail.com on 5 Nov 2007 at 8:28

GoogleCodeExporter commented 9 years ago

Original comment by kevin...@gmail.com on 17 Sep 2009 at 5:57

GoogleCodeExporter commented 9 years ago

Original comment by kevin...@gmail.com on 17 Sep 2009 at 6:02

GoogleCodeExporter commented 9 years ago
This issue has been moved to the Guava project (keeping the same id number). 
Simply replace 'google-collections' with 'guava-libraries' in your address 
bar and it should take you there.

Original comment by kevinb@google.com on 5 Jan 2010 at 11:09