Wow. Exponential indeed. This slip could make it into school books.
Original comment by jim.andreou on 24 Apr 2009 at 12:26
With some ugly hacks and O(log n) extra space (n: depth of nested iterators), hasNext() of a concatenated iterator could be O(1) instead of the O(log n) of the proposed fix (which probably matches the original intention of the concat() author), so a complete iteration would be O(E) (E: number of elements) instead of O(E log n).

I find the latter issue highly unlikely to ever be fixed, so yes, I agree that deeply recursive concat() calls should be avoided. Users should strive to place as many iterators as possible in a single concat() call, thus reducing the depth, or eliminate the depth entirely by placing all iterators in a list and calling concat() on that list just once.
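To illustrate the advice above, here is a minimal sketch of a "concatenate once" helper. `concatOnce` is a hypothetical name, not a Guava API; Guava users would instead pass all their iterators to a single `Iterators.concat()` call rather than nesting `concat()` recursively. The point is that the resulting iterator has depth 1 regardless of how many inputs there are, so a full iteration is O(E):

```java
import java.util.*;

public class FlatConcat {
    // Hypothetical helper sketching the advice above: gather all inputs in a
    // single list and concatenate once, so the result has depth 1 no matter
    // how many iterators are involved.
    static <T> Iterator<T> concatOnce(List<Iterator<T>> inputs) {
        final Iterator<Iterator<T>> meta = inputs.iterator();
        return new Iterator<T>() {
            Iterator<T> current = Collections.emptyIterator();

            public boolean hasNext() {
                // Skip exhausted inputs; each input is visited once overall,
                // so a complete iteration is O(E) in the number of elements.
                while (!current.hasNext() && meta.hasNext()) {
                    current = meta.next();
                }
                return current.hasNext();
            }

            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            parts.add(Collections.singletonList(i).iterator());
        }
        int count = 0;
        for (Iterator<Integer> it = concatOnce(parts); it.hasNext(); it.next()) {
            count++;
        }
        System.out.println(count);
    }
}
```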
Original comment by jim.andreou on 24 Apr 2009 at 12:50
This is incredibly timely!

I'm developing an implementation of Version Space Algebra [1], making extensive use of Iterators to represent axiomatic hypothesis spaces (such as "some integer"). In practice, the iterators at the leaves are small; however, these are combined in a tree to generate composite hypothesis spaces that represent the cross product of all the components, and the trees get arbitrarily deep. (The hypothesis spaces are narrowed quickly, and the full cross product is almost never actually used, but there will frequently be a lot of deeply concat'd iterators involved.)

Cutting to the chase: I will almost certainly need fast iterator concatenation. Thanks for pointing this out!

[1] http://tlau.org/research/papers/mlj01-draft.pdf
Original comment by cresw...@gmail.com on 24 Apr 2009 at 1:04
Thanks for this fix. It will be in RC2 (May). Thankfully it was only 2 lines of code, so I'm not making you sign a contributor agreement for it. :)
Original comment by kevin...@gmail.com on 24 Apr 2009 at 4:29
I had a look at this again. Some points:

- The current implementation is cubic (!). In particular, in the worst case, iterating N elements from a recursive concat() can cause roughly (N-1)*N*(N+1)/6 + N^2 + N - 2, i.e. O(N^3), invocations of hasNext() in the Iterators#concat implementation (let alone the necessary invocations of hasNext() on the actual iterator inputs). This grows fast.
- In the perfectly balanced case, this is O(N*(log N)^2), which is not terrible, though it's unlikely anyone would ever construct one of those (the usual case is "push, via a concat(), an element to the front or to the back", repeatedly).
- This can be reduced to a linear number of hasNext() invocations, 2N + 1, i.e. O(N). But concat() iterators become a little fatter (an extra, constant amount of memory is needed).

So, there is a trade-off:

- From O(N^3) down to O(N): a complexity reduction.
- O(1) extra memory per concat().

Should I pursue this? Is it a trade-off we're willing to make? To me, it looks very one-sided to discard it as not worthwhile, but it will take some effort to reduce the extra footprint (a stack is needed).
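A minimal sketch of the stack-based idea described above (this is an illustration, not Guava's actual implementation, and `StackConcat` is a hypothetical name): when hasNext() finds a nested concat on top of its stack, it splices that concat's remaining inputs onto its own stack. Each input iterator is then moved at most once in total, so even a degenerate left-nested chain is iterated with a linear number of hasNext() calls instead of re-descending the whole nesting on every call:

```java
import java.util.*;

public class StackConcat<T> implements Iterator<T> {
    // Remaining inputs, next-to-consume on top; this is the O(1)-per-concat
    // extra memory mentioned above (the stack grows lazily as nesting flattens).
    private final Deque<Iterator<? extends T>> stack = new ArrayDeque<>();

    @SafeVarargs
    public StackConcat(Iterator<? extends T>... inputs) {
        for (int i = inputs.length - 1; i >= 0; i--) {
            stack.push(inputs[i]);
        }
    }

    @SuppressWarnings("unchecked")
    public boolean hasNext() {
        while (!stack.isEmpty()) {
            Iterator<? extends T> top = stack.peek();
            if (top instanceof StackConcat) {
                // Flatten: splice the nested concat's remaining inputs onto
                // this stack, preserving their order, so later calls never
                // re-descend through the whole nesting.
                stack.pop();
                Deque<?> inner = ((StackConcat<?>) top).stack;
                while (!inner.isEmpty()) {
                    stack.push((Iterator<? extends T>) inner.pollLast());
                }
            } else if (top.hasNext()) {
                return true;
            } else {
                stack.pop(); // exhausted; each input is popped at most once
            }
        }
        return false;
    }

    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return stack.peek().next();
    }

    public static void main(String[] args) {
        // Degenerate left-nested chain: concat(...concat(concat(a), b)..., z),
        // the "push an element to the back, repeatedly" shape described above.
        Iterator<Integer> it =
                new StackConcat<>(Collections.singletonList(0).iterator());
        for (int i = 1; i < 100; i++) {
            it = new StackConcat<>(it, Collections.singletonList(i).iterator());
        }
        int sum = 0;
        while (it.hasNext()) {
            sum += it.next();
        }
        System.out.println(sum); // sum of 0..99
    }
}
```

The splicing empties the nested concat's own stack, which is fine for tree-shaped compositions where each concat result is consumed once; a production version would need to handle the general case more carefully.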
Original comment by jim.andreou on 13 Apr 2012 at 1:17
Original issue reported on code.google.com by pline...@gmail.com on 23 Apr 2009 at 11:50