cllanjim / google-collections

Automatically exported from code.google.com/p/google-collections
Apache License 2.0
0 stars 0 forks source link

Iterators.size(Iterator): counting collection elements must not overflow #254

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
According to the contract of the Collection#size(), if the number of
elements in a collection exceeds Integer.MAX_VALUE, then Collection#size()
returns Integer.MAX_VALUE. Some of your collections potentially break this
contract, when they delegate to Iterators.size() to count the elements,
since Iterators.size() ignores an overflow during counting.

This is a rare use-case, since it requires a collection over 2147483647+
elements, which (if distinct objects) will most certainly not even fit into
the memory. However, there might be collection views over more than
Integer.MAX_VALUE elements (e.g. multisets and dynamic or remote
collections), which may implement size() accordingly. Wrapping such
collections into some of your Iterable- or Collection-views such as
Collections2.filter() will effectively circumvent a call to size() and
eventually result in the size being (mis)computed by Iterators.size(),
which may return an overflow/negative value when it shouldn't.

I suggest something like this for the Iterators.size() implementation:

public class Iterators {
  public static int size(Iterator<?> iterator) {
    int count = 0, max = Integer.MAX_VALUE;
    while (iterator.hasNext()) {
      iterator.next();
      if (++count == max) {
        while (iterator.hasNext())
          iterator.next(); // exhaust iterator
        return count; // circumvent iterator.hasNext() in main loop
      }
    }
    return count;
  }
}

Another solution would be to have distinct methods size() and count(). One
of the two would either ignore the overflow (like size() does today) or
throw some kind of exception. The other one would simply stop iteration and
return at Integer.MAX_VALUE, informing the client of the potential
overflow. The client may then decide to go on counting by passing in the
same iterator instance again.

public class Iterators {
  /** counts elements; exhausts the iterator */
  public static int size(Iterator<?> it) {
    int count = count(it);

    // handle overflow, e.g.
    // while (it.hasNext()) count += count(it); // today's behavior
    // while (it.hasNext()) count(it); // size() contract
    // if (it.hasNext()) throw new RuntimeException("overflow");

    return count;
  }

  /** counts elements; proceeds at most Integer.MAX_VALUE steps */
  public static int count(Iterator<?> it) {
    int count = 0;
    for (int max = Integer.MAX_VALUE; it.hasNext(); it.next())
      if (++count == max)
        break;
    return count;
  }
}

This is only a minor bug since it requires a rare use-case.

Original issue reported on code.google.com by Robin...@gmail.com on 6 Oct 2009 at 2:37

GoogleCodeExporter commented 8 years ago
It's been my position for a while that "very large collections" like these are 
accidents waiting to happen -- for example Collections.frequency() and other 
JDK 
methods have this same problem. Users with such collections have to be 
extremely 
careful how they use them, and I don't want to make it a goal of our library to 
have 
to cater to those uses.

Thanks for the report.

Original comment by kevin...@gmail.com on 14 Oct 2009 at 4:19