Methods on Sets class: union(), intersection(), difference()

GoogleCodeExporter commented 9 years ago

I doubt I'll be the only person saying that it sure would be nice to
finally have these methods.  Aren't they just about the most fundamental
things that one does with a Set?  It feels way overdue to have them supported.

I have these coded up already, but there are Issues.  Josh says, "run away.
run far away" which sounds pretty ominous.

There is one gigantic problem.  The caller will undoubtedly expect the
method to do "the right thing" and return a TreeSet if his inputs were
TreeSets, a LinkedHashSet if his inputs were LHS's, etc., and this is
next-to-impossible to accomplish in any reasonable way.

We'd love to be able to just clone the first Set given and then modify the
clone.  But clone is broken to a point of hopelessness.  The only way we
could possibly even hope to use clone would be via reflection.  That code
was written, but it feels very wrong.

So the alternate approach is to return a Set which is a *view* of the
union/intersection/difference of the two input Sets.  Then the caller can
either use that view directly, or copy the contents out into the desired
concrete collection type from there. This sidesteps a lot of issues, as the
collection code we write primarily just delegates to the input collections.
 Again, I wrote all the code to do this, and I'm just sitting on it.

Very Bad Things may result if you take a TreeSet whose comparator is
inconsistent with equals(), and then try to union() it with a plain old
HashSet. This may be part of where Josh's concern comes from.  However, my
goal is not to prevent people from shooting themselves in the foot using
our library.  My goal is to make sure that if they do shoot themselves in
the foot, well, they really had to have aimed first.  (And why are you
aiming at your foot, people??)

Original issue reported on code.google.com by kevin...@gmail.com on 23 Oct 2007 at 4:43

GoogleCodeExporter commented 9 years ago

Why not just force the caller to pass in the result collection?

Original comment by jahlborn@gmail.com on 25 Oct 2007 at 2:40

GoogleCodeExporter commented 9 years ago

Right, I forgot to bring up that possibility.

I'm concerned that the usage would be extremely confusing.

  Set<String> result = Sets.union(setA, setB, setC, setD);

Now quickly!  Which of those four parameters is the empty set to be written 
into, and
which are the three I want to union?

But perhaps this is a case where a "fluent" interface can come to the rescue?

  Set<String> result = Sets.union(setA, setB, setC).into(setD);

I had not thought about this possibility before just now!

Original comment by kevin...@gmail.com on 25 Oct 2007 at 3:30

GoogleCodeExporter commented 9 years ago

That said, the "view" option is still strongly appealing to me.

Rough, *untested* example of how this would work:

  /**
   * Returns an unmodifiable view of the union of two sets. The returned
   * set contains all elements that are contained by either backing set.
   *
   * <p>Results are undefined if {@code set1} and {@code set2} are sets based on
   * different equivalence relations (as {@link HashSet}, {@link TreeSet}, and
   * the {@link Map#keySet} of an {@link java.util.IdentityHashMap} all are).
   */
  public static <E> Set<E> union(
      final Set<E> set1, final Set<E> set2) {
    if (set1 instanceof SortedSet<?> && set2 instanceof SortedSet<?>) {
      SortedSet<E> sortedSet1 = (SortedSet<E>) set1;
      SortedSet<E> sortedSet2 = (SortedSet<E>) set2;

      if (Objects.equal(sortedSet1.comparator(), sortedSet2.comparator())) {
        return sortedUnion(sortedSet1, sortedSet2);
      }
    }
    // TODO: optimize by switching order?

    final Set<E> set2minus1 = difference(set2, set1);
    final Iterable<E> iterable = Iterables.concat(set1, set2minus1);

    return new AbstractSet<E>() {
      public int size() {
        return set1.size() + set2minus1.size();
      }
      @Override public boolean isEmpty() {
        return set1.isEmpty() && set2.isEmpty();
      }
      public Iterator<E> iterator() {
        return iterable.iterator();
      }
      @Override public boolean contains(Object object) {
        return set1.contains(object) || set2.contains(object);
      }
    };
  }

Original comment by kevin...@gmail.com on 25 Oct 2007 at 7:50

GoogleCodeExporter commented 9 years ago

One thing that has been missing from the start is:

containsNone(Collection<?> elements) 

This allows checking whether two collections are disjoint or not. 
Alternatively, one
could have the negation: containsSome(...)

Ideally these would be overridable methods, since you can clearly optimize the
results based on the collection. Two sorted sets with the same comparator can 
have
really fast O(n) tests, for example.

Original comment by mark.edw...@gmail.com on 10 Nov 2007 at 7:55

GoogleCodeExporter commented 9 years ago

Sorry, hit save too early:

But since they can't be methods on Collection (without Sun buying in), having 
them as
statics would be useful.

Original comment by mark.edw...@gmail.com on 10 Nov 2007 at 7:56

GoogleCodeExporter commented 9 years ago

See Collections.disjoint().

It doesn't do much optimization, but I will file a bug on that.  For example, 
if both
sets are EnumSets, think of how screamin' fast it could be.

Original comment by kevin...@gmail.com on 10 Nov 2007 at 8:13

GoogleCodeExporter commented 9 years ago

Back to the original question; one solution that might be good enough is a 
builder:

Set<String> names = ImmutableHashSet.builder()
    .add(myFriends)
    .add("Kevin", "Jared")
    .remove(peopleWhoOweMeMoney)
    .create();

we may need to distinguish names between add/remove and addAll/removeAll, but 
this is
the idea.

ImmutableHashSet needs something like this anyway.

Original comment by kevin...@gmail.com on 29 Nov 2007 at 1:24

GoogleCodeExporter commented 9 years ago


I like the builder.  Could you also allow removal to happen via Constraints, and
maybe provide some handy common-case Constraints (ie, to facilitate differences 
and
intersections)?

import static Constraints.*;

ImmutableHashSet.builder().add(myFriends).removeIf(isIn(owesMeMoney));
ImmutableHashSet.builder().add("Kevin", "Jared").removeIf(isNotIn(startsWithK));

Original comment by logan.jo...@gmail.com on 30 Nov 2007 at 7:31

GoogleCodeExporter commented 9 years ago

Almost forgot:  the obvious advantage to using constraints for removal is that 
you
don't need any additional methods to go from:

Set startsWithK = ...;
IHS.builder().add("Kevin", "Jared").removeIf(isNotIn(startsWithK))

to:

Constraint doesNotStartWithK = ...;
IHS.builder().add("Kevin", "Jared").removeIf(doesNotStartWithK);

In other words, the same methods are useful for constraining the set even if you
don't have another set to use for subtraction.

Original comment by logan.jo...@gmail.com on 30 Nov 2007 at 7:38

GoogleCodeExporter commented 9 years ago

The problem with the builder approach is that a set intersection is still as
inefficient as it is today -- first copy everything, then retain-all.  Another 
idea:

Set<Foo> s = Sets.intersection(set1, set2).into(new HashSet<Foo>());

or equivalently

Set<Foo> s = Sets.newHashSet();
Sets.intersection(set1, set2).into(s);

Original comment by kevin...@gmail.com on 19 Dec 2007 at 6:50

GoogleCodeExporter commented 9 years ago

Can the approaches be combined, to make the intersection efficient *and* keep 
the
flexibility of filtering on a constraint?

Set<Foo> s = someExistingSet;
s.copy(someConstraint).into(Sets.newHashSet());

Then your intersection method looks like this:

public Set<T> intersection(Set<T> set1, Set<T> set2, Set<T> targetSet) {
    Set<Foo> intersection = Sets.newHashSet();
    set1.copy(isIn(set2)).into(intersection);
    set2.copy(isIn(set1)).into(intersection);
}

Original comment by logan.jo...@gmail.com on 19 Dec 2007 at 7:23

GoogleCodeExporter commented 9 years ago

Attached the file for the ImmutableHashSetBuilder. The code is consistent with 
the
following approach

Set<String> names = ImmutableHashSetBuilder<String>()
    .add(myFriends)
    .add("Kevin", "Jared")
    .remove(peopleWhoOweMeMoney)
    .getSet();

Constraint doesNotStartWithK = ...;
ImmmutableHashSetBuilder<String>().add("Kevin", 
"Jared").removeIf(doesNotStartWithK);

Original comment by anitha.s...@gmail.com on 26 Dec 2007 at 10:13

Attachments:

ImmutableHashSetBuilder.tar.gz

GoogleCodeExporter commented 9 years ago

There is still a need for collection builders, but the methods described in this
entry are now added.

Original comment by kevin...@gmail.com on 30 May 2008 at 4:02

Changed state: Fixed

TimurMahammadov / google-collections

Methods on Sets class: union(), intersection(), difference() #16