eclipse / eclipse-collections

Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.
https://eclipse.dev/collections/
2.42k stars 604 forks source link

SortedSet#intersect does not always return elements from this #1666

Open duponter opened 1 month ago

duponter commented 1 month ago

Context

According to the JavaDocs SortedSet#intersect should always return members from this:

https://eclipse.dev/collections/javadoc/11.1.0/org/eclipse/collections/api/set/sorted/ImmutableSortedSet.html#intersect(org.eclipse.collections.api.set.SetIterable)

Description copied from interface: SortedSetIterable Returns the set of all objects that are members of both this and set. The intersection of [1, 2, 3] and [2, 3, 4] is the set [2, 3]. The output will contain instances from this, not set.

Issue

However, testing reveals that elements of the smallest of the 2 sets are returned instead. Not only contradicts this the API documentation, but this is counterintuitive as well. In addition, when replacing intersect with the select(Set::contains) combination, the same functionality works as expected.

Reproduction

Following test suite contains 4 test cases:

Technologies used:

class SortedSetIntersectTest {

    @Nested
    class SelectContains {
        @Test
        void selectContainsReturnsElementsFromTheInvokingSetWhenFirstSetIsLargerThanSecond() {
            ImmutableList<Person> persons = Person.generate(20);
            ImmutableSortedSet<Person> first = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.take(15));
            ImmutableSortedSet<Person> second = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.drop(10).collect(Person::changeAge));

            System.out.printf("Set 1 with %d elements ...%n", first.size());
            first.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
            System.out.printf("%n...vs Set 2 with %d elements%n", second.size());
            second.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));

            assertThat(selectContains(first, second)).containsExactlyElementsOf(first.drop(10));
            assertThat(selectContains(second, first)).containsExactlyElementsOf(second.take(5));
        }

        @Test
        void selectContainsReturnsElementsFromTheInvokingSetWhenFirstSetIsSmallerThanSecond() {
            ImmutableList<Person> persons = Person.generate(20);
            ImmutableSortedSet<Person> first = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.take(15)).drop(10);
            ImmutableSortedSet<Person> second = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.drop(10).collect(Person::changeAge));

            System.out.printf("Set 1 with %d elements ...%n", first.size());
            first.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
            System.out.printf("%n...vs Set 2 with %d elements%n", second.size());
            second.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));

            assertThat(selectContains(first, second)).containsExactlyElementsOf(first);
            assertThat(selectContains(second, first)).containsExactlyElementsOf(second.take(5));
        }

        private SortedSet<Person> selectContains(ImmutableSortedSet<Person> first, ImmutableSortedSet<Person> second) {
            return SortedSets.immutable.withAll(Person.NATURAL_COMPARATOR, first.select(second::contains)).castToSortedSet();
        }
    }

    @Nested
    class Intersect {

        @Test
        void intersectReturnsElementsFromTheSmallestSetWhenFirstSetIsLargerThanSecond() {
            ImmutableList<Person> persons = Person.generate(20);
            ImmutableSortedSet<Person> first = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.take(15));
            ImmutableSortedSet<Person> second = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.drop(10).collect(Person::changeAge));

            System.out.printf("Set 1 with %d elements ...%n", first.size());
            first.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
            System.out.printf("%n...vs Set 2 with %d elements%n", second.size());
            second.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));

            assertThat(intersect(first, second)).containsExactlyElementsOf(second.take(5));   // succeeds, but should return elements from 'first'
            assertThat(intersect(second, first)).containsExactlyElementsOf(second.take(5));
//        assertThat(intersect(first, second)).containsExactlyElementsOf(first.drop(10));   // fails, but should work imho
        }

        @Test
        void intersectReturnsElementsFromTheSmallestSetWhenFirstSetIsSmallerThanSecond() {
            ImmutableList<Person> persons = Person.generate(20);
            ImmutableSortedSet<Person> first = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.take(15)).drop(10);
            ImmutableSortedSet<Person> second = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.drop(10).collect(Person::changeAge));

            System.out.printf("Set 1 with %d elements ...%n", first.size());
            first.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
            System.out.printf("%n...vs Set 2 with %d elements%n", second.size());
            second.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));

            assertThat(intersect(first, second)).containsExactlyElementsOf(first);
            assertThat(intersect(second, first)).containsExactlyElementsOf(first);            // succeeds, but should return elements from 'second'
//        assertThat(intersect(second, first)).containsExactlyElementsOf(second.take(5));   // fails, but should work imho
        }

        private SortedSet<Person> intersect(ImmutableSortedSet<Person> first, ImmutableSortedSet<Person> second) {
            return SortedSets.immutable.withAll(Person.NATURAL_COMPARATOR, first.intersect(second)).castToSortedSet();
        }
    }

    private record Person(String name, int age) {
        public static final Comparator<Person> NATURAL_COMPARATOR = Comparator.comparing(Person::name).thenComparingInt(Person::age);
        public static final Comparator<Person> NAME_ONLY_COMPARATOR = Comparator.comparing(Person::name);

        private static final ThreadLocalRandom RANDOM = ThreadLocalRandom.current();
        private static final String[] NAMES = new String[]{
            "Hye", "Robbie", "Daryl", "Ulrike", "Marivel", "Huey", "Lindsay", "Avery", "Brock", "Patti", "Makeda", "Rudolf", "Burma", "Slyvia", "Mike", "Oscar", "Cher", "Willard", "Buford", "Hobert", "Brain", "Ricky", "Jeraldine", "Lindsey", "Mia", "Carolee", "Barney", "Emmett", "Lezlie", "Cristi"
        };

        public static ImmutableList<Person> generate(int size) {
            MutableSortedSet<Person> persons = SortedSets.mutable.of(NAME_ONLY_COMPARATOR);
            while (persons.size() < size) {
                persons.with(new Person());
            }
            return persons.toImmutableList();
        }

        private static int randomAge() {
            return RANDOM.nextInt(100);
        }

        private static String randomName() {
            return NAMES[RANDOM.nextInt(NAMES.length)];
        }

        public Person() {
            this(randomName(), randomAge());
        }

        Person changeAge() {
            int newAge = IntStream.generate(Person::randomAge).filter(age -> age != this.age).findFirst().orElseThrow();
            return new Person(name, newAge);
        }
    }
}
motlin commented 3 weeks ago

It makes sense to me. I'd like to get @mohrezaei's opinion too since there was a similar change in #1661 that raised performance concerns.

mohrezaei commented 3 weeks ago

So clearly the behavior and the doc are inconsistent. It should be noted that this can only happen if the set's sense of equality is different from whatever way the returned set's elements are being compared to elements from this.

So there are two ways to fix this: 1) Keep the doc, change the behavior. This would get rid of the optimization. 2) Keep the behavior, fix the doc.

I would generally favor (2), for the following reasons:

We could even be very explicit in the doc and suggest using this.select(that.contains) if someone has the requirement to preserve the memory references from this.