goldmansachs / gs-collections

GS Collections has been migrated to the Eclipse Foundation, re-branded as Eclipse Collections. https://www.eclipse.org/collections/
https://www.eclipse.org/collections/
1.81k stars 276 forks source link

Bug in UnifiedSet$ChainedBucket.removeLongChain() method #30

Closed vincentvialard closed 8 years ago

vincentvialard commented 8 years ago

Hi there,

a problem with the computation of our hashkeys generated very frequent collisions, and thus bigger buckets. We then had an exception using UnifiedSet.retainAll().

I had a look at the code and found the problem: if i>3 then ChainedBucket.remove(i) calls removeLongChain(i-3), but the switch statement in removeLongChain throws an exception if i-3 > 3. Removing an object in the third chained bucket is thus not possible. It surely did not occur until now, because such heavy collisions are unlikely to happen.

Here is a "minimal change" fix:

        private void removeLongChain(ChainedBucket oldBucket, int i)
        {
            do
            {
                ChainedBucket bucket = (ChainedBucket) oldBucket.three;
                switch (i)
                {
                    case 0:
                        bucket.zero = bucket.removeLast(0);
                        return;
                    case 1:
                        bucket.one = bucket.removeLast(1);
                        return;
                    case 2:
                        bucket.two = bucket.removeLast(2);
                        return;
                    case 3:
                        if (bucket.three instanceof ChainedBucket)
                        {
                            i -= 3;
                            oldBucket = bucket;
                            continue;
                        }
                        bucket.three = null;
                        return;
                    default:
// new code starts here (similar to case 3)
                        if (bucket.three instanceof ChainedBucket)
                        {
                            i -= 3;
                            oldBucket = bucket;
                            continue;
                        }
// new code ends here
                        throw new AssertionError();
                }
            }
            while (true);
        }

Here is a small exemple that reproduces the problem (using some of the strings wth colliding hash from issue #26):

package de.derivo;

import com.gs.collections.impl.set.mutable.UnifiedSet;

import java.util.Set;

public class Main {

    static final String[] COLLISIONS = new String[] {
            "\u9103\ufffe",
            "\u9104\uffdf",
            "\u9105\uffc0",
            "\u9106\uffa1",
            "\u9107\uff82",
            "\u9108\uff63",
            "\u9109\uff44",
            "\u910a\uff25",
            "\u910b\uff06",
            "\u910c\ufee7"
    };

    final static int COUNT = 9;

    public static void main(String[] args) {
        Set<String> stringSet = new UnifiedSet<>();
        Set<String> stringSet2 = new UnifiedSet<>();

        for (int i = 0; i < COUNT; i++) {
            stringSet.add(COLLISIONS[i]);
        }
        for (int i = 0; i < COUNT-2; i++) {
            stringSet2.add(COLLISIONS[i]);
        }
        stringSet.retainAll(stringSet2);
    }
}

Thanks for the great collections, I hope this helps!

Vincent.