ari-ban / issue-test

0 stars 0 forks source link

CloseListener deletion is linear on the amount of created listeners. #1962

Closed arinban closed 7 years ago

arinban commented 7 years ago

On org.glassfish.grizzly.nio.NIOConnection, CloseListener objects are stored in a ConcurrentLinkedQueue, but it seems that it is not used as a Queue and objects are removed unorderly (without taking in consideration the especial usage in notifyCloseListeners) using the remove(Object) method.

What does the remove method do?

    /**
     * Removes a single instance of the specified element from this queue,
     * if it is present.  More formally, removes an element {@code e} such
     * that {@code o.equals(e)}, if this queue contains one or more such
     * elements.
     * Returns {@code true} if this queue contained the specified element
     * (or equivalently, if this queue changed as a result of the call).
     *
     * @param o element to be removed from this queue, if present
     * @return {@code true} if this queue changed as a result of the call
     */
    public boolean remove(Object o) {
        if (o == null) return false;
        Node<E> pred = null;
        for (Node<E> p = first(); p != null; p = succ(p)) {
            E item = p.item;
            if (item != null &&
                o.equals(item) &&
                p.casItem(item, null)) {
                Node<E> next = succ(p);
                if (pred != null && next != null)
                    pred.casNext(p, next);
                return true;
            }
            pred = p;
        }
        return false;
    }

As you can see it linearly seeks for the object to be removed from the queue, but the Node is not deleted, causing future deletions to iterate over and over old Nodes, this Nodes appear to be removed when a FullGC is run but in the meantime causes the deletion of the entry to cost more and more. In our test scenarios it takes ~30 minutes to run the FullGC and at least 18M Nodes are still there as you can see in the following image: screen shot 2017-08-04 at 5 34 58 pm

This behaviour impacts our applications behaviour reducing performance over time with very high impact, here is our tps variation on a 1 hour test: pasted image at 2017_07_13 12_02 pm And its correlation to GC's: pasted image at 2017_07_13 12_07 pm

I've replace this collection with a Set as its not needed to use the queue for any particular reason and the tps does not degrade anymore but it stays linear over time.

This issue was found using Grizzly 2.3.29, I'll send a PR for 2.4.x also as code will be cleaner in java8.

arinban commented 6 years ago
arinban commented 7 years ago

@LucianoGandini Commented Thanks for taking care of the merge @rlubke, can we get a 2.3.x release soon? We need to test it and validate it for next release.

arinban commented 7 years ago

@rlubke Commented @LucianoGandini https://javaee.groups.io/g/grizzly/message/22