Gamua / Starling-Framework

The Cross Platform Game Engine
http://www.starling-framework.org
Other
2.88k stars 825 forks source link

removeEventListener performance boost #491

Closed IonSwitz closed 10 years ago

IonSwitz commented 10 years ago

As I was going through code in my beloved Starling Graphics Extension, I stumbled upon something funny in one of my benchmarks: removeEventListener eats up a LOT of CPU.

While I am sure this issue for the Starling Graphics Extension can be solved in different ways, here is the problem I found when the number of event listeners grow on one object:

Each Graphic base class registers itself as a listener on Event.CONTEXT3D_CREATE, on Starling.current, in order to be able to recreate its geometry on a lost context situation. When I create and dispose a lot of Graphics classes ( several 100) and start investigating the performance problem, I end up in removeEventListener.

So I put together a test case, that adds and removes 2000 sprites as event listeners to Starling.current, listening to this event:

package
{
    import flash.display.Sprite;
    import flash.display.StageAlign;
    import flash.display.StageScaleMode;

    import starling.core.Starling;

    [SWF( width="1200", height="800", backgroundColor="#000000", frameRate="60" )]
    public class Main extends flash.display.Sprite
    {
        private var starling    :Starling;

        public function Main()
        {
            stage.scaleMode = StageScaleMode.NO_SCALE;
            stage.align = StageAlign.TOP_LEFT;
            starling = new Starling( ManySprites, stage, null, null, "auto", "baselineConstrained" );
            starling.antiAliasing = 0;
            starling.showStats = true;
            starling.start();
        }
    }
}
import flash.display.BitmapData;
import flash.display.Shape;
import starling.events.Event;
import flash.geom.Point;
import flash.geom.Rectangle;

import starling.display.DisplayObject;
import starling.core.Starling;
import starling.display.Sprite;

class ManySprites extends starling.display.Sprite
{
    private var sprites:Vector.<ManySprites>;
    function ManySprites(addListener:Boolean = true)
    {
        if ( addListener )
            addEventListener(Event.ADDED_TO_STAGE, onAdded);
    }

    private function onAdded ( e:Event ):void
    {
        sprites = new Vector.<ManySprites>(2000, true);
        for ( var i:int = 0; i < 2000; i++ )
        {
            var s:ManySprites = new ManySprites(false);
            sprites[i] = s;
            addChild(s);
            Starling.current.addEventListener(Event.CONTEXT3D_CREATE, s.onContextCreated);
        }
        addEventListener(Event.ENTER_FRAME, enterFrameHandler);
    }

    private function onContextCreated(event:starling.events.Event) : void
    {
    }

    private function enterFrameHandler( event:Event ):void
    {
        var i: int;
        for ( i = 0 ; i < 2000; i++)
        {
            var s:ManySprites = sprites[i];
            Starling.current.removeEventListener(Event.CONTEXT3D_CREATE, s.onContextCreated);
        }
        for ( i = 0 ; i < 2000; i++)
        {
            var s:ManySprites = sprites[i];
            Starling.current.addEventListener(Event.CONTEXT3D_CREATE, s.onContextCreated);
        }

    }
}

And performance chokes. The problem is the loop in removeEventListener, that pushes listeners from "listeners" to the "remainingListeners" vector. This eats up a lot of CPU.

Is there a reason that the "splice" method isn't used in removeEventListener to remove the listener from the vector? It is ALOT faster like this, and I think the functionality is identical, unless I misunderstand something about the splice method.

public function removeEventListener(type:String, listener:Function):void
{
    if (mEventListeners)
    {
        var listeners:Vector.<Function> = mEventListeners[type] as Vector.<Function>;
        if (listeners)
        {
            var numListeners:int = listeners.length;
            var remainingListeners:Vector.<Function>;

            var index:int = listeners.indexOf(listener);
            if ( index != -1)
                remainingListeners = listeners.splice(index, 1);
            else 
                remainingListeners = new <Function>[];

            mEventListeners[type] = remainingListeners;
        }
    }
}

Performance on my PC jumps from 5 FPS to 30 FPS in my extreme example above by using the "splice" method rather than the loop and "push" method.

EDIT: And I apologize for the horrible formatting of the code, I can't seem to get the forum to accept the indentation properly [PrimaryFeather] I fixed that for you!

IonSwitz commented 10 years ago

In fact, I realized that addEventListener suffers from a similar problem, and I have a new version of the EventDispatcher class that is a lot faster both when adding and removing eventListeners, but it will require multiple eyes in order to not introduce any bugs.

I don't have a pull request for it, but I can paste in the entire thing here, I hope, without anyone getting angry. The addition is to, instead of a Dictionary of listener Function vectors, add a Dictionary of EventDispatchHelper:s, a class that holds both a Function Vector and a Function Dictionary, allowing for fast look ups of duplicate "listeners". This makes the above sample fly at 60 FPS and 50% CPU load.

IonSwitz commented 10 years ago

// =================================================================================================
//
//  Starling Framework
//  Copyright 2011 Gamua OG. All Rights Reserved.
//
//  This program is free software. You can redistribute and/or modify it
//  in accordance with the terms of the accompanying license agreement.
//
// =================================================================================================

package starling.events
{
    import flash.utils.Dictionary;

    import starling.core.starling_internal;
    import starling.display.DisplayObject;

    use namespace starling_internal;

    /** The EventDispatcher class is the base class for all classes that dispatch events. 
     *  This is the Starling version of the Flash class with the same name. 
     *  
     *  <p>The event mechanism is a key feature of Starling's architecture. Objects can communicate 
     *  with each other through events. Compared the the Flash event system, Starling's event system
     *  was simplified. The main difference is that Starling events have no "Capture" phase.
     *  They are simply dispatched at the target and may optionally bubble up. They cannot move 
     *  in the opposite direction.</p>  
     *  
     *  <p>As in the conventional Flash classes, display objects inherit from EventDispatcher 
     *  and can thus dispatch events. Beware, though, that the Starling event classes are 
     *  <em>not compatible with Flash events:</em> Starling display objects dispatch 
     *  Starling events, which will bubble along Starling display objects - but they cannot 
     *  dispatch Flash events or bubble along Flash display objects.</p>
     *  
     *  @see Event
     *  @see starling.display.DisplayObject DisplayObject
     */
    public class EventDispatcher
    {
        private var mEventHelpers:Dictionary;

        /** Helper object. */
        private static var sBubbleChains:Array = [];

        /** Creates an EventDispatcher. */
        public function EventDispatcher()
        {  }

        /** Registers an event listener at a certain object. */
        public function addEventListener(type:String, listener:Function):void
        {
            if (mEventHelpers == null)
                mEventHelpers = new Dictionary();

            var helper:EventDispatcherHelper = mEventHelpers[type] as EventDispatcherHelper;
            if (helper == null)
                mEventHelpers[type] = new EventDispatcherHelper(listener);
            else if (helper.hasListener(listener) == false) // check for duplicates
                helper.addListener(listener);
        }

        /** Removes an event listener from the object. */

        public function removeEventListener(type:String, listener:Function):void
        {
            if (mEventHelpers)
            {
                var helper:EventDispatcherHelper = mEventHelpers[type] as EventDispatcherHelper;
                if (helper && helper.hasListener(listener) )
                {
                    helper.removeListener(listener);
                }
            }
        }

        /** Removes all event listeners with a certain type, or all of them if type is null. 
         *  Be careful when removing all event listeners: you never know who else was listening. */
        public function removeEventListeners(type:String=null):void
        {
            if (type && mEventHelpers)
            {
                var helper:EventDispatcherHelper = mEventHelpers[type];
                if ( helper != null )
                    helper.dispose();

                delete mEventHelpers[type];
            }
            else
                mEventHelpers = null;
        }

        /** Dispatches an event to all objects that have registered listeners for its type. 
         *  If an event with enabled 'bubble' property is dispatched to a display object, it will 
         *  travel up along the line of parents, until it either hits the root object or someone
         *  stops its propagation manually. */
        public function dispatchEvent(event:Event):void
        {
            var bubbles:Boolean = event.bubbles;

            if (!bubbles && (mEventHelpers == null || !(event.type in mEventHelpers)))
                return; // no need to do anything

            // we save the current target and restore it later;
            // this allows users to re-dispatch events without creating a clone.

            var previousTarget:EventDispatcher = event.target;
            event.setTarget(this);

            if (bubbles && this is DisplayObject) bubbleEvent(event);
            else                                  invokeEvent(event);

            if (previousTarget) event.setTarget(previousTarget);
        }

        /** @private
         *  Invokes an event on the current object. This method does not do any bubbling, nor
         *  does it back-up and restore the previous target on the event. The 'dispatchEvent' 
         *  method uses this method internally. */
        internal function invokeEvent(event:Event):Boolean
        {
            var helper:EventDispatcherHelper = mEventHelpers ?
                mEventHelpers[event.type] as EventDispatcherHelper : null;
            var numListeners:int = helper == null ? 0 : helper.numListeners;

            if (numListeners)
            {
                event.setCurrentTarget(this);

                // we can enumerate directly over the vector, because:
                // when somebody modifies the list while we're looping, "addEventListener" is not
                // problematic, and "removeEventListener" will create a new Vector, anyway.

                for (var i:int=0; i<numListeners; ++i)
                {
                    var listener:Function = helper.mListenerVector[i] as Function;
                    var numArgs:int = listener.length;

                    if (numArgs == 0) listener();
                    else if (numArgs == 1) listener(event);
                    else listener(event, event.data);

                    if (event.stopsImmediatePropagation)
                        return true;
                }

                return event.stopsPropagation;
            }
            else
            {
                return false;
            }
        }

        /** @private */
        internal function bubbleEvent(event:Event):void
        {
            // we determine the bubble chain before starting to invoke the listeners.
            // that way, changes done by the listeners won't affect the bubble chain.

            var chain:Vector.<EventDispatcher>;
            var element:DisplayObject = this as DisplayObject;
            var length:int = 1;

            if (sBubbleChains.length > 0) { chain = sBubbleChains.pop(); chain[0] = element; }
            else chain = new <EventDispatcher>[element];

            while ((element = element.parent) != null)
                chain[int(length++)] = element;

            for (var i:int=0; i<length; ++i)
            {
                var stopPropagation:Boolean = chain[i].invokeEvent(event);
                if (stopPropagation) break;
            }

            chain.length = 0;
            sBubbleChains.push(chain);
        }

        /** Dispatches an event with the given parameters to all objects that have registered 
         *  listeners for the given type. The method uses an internal pool of event objects to 
         *  avoid allocations. */
        public function dispatchEventWith(type:String, bubbles:Boolean=false, data:Object=null):void
        {
            if (bubbles || hasEventListener(type)) 
            {
                var event:Event = Event.fromPool(type, bubbles, data);
                dispatchEvent(event);
                Event.toPool(event);
            }
        }

        /** Returns if there are listeners registered for a certain event type. */
        public function hasEventListener(type:String):Boolean
        {
            var helper:EventDispatcherHelper = mEventHelpers ?
                mEventHelpers[type] as EventDispatcherHelper : null;
            return helper ? helper.numListeners != 0 : false;
        }
    }
}
import flash.utils.Dictionary;

class EventDispatcherHelper
{
    public var mListenerVector:Vector.<Function>;
    public var mListenerDictionary:Dictionary;

    public function EventDispatcherHelper(listener:Function) 
    {
        mListenerVector = new <Function>[listener];
        mListenerDictionary = new Dictionary();
        mListenerDictionary[listener] = listener; 
    }

    internal function hasListener(listener:Function) : Boolean
    {
        var value:Function = mListenerDictionary[listener] as Function;
        return value != null;
    }

    internal function addListener(listener:Function) : void
    {
        mListenerVector.push(listener);
        mListenerDictionary[listener] = listener;
    }

    internal function removeListener(listener:Function) : void
    {
        var remainingListeners:Vector.<Function>;
        var index:int = mListenerVector.indexOf(listener);
        if ( index != -1)
            remainingListeners = mListenerVector.splice(index, 1);
        else 
            remainingListeners = new <Function>[];

        mListenerVector = remainingListeners;
        delete mListenerDictionary[listener]; // Not sure if delete or null is the correct way here
        //mListenerDictionary[listener] = null;
    }

    internal function dispose() : void
    {
        mListenerVector = null;
        mListenerDictionary = null;
    }

    internal function get numListeners() : int 
    {
        return mListenerVector == null ? 0 : mListenerVector.length;
    }
}
PrimaryFeather commented 10 years ago

The reason why I'm working on a copy, not the original vector, can be found in the comment inside "invokeEvent":

https://github.com/PrimaryFeather/Starling-Framework/blob/master/starling/src/starling/events/EventDispatcher.as#L131

In "invokeEvent", user code is executed, which means that "add/removeEventListener" could be called. If those calls could modify the vector, we'd run into problems (e.g. skipping one of the listeners or running into a null reference).

So we can either work in a copy inside "removeEventListener" or in "invokeEvent". I decided that "invokeEvent" is the method were performance is more important, so I made the duplicate inside "removeEventListener".

However, you're perfectly right: the loop I had inside "removeEventListener" is quite slow and clumsy. I changed that and use "splice", too, now (on a copy, though). That should still be much faster!

That's how far I would go, though. As with our recent "normalize angle" discussion, I think the optimization of "addEventListener" is not critical enough to warrant the additional complexity.

IonSwitz commented 10 years ago

Sounds good! :) And thank you for that explanation. You are absolutely correct about the copy, ofcourse.

About the addEventListener thing, it will probably not be an issue in most normal cases.

It just pains my eyes when I see O(n) code in situations where O(1) or O(log n) works better, and I sometimes have a hard time prioritizing code clarity over code performance. This is probably one of the cases where code clarity should be king. :)

Thanks!

joshtynjala commented 10 years ago

I changed that and use "splice", too, now (on a copy, though). That should still be much faster!

Don't forget that calling splice() always allocates two objects, so it may have garbage collector implications. That may affect performance in a different way.

IonSwitz commented 10 years ago

Calling "push" many times will create a lot more objects internally in the vector. The GC difference between the push version and the splice version is quite radical, and splice definitely comes up on top (as in "less allocations and GC work")

katopz commented 10 years ago

push is slow btw http://jacksondunstan.com/articles/1168 i think all Vector or Array must fixed-length while use and unfixed then extend it while need.

joshtynjala commented 10 years ago

Ah yes, I missed that push() was used before. Recently, I went through some areas of the Feathers code and replaced push() in many performance-critical places with array[array.length] = newValue (or, if in a loop, a local variable that I incremented each time instead of calling array.length repeatedly). It might actually be worth going back to using a loop and doing that if there's still some garbage collection pressure from splice().

IonSwitz commented 10 years ago

Yeah, I went through the Graphics Extension code for vertex allocations and changed that to fixed-length vectors where possible. It does wonders for performance.

However, in the case above, the splice operation outperforms any other version, as the vector that holds listeners must be of a dynamic length. Converting between fixed-length and dynamic length vectors would chew up a lot of performance.

If you feel up to it, you can try to beat the performance gained by my example above, both the test case and the new code for EventDispatcher.

Even if Daniel don't want to add it to the code base, it can be a fun exercise in vector experimentation for anyone eager to learn more :)

joshtynjala commented 10 years ago

I didn't mean using a fixed length vector. Switching back and forth between fixed length and dynamic length vectors would indeed hurt performance. I meant simply replacing the calls to push() with [] syntax instead. It avoids allocating any objects.

IonSwitz commented 10 years ago

Sorry if I was unclear, the reference to fixed length vectors was for @katopz

I will check if I can see a difference between the push() and the [] syntax in this case. But it seems odd, as regardless of if we use push or the [] syntax, objects within the vector have to be allocated in a vector in order to contain new entries in the vector.

But I will definitely check it out. Thanks! :)

It would be interesting to be able to read the internal workings of the basic Flash classes. So much is based on hearsay, basically. ( Not saying that you are, not at all, but reading through comments on Flash forums, you tend to read things like "I have heard that X is faster than Y", etc.)

joshtynjala commented 10 years ago

push() allocates an Array for the rest arguments. It can't be kept around or reused for any purpose, so it always ends up being garbage collected. It's right there in the public API, so definitely not hearsay.

joshtynjala commented 10 years ago

Additionally, splice() not only has rest arguments, but it also creates a new array/vector for the return value. Calling splice() once is indeed better than calling push() many times, but it may be valuable to avoid those two allocations as well.

joshtynjala commented 10 years ago

Oh yeah. Also worth noting that Tamarin is open source. At least some of those internal workings are public. At least the core ECMAScript stuff.

IonSwitz commented 10 years ago

Ok, yes, I forgot the argument array crap in push(). I will see what difference the [] operation would do when I get home.

IonSwitz commented 10 years ago

Well, I gotta hand it to you, @joshtynjala. A change from push to [] in my example above raised performance to the same level as the new "slice and splice" method from Daniel. A jump from 5 FPS to 13 FPS in both scenarios.

However, the huge improvement comes when looking at memory allocations. the "slice and splice" method generates 6000 memory allocations for 2000 objects, whereas the [] method below only generates 2000 memory allocations.

New code from Daniel:

listeners = listeners.slice(); listeners.splice(index, 1);

New code with [] operator:

                    var newLength:int = listeners.length - 1;
                    var oldLength:int = listeners.length;
                    var newVector:Vector.<Function> = new Vector.<Function>(newLength);
                    var counter:int = 0;
                    for ( var i : int = 0; i < oldLength; i++ )
                    {
                        if ( i != index )
                            newVector[counter++] = listeners[i];
                    }

Thank you, Josh, for opening my eyes to the difference between push and [].

katopz commented 10 years ago

you may need

newVector[int(counter++)]
IonSwitz commented 10 years ago

Really? That seems very odd to me. Surely, postfix incrementation of an int needs to ensure that the int remains an int? Otherwise this language is odder than any other I've ever seen ;)

Then again, ActionScript has thrown me for a loop every now and then. :)

b005t3r commented 10 years ago

@IonSwitz, so using the [] syntax and adding 2000 objects causes Vector to make 2000 allocations? That's madness. I always thought Vector allocates a bit more space than it currently needs, so it doesn't have to allocate memory on every add (or free on every remove). ArrayList in Java works this way - it allocates (let's say) 20% more space than it currently needs and once the space is all used up, it expands. Do you feel like checking this method? :)

IonSwitz commented 10 years ago

@b005t3r I am investigating now. I am a bit confused. When I do a regular "put many Sprites in a vector", I don't see this memory adjustment, but when I run the event handler code, I get that effect.

So I'm not sure if it is Scout that is tricking me, or if something else is going on. Sorry if I gave bad information, I really don't want to contribute to the ActionScript "rumor mill" :)

Investigation is ongoing.

b005t3r commented 10 years ago

@IonSwitz, and if you feel like you really want to go for it, you can also try using a linked list, which looks like a really good solution here (most of the times, you iterate over all elements + adding/removing an element is super fast). I have it implemented here: https://github.com/b005t3r/MedKit

PrimaryFeather commented 10 years ago

Damn, @joshtynjala, you're perfectly right. Stupid of me — I totally forgot the side-effects of "splice". I reverted the code so that it uses the loop again, but avoids "push". As you guys all pointed out, that's the best way to do it. Thanks for pushing me! (Pun intended g)

@IonSwitz, that "int" cast mentioned by @katopz really helps! http://wiki.starling-framework.org/manual/performance_optimization#syntax_tips

That's another oddity of AS3. ;-)

IonSwitz commented 10 years ago

@b005t3r The problem with a linked list is that you have to allocate an extra object, in this case the LinkedListEntry object to wrap each element in it. So you always get allocations of extra objects there.

Thanks for that link, Daniel. Very obscure. How is that even possible...

I mean, I can understand it if the 'a' in your 'a *10' example is a Number, but incrementing an int should result in another int, and it should be identical with or without the cast.

IonSwitz commented 10 years ago

Ok, I haven't been able to get anything conclusive out of Scout in the test harness. I do get the 2000 allocations in the test case above there, but I haven't been able to determine where they come from fully, so maybe they are from some other part of the method. My isolated test case is completely inconclusive, the way I have written it ( a for-loop, add a new String to the vector, shows no allocations at all, so I have to wonder if it gets optimized away, or if Scout is just teasing me)

Anyway, thanks for looking into this thread. Sorry if I let my optimization fever run too rampant.

b005t3r commented 10 years ago

Do you add the same string 2000 times or do you create a new one each loop iteration?

IonSwitz commented 10 years ago

I did a member Vector.(String) in the class, that got set to length = 0 on an onEnterFrame event listener. After that, every frame, I had a 2000 item loop that pushed a new String("foo") into the vector. (I also tried doing "new Vector..." instead of the length = 0 operation )

In Java, C++ and C#, that would result in a string allocation and an allocation for the push() operation, (more or less, the thing you mention about increasing a vector with 20% or 50% is common practice too, ofcourse) but instead I got almost no allocations at all.

So, what I am guessing is either there is some special Generation 0 garbage collects going on that doesn't show up in Scout, or there is some other mistake I've made when thinking this stuff through.

Either is possible. :) The vector did grow with length 2000 after the onEnterFrame was done, but the GC didn't blink in Scout. So something is weird.

b005t3r commented 10 years ago

Not sure about all of this. I think you should create the String once and assign it 2000 times in a loop to make sure you're not looking at String allocations. I'll try testing it tomorrow.

PrimaryFeather commented 10 years ago

As far as I remember, those "rest" arguments get optimized away by the JIT-compiler. On iOS, however (which uses AOT compilation), they should turn up in Scout.

IonSwitz commented 10 years ago

@b005t3r Yes, I was hoping to see 4000 allocations, 2000 Strings and 2000 push objects for the vector, but I see 0 within the onEnterFrame method, which is exactly why I am so surprised and sure that something is tricking me. :)

Of course, it is possible that the JIT compiler can see that the vector is never actually used and that the strings are never accessed within the program, but that seems like a pretty deep JIT voodoo to me.