HaxeFlixel / flixel

Free, cross-platform 2D game engine powered by Haxe and OpenFL
https://haxeflixel.com/
MIT License

Performance of OpenFL vs. Flixel+OpenFL #2978

Open giuppe opened 11 months ago

giuppe commented 11 months ago

I ran both OpenFL's BunnyMark and Flixel's FlxBunnyMark on the same machine, adding bunnies until the fps dropped below 60. Target: Windows, HXCPP, release build. For FlxBunnyMark, the options were: No Collisions / No Shaders / Step: Variable / On-Screen. I also disabled angularVelocity, because the bunnies in the OpenFL version don't rotate.

FlxBunnyMark: 27k bunnies

OpenFL BunnyMark: 240k bunnies

27k bunnies for Flixel+OpenFL (at 52fps) vs. 240k bunnies for OpenFL (at 54fps).

Please forgive me if I'm missing something obvious. Both benchmarks are "official" and I'm using the default configuration for each project, so I assume each one is set up to show representative numbers. OpenFL's BunnyMark uses drawQuads, too.

According to the Visual Studio profiler, there seem to be no discernible bottlenecks (that is, no obvious inefficiencies in Flixel eating CPU time); the end result is simply that rendering takes a lot more time.

Is Flixel unwittingly creating more work than it should for the rendering pipeline? Or is OpenFL missing some optimizations that overwhelmingly affect Flixel?

I know that there is more than raw performance to appreciate, but still: it's the same rendering engine, the numbers shouldn't be so different.

How can we approach the issue to gather more data, maybe find the root cause? Does anyone have any pointers?

Geokureli commented 11 months ago

First thing I notice: the bunnies in the first image are rotated slightly, whereas they are all upright in the second. This might be preventing batch drawing. Also, how are you rendering the standalone OpenFL test? Edit: Oh, I see. I forgot OpenFL had a BunnyMark demo.

Edit 2: The other difference I see is the UI overlay in Flixel. Does hiding it improve performance? I doubt it will.

giuppe commented 11 months ago

> the bunnies in the first image are rotated slightly

Yes, it's the initial angle; I forgot to disable it. It makes a small difference (+1000 bunnies):

[screenshot]

> Oh, i see. i forgot openfl had a bunnymark demo

Exactly: I just ran "openfl create BunnyMark" and then "openfl test windows".

Without the UI we gain another 1000 bunnies:

[screenshot]

(I also used the openfl.display.FPS object to avoid using FlxText.)

Disabling the background gains another 1000 bunnies:

[screenshot]

In any case, even with these changes, the fps counter drops below 60 at 25,000 bunnies.

Geokureli commented 11 months ago

Yeah, the difference is still massive, and it's worth a deep dive. Thanks for checking those loose ends, though!

Is the OpenFL test using Bitmap instances? Do both end up with some kind of GL batch draw? I think Flixel renders sprites to a Graphics buffer; I've always wondered whether that step could be omitted.

giuppe commented 11 months ago

This is what OpenFL's BunnyMark does (with unrelated code removed):

public function new(){
    // ... other initializations ...

    var bitmapData = Assets.getBitmapData ("assets/wabbit_alpha.png");
    tileset = new Tileset (bitmapData);
    tileset.addRect (bitmapData.rect);

    // ...

    indices = new Vector<Int> ();
    transforms = new Vector<Float> ();
}

private function addBunny ():Void {
    var bunny = new Bunny ();
    bunny.x = 0;
    bunny.y = 0;
    bunny.speedX = Math.random () * 5;
    bunny.speedY = (Math.random () * 5) - 2.5;
    bunnies.push (bunny);

    indices.push (bunny.id);
    transforms.push (0);
    transforms.push (0);
}

private function stage_onEnterFrame (event:Event):Void {
    for (i in 0...bunnies.length) {
        var bunny = bunnies[i];
        // ... recalculates bunnies position ...
        transforms[i * 2] = bunny.x;
        transforms[i * 2 + 1] = bunny.y;
    }

    graphics.clear ();
    graphics.beginFill (0xFFFFFF);
    graphics.drawRect (0, 0, stage.stageWidth, stage.stageHeight);
    graphics.beginBitmapFill (tileset.bitmapData, null, false);
    graphics.drawQuads (tileset.rectData, indices, transforms);
}

So it makes a single drawQuads call per frame covering all the bunnies, which it can do because every bunny shares the same bitmapData. I wondered whether this is what produces those huge numbers, so I moved the drawQuads code into the bunny loop (closer to what happens when each sprite has its own bitmapData):

private function stage_onEnterFrame (event:Event):Void {
    graphics.clear ();
    graphics.beginFill (0xFFFFFF);
    for (i in 0...bunnies.length) {
        var transforms = new Vector<Float>();
        var indices = new Vector<Int>();

        var bunny = bunnies[i];
        // ... recalculates bunnies position ...

        transforms.push(bunny.x);
        transforms.push(bunny.y);
        indices.push(0);

        graphics.drawRect (bunny.x, bunny.y, tileset.rectData[2], tileset.rectData[3]);
        graphics.beginBitmapFill (tileset.bitmapData, null, false);
        graphics.drawQuads (tileset.rectData, indices, transforms);
    }

}

AFAIK this is closer to how Flixel's FlxCamera.render() works: each item makes its own drawQuads() call. The result is:

[screenshot]

If I'm understanding this correctly (and barring errors in my implementation), this would mean that drawQuads is itself relatively slow and Flixel is already optimizing it a lot...

Or shaderFill is faster than bitmapFill; I'll try that.
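To make the batching argument concrete, here is a minimal, pure-Haxe model (no OpenFL required) of how grouping sprites changes the number of drawQuads calls. SpriteData, QuadBatch and batch() are hypothetical names, an Int texture id stands in for a BitmapData, and each QuadBatch represents what a single drawQuads call would receive. It is a sketch under those assumptions, not Flixel's renderer; a real renderer would typically also have to match blend mode, smoothing and shader before merging two sprites.

```haxe
// Hypothetical stand-in for a sprite: an Int texture id replaces a BitmapData reference.
typedef SpriteData = {texture:Int, x:Float, y:Float};

// One batch models one drawQuads() call: a shared texture plus its per-quad buffers.
class QuadBatch {
    public var texture:Int;
    public var indices:Array<Int>;
    public var transforms:Array<Float>;

    public function new(texture:Int) {
        this.texture = texture;
        this.indices = [];
        this.transforms = [];
    }
}

class Main {
    // Merges each run of consecutive sprites sharing a texture into one batch.
    // Every texture change starts a new batch, i.e. one more draw call.
    public static function batch(sprites:Array<SpriteData>):Array<QuadBatch> {
        var batches:Array<QuadBatch> = [];
        var current:QuadBatch = null;
        for (s in sprites) {
            if (current == null || current.texture != s.texture) {
                current = new QuadBatch(s.texture);
                batches.push(current);
            }
            current.indices.push(0); // rect 0 of the tileset, as in BunnyMark
            current.transforms.push(s.x); // two values per quad (x, y), as in BunnyMark
            current.transforms.push(s.y);
        }
        return batches;
    }

    static function main() {
        var shared:Array<SpriteData> = [];
        var alternating:Array<SpriteData> = [];
        for (i in 0...1000) {
            shared.push({texture: 0, x: i * 1.0, y: 0.0});
            alternating.push({texture: i % 2, x: i * 1.0, y: 0.0});
        }
        // Same texture for every sprite: all 1000 quads land in a single batch.
        trace(batch(shared).length);
        // Alternating textures break every run: 1000 batches, one call per sprite.
        trace(batch(alternating).length);
    }
}
```

In this model, OpenFL's original BunnyMark corresponds to the shared case (one call for all quads), while the per-bunny loop corresponds to the alternating case (one call per quad), which is the gap the measurements above are probing.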

giuppe commented 11 months ago

I switched from bitmapFill to shaderFill in OpenFL's BunnyMark (the per-bunny version), and it nearly doubles the number of bunnies:

[screenshot]

But it's still far from FlxBunnyMark's 25k bunnies.

Geokureli commented 10 months ago

Seems related to https://github.com/HaxeFlixel/flixel/issues/3005